Gmail connector — design notes
Status: design captured 2026-04-19 overnight. Source: Gmail IMAP audit subagent 2026-04-19.
Auth
- IMAP + Google app-specific password for v1. OAuth deferred.
- Credentials: GOOGLE_APP_PASSWORD_PDPP env var. Email address prompted at first run and stored (see the email address prompt decision below).
- Connection: imap.gmail.com:993 over TLS.
Library: imapflow (Node). Handles CONDSTORE natively, tolerates Gmail's lack of QRESYNC, clean async/await API, maintained.
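A minimal sketch of how the connection settings above could be assembled. The helper name buildImapOptions and the ImapOptions type are hypothetical; the field names (host, port, secure, auth.user, auth.pass) are modeled on the options object that imapflow's ImapFlow constructor accepts. Stripping whitespace from the app password is an assumption of convenience, since Google displays app passwords in space-separated groups.

```typescript
// Hypothetical option shape, modeled on the options imapflow's constructor takes.
interface ImapOptions {
  host: string;
  port: number;
  secure: boolean;
  auth: { user: string; pass: string };
}

// Builds connection options from the stored email address and an env map.
// Throws before any network activity if the app password is missing.
function buildImapOptions(
  email: string,
  env: Record<string, string | undefined>,
): ImapOptions {
  const pass = env["GOOGLE_APP_PASSWORD_PDPP"];
  if (!pass) {
    throw new Error("GOOGLE_APP_PASSWORD_PDPP is not set");
  }
  return {
    host: "imap.gmail.com",
    port: 993,
    secure: true, // implicit TLS on port 993
    auth: {
      user: email.trim(),
      // Assumption: tolerate the space-separated form Google displays.
      pass: pass.replace(/\s+/g, ""),
    },
  };
}
```

In a real run the result would presumably be handed to imapflow, along the lines of new ImapFlow(buildImapOptions(email, process.env)).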
Core architectural decision: iterate only "All Mail"
Gmail represents labels as IMAP folders. A message with labels [INBOX, Work] appears in three folders: [Gmail]/All Mail, INBOX, and Work. If the connector iterates per folder, it multi-counts. Always iterate the \All special-use folder (typically [Gmail]/All Mail) only. Derive label membership from X-GM-LABELS fetched per message.
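The folder choice can be sketched as a small lookup. The MailboxInfo type and pickAllMailPath name are hypothetical; path and specialUse mirror fields that imapflow's mailbox listing exposes, but the type here is this sketch's own. Matching on the \All special-use flag first, and falling back to the literal name only second, is meant to cope with accounts where the folder name differs.

```typescript
// Minimal mailbox descriptor (hypothetical; loosely modeled on imapflow's list results).
interface MailboxInfo {
  path: string;
  specialUse?: string;
}

// Returns the path of the single folder the connector should iterate.
function pickAllMailPath(mailboxes: MailboxInfo[]): string {
  // Prefer the special-use flag, which does not depend on the folder's display name.
  const byFlag = mailboxes.find((m) => m.specialUse === "\\All");
  if (byFlag) return byFlag.path;

  // Fallback: the typical name given in the notes.
  const byName = mailboxes.find((m) => m.path === "[Gmail]/All Mail");
  if (byName) return byName.path;

  throw new Error("No \\All mailbox found; All Mail may be hidden from IMAP");
}
```

Every other folder in the listing is ignored for message iteration; label membership comes from X-GM-LABELS instead.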
Streams
messages (mutable_state, primary_key ["id"], consent_time_field "received_at")
- id = X-GM-MSGID (64-bit integer as string, stable for message lifetime)
- thread_id = X-GM-THRID
- subject
- from_name, from_email
- to (array of {name, email})
- cc, bcc (array, nullable)
- date (ISO 8601; header Date; may be sender-forged)
- received_at (ISO 8601; IMAP INTERNALDATE, server receipt time, authoritative)
- message_id (RFC 2822 Message-ID header)
- in_reply_to (nullable)
- references (array of Message-IDs, nullable)
- size_bytes (RFC822.SIZE)
- labels (array of strings; current X-GM-LABELS including system labels mapped to canonical names)
- is_draft, is_flagged, is_seen, is_answered (booleans from IMAP flags)
- snippet (first ~256 chars of text/plain body; fetched on initial import only)
- has_attachments (boolean)
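Two of the derived fields above lend themselves to a short sketch: the four flag booleans and the snippet. The function names are hypothetical. The flag names (\Draft, \Flagged, \Seen, \Answered) are the standard IMAP system flags; collapsing whitespace before truncating the snippet is an assumption, not something the notes specify.

```typescript
interface FlagFields {
  is_draft: boolean;
  is_flagged: boolean;
  is_seen: boolean;
  is_answered: boolean;
}

// Maps a message's IMAP flags to the four boolean fields.
// System flags are compared case-insensitively.
function flagsToFields(flags: Iterable<string>): FlagFields {
  const seen = new Set<string>();
  for (const f of flags) seen.add(f.toLowerCase());
  return {
    is_draft: seen.has("\\draft"),
    is_flagged: seen.has("\\flagged"),
    is_seen: seen.has("\\seen"),
    is_answered: seen.has("\\answered"),
  };
}

// Produces the snippet from an already-decoded text/plain body.
// Assumption: runs of whitespace are collapsed to single spaces first.
function makeSnippet(textPlain: string, maxChars: number = 256): string {
  const collapsed = textPlain.replace(/\s+/g, " ").trim();
  return collapsed.length <= maxChars ? collapsed : collapsed.slice(0, maxChars);
}
```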
threads (mutable_state, primary_key ["id"], consent_time_field "first_message_date")
- id = X-GM-THRID
- subject (of first message)
- participant_emails (array)
- message_count
- first_message_date
- last_message_date
- labels (union across thread)
- unread_count, flagged_count
Derived from messages. Emitted as its own stream for query convenience.
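The derivation can be sketched as a group-by over message rows. MessageRow is a trimmed, hypothetical subset of the messages stream, and several choices here are assumptions: "first" and "last" are ordered by received_at, participants are taken from from_email and to, and emails are lowercased before deduplication.

```typescript
interface MessageRow {
  id: string;
  thread_id: string;
  subject: string;
  from_email: string;
  to: { name: string; email: string }[];
  received_at: string; // ISO 8601
  labels: string[];
  is_seen: boolean;
  is_flagged: boolean;
}

interface ThreadRow {
  id: string;
  subject: string;
  participant_emails: string[];
  message_count: number;
  first_message_date: string;
  last_message_date: string;
  labels: string[];
  unread_count: number;
  flagged_count: number;
}

function deriveThreads(messages: MessageRow[]): ThreadRow[] {
  // Group messages by X-GM-THRID.
  const groups = new Map<string, MessageRow[]>();
  for (const m of messages) {
    const g = groups.get(m.thread_id);
    if (g) g.push(m);
    else groups.set(m.thread_id, [m]);
  }

  const threads: ThreadRow[] = [];
  groups.forEach((msgs, threadId) => {
    // Assumption: order within a thread follows server receipt time.
    const sorted = msgs
      .slice()
      .sort((a, b) => Date.parse(a.received_at) - Date.parse(b.received_at));
    const first = sorted[0];
    const last = sorted[sorted.length - 1];
    if (!first || !last) return; // groups are never empty; guard for the type checker

    const labels = new Set<string>();
    const participants = new Set<string>();
    for (const m of sorted) {
      m.labels.forEach((l) => labels.add(l));
      participants.add(m.from_email.toLowerCase());
      m.to.forEach((t) => participants.add(t.email.toLowerCase()));
    }

    threads.push({
      id: threadId,
      subject: first.subject,
      participant_emails: Array.from(participants).sort(),
      message_count: sorted.length,
      first_message_date: first.received_at,
      last_message_date: last.received_at,
      labels: Array.from(labels).sort(),
      unread_count: sorted.filter((m) => !m.is_seen).length,
      flagged_count: sorted.filter((m) => m.is_flagged).length,
    });
  });
  return threads;
}
```

Because a thread row depends on every message in it, any change to a member message means the thread row has to be re-emitted, which fits the mutable_state semantics.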
labels (mutable_state, primary_key ["name"])
- name (raw IMAP label, e.g. INBOX, [Gmail]/Sent Mail, Projects/Q2)
- canonical_name (normalized: inbox, sent, projects/q2)
- is_system (boolean; [Gmail]/* or INBOX)
- parent_name (nullable)
- message_count, unread_count
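A possible normalization, reproducing the three examples given above. The describeLabel name is hypothetical. Only the INBOX, [Gmail]/Sent Mail, and Projects/Q2 mappings come from the notes; the treatment of other [Gmail]/* labels (strip the prefix and lowercase) and the choice to leave parent_name null for system labels are assumptions.

```typescript
interface LabelInfo {
  name: string;
  canonical_name: string;
  is_system: boolean;
  parent_name: string | null;
}

function describeLabel(name: string): LabelInfo {
  const isSystem = name === "INBOX" || name.startsWith("[Gmail]/");

  let canonical: string;
  if (name === "[Gmail]/Sent Mail") {
    canonical = "sent"; // mapping stated in the notes
  } else if (isSystem) {
    // Assumption: other system labels drop the "[Gmail]/" prefix and are lowercased.
    canonical = name.replace(/^\[Gmail\]\//, "").toLowerCase();
  } else {
    canonical = name.toLowerCase();
  }

  // User labels nest with "/"; the parent is everything before the last separator.
  const slash = name.lastIndexOf("/");
  const parent = !isSystem && slash > 0 ? name.slice(0, slash) : null;

  return { name, canonical_name: canonical, is_system: isSystem, parent_name: parent };
}
```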
attachments (append_only, primary_key ["id"], consent_time_field "message_received_at")
- id = {X-GM-MSGID}:{part_index} (compound; part_index is the BODYSTRUCTURE path)
- message_id = X-GM-MSGID
- filename
- content_type
- size_bytes (encoded size if base64; document this)
- content_id (for inline attachments; nullable)
- is_inline (boolean)
- encoding (base64, quoted-printable, etc.)
- part_index (string; BODYSTRUCTURE path, for later hydration)
- message_received_at (INTERNALDATE of parent; for time_range filtering)
v1 does NOT fetch attachment bytes. Separate blob_ref hydration step deferred to v2.
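Attachment rows can be produced from BODYSTRUCTURE alone, without fetching any bytes. The sketch below walks a parsed structure tree; StructureNode is a hypothetical type whose field names are modeled on the tree imapflow returns for bodyStructure. The rule for what counts as an attachment (explicit attachment disposition, a filename, or an inline part with a Content-ID) is a heuristic assumed here, not something the notes define.

```typescript
interface StructureNode {
  part?: string; // BODYSTRUCTURE path such as "2" or "1.2"
  type: string; // MIME type, e.g. "application/pdf"
  encoding?: string;
  size?: number; // size in octets as transferred (encoded)
  id?: string; // Content-ID
  disposition?: string;
  dispositionParameters?: Record<string, string>;
  parameters?: Record<string, string>;
  childNodes?: StructureNode[];
}

interface AttachmentRow {
  id: string;
  message_id: string;
  filename: string | null;
  content_type: string;
  size_bytes: number | null;
  content_id: string | null;
  is_inline: boolean;
  encoding: string | null;
  part_index: string;
  message_received_at: string;
}

function collectAttachments(
  msgId: string,
  receivedAt: string,
  root: StructureNode,
): AttachmentRow[] {
  const rows: AttachmentRow[] = [];
  const visit = (node: StructureNode): void => {
    // Multipart containers carry no payload themselves; recurse into children.
    if (node.childNodes && node.childNodes.length > 0) {
      node.childNodes.forEach(visit);
      return;
    }
    const disposition = (node.disposition || "").toLowerCase();
    const dispName = node.dispositionParameters ? node.dispositionParameters["filename"] : undefined;
    const typeName = node.parameters ? node.parameters["name"] : undefined;
    const filename = dispName || typeName || null;
    const isInline = disposition === "inline";

    // Heuristic (assumption): skip plain body parts with no filename or Content-ID.
    const isAttachment =
      disposition === "attachment" || filename !== null || (isInline && !!node.id);
    if (!isAttachment) return;

    const partIndex = node.part || "1"; // a single-part body is part 1
    rows.push({
      id: `${msgId}:${partIndex}`,
      message_id: msgId,
      filename,
      content_type: node.type,
      size_bytes: typeof node.size === "number" ? node.size : null,
      content_id: node.id || null,
      is_inline: isInline,
      encoding: node.encoding || null,
      part_index: partIndex,
      message_received_at: receivedAt,
    });
  };
  visit(root);
  return rows;
}
```

Under this sketch, has_attachments on the parent message would simply be whether the returned array is non-empty.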
Incremental sync
- Cursor shape: { uidnext: N, modseq: N, all_mail_folder: "[Gmail]/All Mail" }
- First run: UIDVALIDITY captured, FETCH 1:* (FLAGS INTERNALDATE ENVELOPE RFC822.SIZE BODYSTRUCTURE X-GM-MSGID X-GM-THRID X-GM-LABELS MODSEQ) on [Gmail]/All Mail.
- Subsequent runs:
  - SELECT "[Gmail]/All Mail": get new UIDNEXT, HIGHESTMODSEQ.
  - If UIDVALIDITY changed, full refetch (rare).
  - UID FETCH prev_uidnext:* (...): new messages since last run. A range ending in * always includes the highest existing UID, so discard any result whose UID is below prev_uidnext.
  - UID FETCH 1:* (FLAGS X-GM-LABELS) (CHANGEDSINCE prev_modseq): label/flag changes on existing messages.
  - Detect expunged: UID SEARCH UID <previously_seen_uids> → UIDs missing = expunged. Emit tombstones.
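The decision logic for a subsequent run can be sketched as a pure function over the stored cursor and the values reported by SELECT. All names are hypothetical. Two assumptions go beyond the cursor shape listed above: the sketch adds a uidvalidity field so the change check has something to compare against, and it carries UIDVALIDITY and MODSEQ as decimal strings to avoid precision loss on large values.

```typescript
interface Cursor {
  uidvalidity: string; // assumption: stored alongside the listed cursor fields
  uidnext: number;
  modseq: string; // decimal string
  all_mail_folder: string;
}

interface MailboxStatus {
  uidValidity: string;
  uidNext: number;
  highestModseq: string;
}

type SyncPlan =
  | { kind: "full"; reason: string }
  | { kind: "incremental"; newUidRange: string | null; changedSince: string | null };

function planSync(cursor: Cursor | null, status: MailboxStatus): SyncPlan {
  if (!cursor) {
    return { kind: "full", reason: "no cursor (first run)" };
  }
  if (cursor.uidvalidity !== status.uidValidity) {
    // Stored UIDs are meaningless once UIDVALIDITY changes.
    return { kind: "full", reason: "UIDVALIDITY changed" };
  }
  // Only ask for new messages when UIDNEXT advanced; this also sidesteps the
  // "N:*" range returning the last existing message when nothing is new.
  const newUidRange = status.uidNext > cursor.uidnext ? `${cursor.uidnext}:*` : null;
  // Only ask for flag/label changes when HIGHESTMODSEQ moved.
  const changedSince = status.highestModseq !== cursor.modseq ? cursor.modseq : null;
  return { kind: "incremental", newUidRange, changedSince };
}
```

An incremental plan with both fields null means the run only needs the expunge check before saving the unchanged cursor.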
Deletion semantics
- \Deleted flag in a non-Trash folder = label removed, not deletion. Message may still exist in [Gmail]/All Mail.
- Only messages missing from [Gmail]/All Mail on subsequent sync → true deletion → tombstone.
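The "missing from All Mail" rule reduces to a set difference between the UIDs seen on the previous run and the UIDs the server still reports. The sketch below assumes the connector keeps a UID to X-GM-MSGID map in its state so tombstones can be keyed by the stream's primary key; that map, the Tombstone shape, and both function names are hypothetical.

```typescript
// Hypothetical tombstone record keyed by the messages stream primary key.
interface Tombstone {
  id: string; // X-GM-MSGID
  deleted: true;
}

// UIDs seen previously that the server no longer reports in All Mail.
function detectExpunged(previousUids: number[], presentUids: number[]): number[] {
  const present = new Set<number>(presentUids);
  return previousUids.filter((uid) => !present.has(uid));
}

// Translates expunged UIDs to tombstones; UIDs with no known message id are skipped.
function toTombstones(expungedUids: number[], uidToMsgId: Map<number, string>): Tombstone[] {
  const out: Tombstone[] = [];
  for (const uid of expungedUids) {
    const id = uidToMsgId.get(uid);
    if (id !== undefined) out.push({ id, deleted: true });
  }
  return out;
}
```

A \Deleted flag seen in some other folder never enters this computation, which matches the rule that it only signals label removal.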
Rate limits
- Google caps at 15 concurrent IMAP connections. We use 4 max.
- No published bandwidth cap but bulk ops can trigger 24-hour block. Keep runs small.
Humanlike-ness
- Pace bulk FETCH in windows of 200 messages; pause 500 ms between windows.
- Keep a single long-lived connection per run, not connection-per-message.
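The pacing rule can be sketched as a chunker plus a loop that pauses between windows. The function names and the fetchWindow callback are hypothetical; the callback stands in for whatever issues the actual FETCH on the run's single connection. The defaults of 200 messages and 500 ms are the values from the notes.

```typescript
// Splits a list into consecutive windows of at most `size` items.
function toWindows<T>(items: T[], size: number = 200): T[][] {
  if (size < 1) throw new Error("window size must be at least 1");
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs fetchWindow on each window in order, pausing between windows
// (no pause before the first window or after the last).
async function fetchPaced<T>(
  uids: T[],
  fetchWindow: (window: T[]) => Promise<void>,
  windowSize: number = 200,
  pauseMs: number = 500,
): Promise<void> {
  let first = true;
  for (const w of toWindows(uids, windowSize)) {
    if (!first) await sleep(pauseMs);
    first = false;
    await fetchWindow(w);
  }
}
```

With these defaults, a 10,000-message import spends roughly 25 seconds in pauses (49 gaps of 500 ms) on top of the fetch time itself.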
Autonomous decision (2026-04-19): email address prompt
On first run, connector emits INTERACTION kind=credentials with a simple form asking for the Gmail address. Response stored in connector state. Avoids hard-coding Tim's address in the manifest or env.
Alternative: derive from GOOGLE_APP_PASSWORD_PDPP name. Rejected — brittle, obscures intent.
Explicit non-goals v1
- Downloading full message bodies or attachment bytes.
- Chat/Meet content (not in IMAP surface).
- Real-time IDLE push. Deferred; scheduler polling every 30 min is sufficient.
- Multiple Gmail accounts — one at a time.
- Non-Gmail IMAP providers — the connector is Gmail-specific today; generic IMAP is a follow-up.