v0.1.0
Protocol Spec

Connector Ecosystem

Reference runtime notes for connectors — browser abstraction decisions and third-party source integration.

This is a reference implementation note, not normative protocol text. It records connector-runtime research and implementation direction for this repo. The Collection Profile defines the portable connector message contract; this page explains how the reference runtime can satisfy real connector needs.

Browser abstraction decision

Model A vs Model B

Two models for how connectors interact with browsers:

  • Model A (current/recommended): Runtime owns the browser and provides Playwright Page + ConnectorContext to connectors. Connectors use Playwright's full API. Simple, powerful, debuggable.
  • Model B (deferred): Runtime exposes browser via JSONL messages (BROWSER/BROWSER_RESULT). Connectors never touch Playwright. Enables process isolation and language independence but adds a fragile proxy layer.

Decision: Model A, with JSONL for everything else

Codex gpt-5.4 recommendation (2026-03-30): Do not build a custom BROWSER JSONL protocol. Connectors need real browser power (Cloudflare challenges, SPA navigation, network interception, cookie extraction). A message protocol would either reimplement Playwright or fall back to evaluate (running arbitrary script in the page) for everything hard.

The protocol is JSONL for RECORD/STATE/INTERACTION/DONE. Browser automation is a runtime capability, not a protocol concern. When process isolation or language independence is needed, expose a CDP WebSocket URL rather than inventing a custom browser protocol.
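A minimal sketch of what "JSONL for RECORD/STATE/INTERACTION/DONE" can look like from the connector's side. The message type names come from this note; every field name (type, stream, data, cursor, status) and the emit and run_connector helpers are assumptions made for illustration, since the Collection Profile defines the actual message contract.

```python
import json
import sys
from typing import Any, Dict, Iterable, TextIO


def emit(message: Dict[str, Any], out: TextIO = sys.stdout) -> None:
    """Write one message as a single JSON line and flush so the runtime sees it promptly."""
    out.write(json.dumps(message, separators=(",", ":")) + "\n")
    out.flush()


def run_connector(items: Iterable[Dict[str, Any]], out: TextIO = sys.stdout) -> int:
    """Emit one RECORD per item, then a STATE checkpoint and a DONE marker."""
    count = 0
    for item in items:
        # Field names are illustrative assumptions, not the Collection Profile schema.
        emit({"type": "RECORD", "stream": "messages", "data": item}, out)
        count += 1
    emit({"type": "STATE", "cursor": {"records_emitted": count}}, out)
    emit({"type": "DONE", "status": "succeeded"}, out)
    return count
```

Nothing in this sketch touches a browser, which is the point of the decision above: how the connector obtained its items is invisible at the protocol level.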

Phased approach (from model-b-runtime-provided-browser.md):

  1. Phase 1 (now): Formalize BrowserCapability interface — refactor, not behavior change
  2. Phase 2 (when needed): Message protocol OR CDP endpoint for out-of-process connectors
  3. Phase 3 (defer): Full process isolation with container support
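One possible shape for the Phase 1 interface and the Phase 2 CDP option, sketched in Python for brevity. Only the name BrowserCapability comes from this note; the method cdp_endpoint, the two implementing classes, the connector_env helper, and the PDPP_BROWSER_CDP_URL variable are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


class BrowserCapability(Protocol):
    """Hypothetical interface: what a runtime offers a connector that needs a browser."""

    def cdp_endpoint(self) -> Optional[str]:
        """Return a CDP WebSocket URL, or None when only an in-process page is available."""
        ...


@dataclass
class InProcessBrowser:
    """Phase 1 (Model A): the runtime hands over a page object directly; no endpoint is exposed."""

    def cdp_endpoint(self) -> Optional[str]:
        return None


@dataclass
class CdpBrowser:
    """Phase 2 option: the runtime exposes its browser over a CDP WebSocket URL."""

    ws_url: str

    def cdp_endpoint(self) -> Optional[str]:
        return self.ws_url


def connector_env(browser: BrowserCapability) -> Dict[str, str]:
    """Build extra environment variables for an out-of-process connector.

    PDPP_BROWSER_CDP_URL is an invented variable name used only in this sketch.
    """
    url = browser.cdp_endpoint()
    return {"PDPP_BROWSER_CDP_URL": url} if url else {}
```

A connector written in another language could then attach with its own CDP client (Playwright, for example, offers connect_over_cdp in Python and connectOverCDP in JS), so no custom browser messages are needed.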

Connector strategies

How connectors get data from sources:

Strategy | Examples | Runtime needs | Language
--- | --- | --- | ---
API client | Plaid, Terra API, Spotify API, GitHub API | HTTP only | Any
Browser automation | Instagram, ChatGPT, LinkedIn, H-E-B | Playwright/CDP + browser | JS/TS (current), any via CDP
Session cookie extraction | slackdump, DiscordChatExporter | Cookies from browser profile, no live browser | Any
Archive/export parser | Timelinize, WhatsApp export, Facebook DYI, Google Takeout | File system access | Any
Browser extension | LinkedIn scrapers, Amazon purchase history | Runs in user's browser, sends to local connector | JS (extension) + any (receiver)
Aggregator wrapper | Plaid (12K+ banks), Terra (Fitbit/Oura/Garmin/Apple Health) | Just API calls | Any

Third-party tools that could become PDPP connectors

Go-based

Tool | Data | Auth method | License | Wrap difficulty
--- | --- | --- | --- | ---
slackdump (rusq/slackdump) | Slack messages, threads, files, users, emojis | Browser session cookie (d cookie) or export token | GPL-3.0 | Easy: already outputs JSON/SQLite
Timelinize (timelinize/timelinize) | 10+ sources: photos, Facebook, Instagram, Twitter, Google, iCloud, Strava, SMS, email, contacts | Per-source (OAuth, file import, API keys) | Apache-2.0 | Medium: need Go wrapper per data source

C# / .NET

Tool | Data | Auth method | License | Wrap difficulty
--- | --- | --- | --- | ---
DiscordChatExporter (Tyrrrz/DiscordChatExporter) | Discord messages, DMs, servers, attachments | User token | GPL-3.0 | Easy: supports JSON export, CLI invokable

Python

Tool | Data | Auth method | License | Wrap difficulty
--- | --- | --- | --- | ---
tg-archive (knadh/tg-archive) | Telegram groups, private messages, media | Telegram API credentials (api_id, api_hash, phone) | MIT | Easy: syncs to SQLite, read and emit
rexport (karlicoss/rexport) | Reddit comments, submissions, upvotes, saved | Reddit API (client_id, client_secret, username/password) | MIT | Easy: outputs JSON arrays
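The "syncs to SQLite, read and emit" pattern can be sketched as follows. The table schema, the stream name, and the RECORD field names are all invented for the demo; this does not reproduce tg-archive's real schema or the Collection Profile's message shape.

```python
import json
import sqlite3
import sys
from typing import TextIO


def emit_rows(conn: sqlite3.Connection, table: str, stream: str, out: TextIO = sys.stdout) -> int:
    """Emit every row of `table` as one RECORD line; returns the number of rows emitted."""
    conn.row_factory = sqlite3.Row
    # Table names cannot be bound as parameters, so quote the identifier explicitly.
    quoted = '"' + table.replace('"', '""') + '"'
    count = 0
    for row in conn.execute(f"SELECT * FROM {quoted}"):
        # RECORD field names are assumptions, not the Collection Profile schema.
        out.write(json.dumps({"type": "RECORD", "stream": stream, "data": dict(row)}) + "\n")
        count += 1
    out.flush()
    return count


# Demo against an in-memory database with an invented schema.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, sender TEXT, body TEXT)")
demo.executemany(
    "INSERT INTO messages (sender, body) VALUES (?, ?)",
    [("alice", "hi"), ("bob", "hello")],
)
emit_rows(demo, "messages", "telegram_messages")
```

Columns holding binary data would need an encoding step (for example base64) before json.dumps; the sketch assumes text and numeric columns only.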

Aggregator services (one connector = many sources)

Service | Data domain | Sources covered | Auth | Wrap difficulty
--- | --- | --- | --- | ---
Plaid | Financial (transactions, accounts, balances, investments) | 12,000+ US/EU financial institutions | Plaid Link OAuth flow → access_token | Easy: structured JSON API
Terra API | Health/fitness (workouts, sleep, heart rate, steps) | Fitbit, Oura, Garmin, Apple Health, Whoop, Peloton, etc. | Terra OAuth → API calls | Easy: structured JSON API
CommonHealth | Health records (Android) | 400+ data sources | On-device consent | Medium: Android-specific

Archive parsers (user provides exported data)

Source | Export format | Parser exists? | Notes
--- | --- | --- | ---
WhatsApp | .txt/.zip from phone export | Python parsers exist | E2E encrypted, no API access possible
Facebook DYI | .zip archive (HTML or JSON) | Timelinize parses it | Large archives, complex structure
Google Takeout | .zip per-product | Timelinize parses some | 51-54 data types
Apple Data & Privacy | .zip archive | No standard parser | 15 categories, 1-7 day fulfillment
Instagram data export | .zip archive | Timelinize parses it | Different format eras
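As an illustration of the archive-parser strategy, here is a small sketch that parses a WhatsApp-style .txt export. The assumed line shape ("12/31/21, 10:15 PM - Alice: Hello") is only one of several formats seen in the wild; real exports differ by platform, locale, and app version, so the patterns below are placeholders and not a robust parser.

```python
import re
from typing import Dict, Iterable, List

# Assumed timestamp prefix, e.g. "12/31/21, 10:15 PM - ". Adjust per locale.
STAMP = r"(\d{1,2}/\d{1,2}/\d{2,4}), (\d{1,2}:\d{2}(?: [AP]M)?) - "
MESSAGE_RE = re.compile("^" + STAMP + r"([^:]+): (.*)$")
SYSTEM_RE = re.compile("^" + STAMP)


def parse_chat(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse chat lines into messages, folding continuation lines into the previous message."""
    messages: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = MESSAGE_RE.match(line)
        if match:
            date, time, sender, text = match.groups()
            messages.append({"date": date, "time": time, "sender": sender, "text": text})
        elif SYSTEM_RE.match(line):
            # Timestamped line without "sender: " is treated as a system notice and skipped.
            continue
        elif messages:
            # No timestamp prefix: this line continues a multi-line message.
            messages[-1]["text"] += "\n" + line
    return messages
```

Each parsed dictionary could then be wrapped in a RECORD message. Because the parser only reads a file the user already exported, it needs file system access but no browser or network.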

Timelinize data sources (potential connectors)

Each Timelinize data source implements either FileImporter (parses archives) or APIImporter (calls APIs) or both:

  1. Photos/Videos (Apple HEIC/MOV, Google Photos, Samsung, generic EXIF)
  2. Facebook (DYI archive parser)
  3. Instagram (archive parser)
  4. Twitter/X (archive parser)
  5. Google Location History
  6. Apple iCloud
  7. Strava (API + GPS data)
  8. SMS/Text Messages (SMS Backup & Restore format)
  9. Email (Mbox/IMAP)
  10. Contacts (vCard, CSV)
  11. WhatsApp (archive parser)
  12. Telegram (archive parser)
  13. iMessage (local database)

Runtime requirements summary

The PDPP connector protocol (JSONL over stdin/stdout) is universal. What varies is the set of optional capabilities each runtime provides:

Capability | Declared in manifest | Who needs it
--- | --- | ---
browser: "required" | Manifest runtime_requirements | Instagram, ChatGPT, LinkedIn, H-E-B scrapers
browser: "optional" | Same | Connectors that prefer browser but can fall back to API
browser: "none" | Same | Plaid, Terra, GitHub API, slackdump, Timelinize, archive parsers
File system access | Not yet in manifest (future) | Archive parsers, Timelinize file importers
Network access | Implicit (connectors handle their own HTTP) | All API-based connectors

A runtime host either can or can't provide what the connector needs. If it can't and the connector requires it, the run fails with a clear error. The protocol is the same everywhere.
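The check described above can be sketched as a small pre-flight function. It assumes the manifest nests the setting as runtime_requirements.browser with the three values listed in the table, and that a missing key means "none"; the CapabilityError class, the function name, and the manifest's name field are hypothetical.

```python
from typing import Any, Dict, Set

VALID_LEVELS = ("required", "optional", "none")


class CapabilityError(RuntimeError):
    """Raised when the host cannot provide a capability the connector requires."""


def check_browser_requirement(manifest: Dict[str, Any], host_capabilities: Set[str]) -> bool:
    """Return True if the run should be given a browser.

    Raises CapabilityError when a browser is required but the host has none,
    and ValueError when the manifest declares an unknown level.
    """
    # Assumption: a missing runtime_requirements.browser key is treated as "none".
    level = manifest.get("runtime_requirements", {}).get("browser", "none")
    if level not in VALID_LEVELS:
        raise ValueError(f"unknown browser requirement: {level!r}")
    has_browser = "browser" in host_capabilities
    if level == "required" and not has_browser:
        name = manifest.get("name", "<unnamed connector>")
        raise CapabilityError(f"{name} requires a browser, but this runtime host has none")
    return has_browser and level != "none"
```

Running the check before spawning the connector keeps the failure at the runtime boundary, so the connector itself never has to probe for a browser.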

Implications for future specification work

  1. The JSONL protocol is the right choice. Every connector type (Go binary, Python script, Node.js + Playwright, aggregator wrapper) can write JSONL to stdout.
  2. Browser is a runtime capability, not a protocol concern. Connectors that need a browser get one from the runtime. The protocol doesn't define how.
  3. Aggregator connectors (Plaid, Terra) are high leverage. One Plaid connector = 12,000+ financial institutions. One Terra connector = dozens of health/fitness platforms.
  4. Archive parsers need file system access. The manifest may need a runtime_requirements.filesystem capability in the future.
  5. Go/Python/C# connectors work today via the JSONL protocol. No Node.js required. The runtime just spawns a process.
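"The runtime just spawns a process" can be sketched as below. This covers only the stdout half of the exchange; it is not the reference runtime's implementation, and the run_connector_process helper and the message fields are assumptions. A tiny inline Python script stands in for a connector binary written in any language.

```python
import json
import subprocess
import sys
from typing import Any, Dict, List


def run_connector_process(argv: List[str]) -> List[Dict[str, Any]]:
    """Spawn a connector, read its stdout as JSONL, and return the parsed messages."""
    messages: List[Dict[str, Any]] = []
    with subprocess.Popen(argv, stdout=subprocess.PIPE, text=True) as proc:
        assert proc.stdout is not None
        for line in proc.stdout:
            line = line.strip()
            if line:  # tolerate blank lines between messages
                messages.append(json.loads(line))
    # Leaving the with-block waits for the child, so returncode is set here.
    if proc.returncode != 0:
        raise RuntimeError(f"connector exited with status {proc.returncode}")
    return messages


# Stand-in connector: any executable that prints JSON lines would work the same way.
CHILD = (
    "import json\n"
    "print(json.dumps({'type': 'RECORD', 'data': {'id': 1}}))\n"
    "print(json.dumps({'type': 'DONE'}))\n"
)
messages = run_connector_process([sys.executable, "-c", CHILD])
```

Swapping the argv list for a Go or .NET executable would not change the runtime side, which is what makes the language column in the tables above "Any" for most strategies.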

Sources