Connector Ecosystem
Reference runtime notes for connectors — browser abstraction decisions and third-party source integration.
This is a reference implementation note, not normative protocol text. It records connector-runtime research and implementation direction for this repo. The Collection Profile defines the portable connector message contract; this page explains how the reference runtime can satisfy real connector needs.
Browser abstraction decision
Model A vs Model B
Two models for how connectors interact with browsers:
- Model A (current/recommended): Runtime owns the browser and provides Playwright Page + ConnectorContext to connectors. Connectors use Playwright's full API. Simple, powerful, debuggable.
- Model B (deferred): Runtime exposes browser via JSONL messages (BROWSER/BROWSER_RESULT). Connectors never touch Playwright. Enables process isolation and language independence but adds a fragile proxy layer.
Decision: Model A, with JSONL for everything else
Codex gpt-5.4 recommendation (2026-03-30): Do not build a custom BROWSER JSONL protocol. Connectors need real browser power (Cloudflare challenges, SPA navigation, network interception, cookie extraction). A message protocol either reimplements Playwright or falls back to evaluate() for everything hard.
The protocol is JSONL for RECORD/STATE/INTERACTION/DONE. Browser automation is a runtime capability, not a protocol concern. When process isolation or language independence is needed, expose a CDP WebSocket URL rather than inventing a custom browser protocol.
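In concrete terms, a connector's side of the protocol is just line-delimited JSON on stdout. A minimal sketch, assuming an illustrative message shape (the "type"/"data" field names here are placeholders; the Collection Profile defines the actual message contract):

```python
import json
import sys

def emit(msg_type, payload):
    """Serialize and write one JSONL protocol message to stdout.

    Field names ("type", "data") are illustrative, not the
    normative contract from the Collection Profile.
    """
    line = json.dumps({"type": msg_type, "data": payload})
    sys.stdout.write(line + "\n")
    sys.stdout.flush()
    return line

# A connector run is an ordered sequence of these messages:
emit("STATE", {"phase": "collecting"})
emit("RECORD", {"id": "msg-1", "text": "hello"})
emit("DONE", {"record_count": 1})
```

Because this is all the protocol asks of a connector process, any language that can write lines to stdout can participate.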
Phased approach (from model-b-runtime-provided-browser.md):
- Phase 1 (now): Formalize BrowserCapability interface — refactor, not behavior change
- Phase 2 (when needed): Message protocol OR CDP endpoint for out-of-process connectors
- Phase 3 (defer): Full process isolation with container support
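Phase 1 is an interface-extraction refactor. A hypothetical sketch of what such an interface could look like (the method names below are assumptions for illustration, not the runtime's actual API):

```python
from typing import Any, Protocol, runtime_checkable

@runtime_checkable
class BrowserCapability(Protocol):
    """Hypothetical Phase 1 shape: the runtime implements this and
    hands it to in-process connectors. Method names are assumptions."""

    async def new_page(self) -> Any:
        """Open a page (e.g. a Playwright Page) owned by the runtime."""
        ...

    async def cookies(self, url: str) -> list:
        """Session cookies for a URL (for cookie-extraction connectors)."""
        ...

    async def close(self) -> None:
        """Release browser resources when the run ends."""
        ...

# Phase 2 could serve out-of-process connectors by exposing a CDP
# WebSocket URL (e.g. via Playwright's connect_over_cdp) rather than
# proxying this object over a custom message protocol.
```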
Connector strategies
How connectors get data from sources:
| Strategy | Examples | Runtime needs | Language |
|---|---|---|---|
| API client | Plaid, Terra API, Spotify API, GitHub API | HTTP only | Any |
| Browser automation | Instagram, ChatGPT, LinkedIn, H-E-B | Playwright/CDP + browser | JS/TS (current), any via CDP |
| Session cookie extraction | slackdump, DiscordChatExporter | Cookies from browser profile, no live browser | Any |
| Archive/export parser | Timelinize, WhatsApp export, Facebook DYI, Google Takeout | File system access | Any |
| Browser extension | LinkedIn scrapers, Amazon purchase history | Runs in user's browser, sends to local connector | JS (extension) + any (receiver) |
| Aggregator wrapper | Plaid (12K+ banks), Terra (Fitbit/Oura/Garmin/Apple Health) | Just API calls | Any |
Third-party tools that could become PDPP connectors
Go-based
| Tool | Data | Auth method | License | Wrap difficulty |
|---|---|---|---|---|
| slackdump (rusq/slackdump) | Slack messages, threads, files, users, emojis | Browser session cookie (d cookie) or export token | GPL-3.0 | Easy — already outputs JSON/SQLite |
| Timelinize (timelinize/timelinize) | 10+ sources: photos, Facebook, Instagram, Twitter, Google, iCloud, Strava, SMS, email, contacts | Per-source (OAuth, file import, API keys) | Apache-2.0 | Medium — need Go wrapper per data source |
C# / .NET
| Tool | Data | Auth method | License | Wrap difficulty |
|---|---|---|---|---|
| DiscordChatExporter (Tyrrrz/DiscordChatExporter) | Discord messages, DMs, servers, attachments | User token | GPL-3.0 | Easy — supports JSON export, CLI invokable |
Python
| Tool | Data | Auth method | License | Wrap difficulty |
|---|---|---|---|---|
| tg-archive (knadh/tg-archive) | Telegram groups, private messages, media | Telegram API credentials (api_id, api_hash, phone) | MIT | Easy — syncs to SQLite, read and emit |
| rexport (karlicoss/rexport) | Reddit comments, submissions, upvotes, saved | Reddit API (client_id, client_secret, username/password) | MIT | Easy — outputs JSON arrays |
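The tools above already produce structured JSON, so a PDPP wrapper mostly just reshapes their output into RECORD lines. A minimal sketch, with illustrative message fields (not the normative contract):

```python
import json

def records_from_export(json_text: str, source: str):
    """Turn a third-party exporter's JSON array (rexport and
    DiscordChatExporter both support JSON output) into RECORD lines.
    The "type"/"source"/"data" fields are illustrative."""
    for item in json.loads(json_text):
        yield json.dumps({"type": "RECORD", "source": source, "data": item})

# In a real wrapper, json_text would come from invoking the tool,
# e.g. subprocess.run([...], capture_output=True), then reading its
# output file or stdout. Inline sample data stands in for that here:
sample = '[{"id": "c1", "body": "a saved comment"}]'
lines = list(records_from_export(sample, "reddit"))
```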
Aggregator services (one connector = many sources)
| Service | Data domain | Sources covered | Auth | Wrap difficulty |
|---|---|---|---|---|
| Plaid | Financial (transactions, accounts, balances, investments) | 12,000+ US/EU financial institutions | Plaid Link OAuth flow → access_token | Easy — structured JSON API |
| Terra API | Health/fitness (workouts, sleep, heart rate, steps) | Fitbit, Oura, Garmin, Apple Health, Whoop, Peloton, etc. | Terra OAuth → API calls | Easy — structured JSON API |
| CommonHealth | Health records (Android) | 400+ data sources | On-device consent | Medium — Android-specific |
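An aggregator wrapper is the same reshaping exercise against an API response. A sketch for Plaid-style transactions; the input keys loosely mirror Plaid's /transactions responses (check the Plaid docs for the real schema), and the output "kind" field is an illustrative record type, not part of the protocol:

```python
import json

def plaid_transactions_to_records(transactions):
    """Map Plaid-style transaction objects onto RECORD messages.
    Input keys approximate Plaid's schema; output fields are
    illustrative placeholders for the Collection Profile contract."""
    for tx in transactions:
        yield json.dumps({
            "type": "RECORD",
            "kind": "financial.transaction",
            "data": {
                "id": tx["transaction_id"],
                "amount": tx["amount"],
                "date": tx["date"],
                "merchant": tx.get("name"),
            },
        })

sample = [{"transaction_id": "t1", "amount": 4.25,
           "date": "2026-03-01", "name": "Coffee"}]
records = list(plaid_transactions_to_records(sample))
```

One such wrapper, once written, covers every institution the aggregator reaches, which is why these connectors are high leverage.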
Archive parsers (user provides exported data)
| Source | Export format | Parser exists? | Notes |
|---|---|---|---|
| WhatsApp | .txt/.zip from phone export | Python parsers exist | E2E encrypted, no API access possible |
| Facebook DYI | .zip archive (HTML or JSON) | Timelinize parses it | Large archives, complex structure |
| Google Takeout | .zip per-product | Timelinize parses some | 51-54 data types |
| Apple Data & Privacy | .zip archive | No standard parser | 15 categories, 1-7 day fulfillment |
| Instagram data export | .zip archive | Timelinize parses it | Different format eras |
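Archive parsing is ordinary text/file processing. As one sketch, WhatsApp .txt exports vary by locale and platform, but one common Android form can be matched with a regex (the pattern below handles only that one form and is illustrative):

```python
import re

# Matches one common Android export form: "31/12/21, 21:15 - Alice: text".
# Locale variants (different date order, bracketed timestamps) need
# their own patterns; real parsers handle several format eras.
LINE = re.compile(r"^(\d{1,2}/\d{1,2}/\d{2,4}), (\d{1,2}:\d{2}) - ([^:]+): (.*)$")

def parse_line(line: str):
    """Parse one chat line; returns None for continuation lines,
    system messages, and unrecognized formats."""
    m = LINE.match(line)
    if not m:
        return None
    date, time, sender, text = m.groups()
    return {"date": date, "time": time, "sender": sender, "text": text}
```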
Timelinize data sources (potential connectors)
Each Timelinize data source implements FileImporter (parses archives), APIImporter (calls APIs), or both:
- Photos/Videos (Apple HEIC/MOV, Google Photos, Samsung, generic EXIF)
- Facebook (DYI archive parser)
- Instagram (archive parser)
- Twitter/X (archive parser)
- Google Location History
- Apple iCloud
- Strava (API + GPS data)
- SMS/Text Messages (SMS Backup & Restore format)
- Email (Mbox/IMAP)
- Contacts (vCard, CSV)
- WhatsApp (archive parser)
- Telegram (archive parser)
- iMessage (local database)
Runtime requirements summary
The PDPP connector protocol (JSONL stdin/stdout) is universal. What varies is the runtime's optional capabilities:
| Capability | Declared in manifest | Who needs it |
|---|---|---|
| browser: "required" | Manifest runtime_requirements | Instagram, ChatGPT, LinkedIn, H-E-B scrapers |
| browser: "optional" | Same | Connectors that prefer a browser but can fall back to an API |
| browser: "none" | Same | Plaid, Terra, GitHub API, slackdump, Timelinize, archive parsers |
| File system access | Not yet in manifest (future) | Archive parsers, Timelinize file importers |
| Network access | Implicit (connectors handle their own HTTP) | All API-based connectors |
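A manifest declaring the browser capability might look like the following. The overall shape is illustrative; only the runtime_requirements.browser key is discussed above, and the rest is a hypothetical placeholder:

```json
{
  "name": "example-connector",
  "runtime_requirements": {
    "browser": "required"
  }
}
```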
A runtime host either can or can't provide what the connector needs. If it can't and the connector requires it, the run fails with a clear error. The protocol is the same everywhere.
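The fail-fast check described above can be sketched in a few lines (manifest shape and capability names are illustrative, matching the hypothetical manifest keys above):

```python
def check_requirements(manifest: dict, host_capabilities: dict) -> None:
    """Fail fast when the host can't meet a hard requirement.
    Manifest shape and capability names are illustrative."""
    reqs = manifest.get("runtime_requirements", {})
    if reqs.get("browser") == "required" and not host_capabilities.get("browser"):
        raise RuntimeError(
            "connector requires a browser, but this runtime host does not provide one"
        )
    # browser: "optional" and "none" never fail the run; the
    # connector is expected to adapt to what the host offers.
```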
Implications for future specification work
- The JSONL protocol is correct. Every connector type (Go binary, Python script, Node.js + Playwright, aggregator wrapper) can write JSONL to stdout.
- Browser is a runtime capability, not a protocol concern. Connectors that need a browser get one from the runtime. The protocol doesn't define how.
- Aggregator connectors (Plaid, Terra) are high leverage. One Plaid connector = 12,000+ financial institutions. One Terra connector = dozens of health/fitness platforms.
- Archive parsers need file system access. The manifest may need a runtime_requirements.filesystem capability in the future.
- Go/Python/C# connectors work today via the JSONL protocol. No Node.js required. The runtime just spawns a process.
Sources
- Gemini 3.1 Pro Preview research with Google Search (2026-03-30)
- Codex gpt-5.4 analysis (2026-03-30): browser abstraction recommendation
- model-b-runtime-provided-browser.md: phased approach to browser abstraction
- slackdump: https://github.com/rusq/slackdump
- DiscordChatExporter: https://github.com/Tyrrrz/DiscordChatExporter
- tg-archive: https://github.com/knadh/tg-archive
- rexport: https://github.com/karlicoss/rexport
- Timelinize: https://github.com/timelinize/timelinize
- Plaid: https://plaid.com/docs/
- Terra API: https://docs.tryterra.co/