Layer 2 coverage audit: ChatGPT, Claude Code, Codex
Raised: 2026-04-19 Context: Layer 1 compares emitted fields against declared manifest fields. Layer 2 compares the manifest against the source's actual surface — what the platform exposes that a user might want and we omit. This audit covers the three AI-assistant connectors; none have been reviewed against the full upstream surface since v0.1.
ChatGPT
Current manifest scope
conversations, messages (flattened current-branch tree), memories. Profiles chats, full.
Source surface
Endpoints the browser UI hits on chatgpt.com/backend-api:
/conversations,/conversation/{id}(declared)/memories?include_memory_entries=true(declared)/gizmos/mine— user-authored Custom GPTs/gizmos/bootstrap/projects,/public-api/conversations— Projects (group of conversations + instructions + files)/shared_conversations— public share links the user created/user_system_messages— custom instructions ("about me" / "how to respond" / traits)/files/{id},/files/{id}/download— uploaded attachments (message.attachment_ids resolves here; never joined)/accounts/check,/me— account + workspace identity/settings/data_controls— training opt-out, retention Voice transcripts land in the normal conversation tree. Canvas docs appear ascontent_type:"canvas"message parts.
Gaps
- Stream
custom_gpts(P0): User's own GPTs are a primary creative artifact.GET /backend-api/gizmos/mine?limit=24&cursor=...→{items:[{id, short_url, display:{name, description, prompt_starters, welcome_message, profile_picture_url}, author, instructions, tools, files, updated_at}]}. Equivalent to Notion omitting pages the user wrote. - Stream
custom_instructions(P0):GET /backend-api/user_system_messages→{enabled, about_user_message, about_model_message, name_user_message, role_user_message, traits_model_message, disabled_tools}. Explicit in OpenAI's own takeout; comparable in weight tomemorieswhich we already have. - Stream
shared_conversations(P1):GET /backend-api/shared_conversations?order=created→{share_id, conversation_id, title, create_time, update_time, is_anonymous, highlighted_message_id}. High recall-value — users rarely remember what they've published. - Stream
projects+ fieldproject_idon conversations (P1): lift Projects out as their own stream;project_idis in the detail response today but dropped. - Stream
files(P2): resolveattachment_idsviaGET /backend-api/files/{id}→{file_name, mime_type, size, download_url, created_at}. Optional bytes fetch. - Field split on
messages(P2): forcontent_type=user_editable_context, split user_profile/user_instructions out of the sharedcontentcolumn (or lift intocustom_instructions). - Stream
account(P2): email, plan, workspaces from/me.
Deliberately omitted
- Bearer token + oai-device-id — auth.
- Voice audio blobs — no distinct endpoint; transcripts are in messages.
- Enterprise admin surfaces (audit logs, SCIM) — admin-scoped, not the individual user's.
Claude Code
Current manifest scope
sessions, messages, attachments, all from ~/.claude/projects/<encoded>/*.jsonl.
Source surface (~/.claude/ on this machine)
projects/<p>/*.jsonl— session transcripts (declared)projects/<p>/<session>/subagents/*.jsonl— full subagent transcripts (the main jsonl only has sidechain stubs)projects/<p>/<session>/tool-results/*.txt— full tool outputs (main jsonl stores ~500-char previews)projects/<p>/memory/*.md— assistant-curated per-project memory (MEMORY.md + feedback notes)skills/<name>/SKILL.md— user-installed skills (40+ here)commands/*.md— user-authored slash commandshooks/*.sh— user shell hooksplugins/installed_plugins.json+plugins/marketplaces/— plugin registrysettings.json,settings.local.json— permissions, enabled MCP servers, hooksfile-history/<uuid>/<hash>@v<n>— versioned snapshots of every file the agent editedtodos/*.json,tasks/<id>/,sessions/<id>/,paste-cache/*.txt,captured-insights.md,history.jsonl— various stateCLAUDE.md,CLAUDE.local.md,RTK.md— user-level system prompts- Each repo also carries
./CLAUDE.md+./.claude/settings.json+./.claude/commands/— project-scoped behavior contracts
Gaps
- Stream
skills(P0):~/.claude/skills/<name>/SKILL.md+ supporting files. Record:{id, name, scope:"user"|"project", description, triggers, body, path, installed_from?}. Same weight as ChatGPT custom_gpts — user-authored customization. - Stream
slash_commands(P0):~/.claude/commands/*.md(+ per-repo./.claude/commands/*.md). Frontmatter{description, argument-hint}+ prompt body. - Stream
memory_notes(P0):~/.claude/projects/<p>/memory/*.md. The direct analog to ChatGPTmemories— currently invisible.{id, project_path, filename, body, updated_at}. - Stream
subagent_transcripts(P1):projects/<p>/<session>/subagents/*.jsonl. Sidechain stubs in the main jsonl reference but don't contain these; for heavy sessions they hold most of the model's work. - Stream
tool_results_full(P1): resolve 500-charcontent_previewto the full.txtbody viatool_use_id. - Stream
file_snapshots(P1): metadata from~/.claude/file-history/<uuid>/<hash>@v<n>. Opt-in for bytes. - Stream
project_instructions(P1): per-repo./CLAUDE.md+./.claude/settings.json; scope derivable fromsession.cwd. - Stream
settingssingleton (P2): permissions, hooks, enabled MCP servers. - Stream
todos(P2):~/.claude/todos/*.json.
Deliberately omitted
.credentials.json— auth.statsig/,telemetry/,debug/,stats-cache.json,mcp-needs-auth-cache.json— product telemetry.shell-snapshots/,session-env/,plugins/cache/— restoration caches / checked-out plugin code.paste-cache/*.txt,history.jsonl— redundant with message content when actually used.file-history/bytes — size-prohibitive; metadata only unless user opts in.
Codex
Current manifest scope
sessions, messages, function_calls from ~/.codex/sessions/YYYY/MM/DD/rollout-*.jsonl.
Source surface (~/.codex/)
sessions/YYYY/MM/DD/rollout-*.jsonl— rollouts (declared)state_5.sqlite— thread index DB.threadshas:id, rollout_path, created_at, updated_at, source, model_provider, cwd, title, sandbox_policy, approval_mode, tokens_used, has_user_event, archived, archived_at, git_sha, git_branch, git_origin_url, cli_version, first_user_message, agent_nickname, agent_role, memory_mode, model, reasoning_effort, agent_path. Alsothread_dynamic_tools,thread_spawn_edges,agent_jobs,agent_job_items.config.toml— model/effort/sandbox defaults + per-projecttrust_levelmapprompts/*.md— user-authored slash-command templatesrules/default.rules— accumulatedprefix_rule(pattern=[...], decision=...)trust decisionsskills/<name>/SKILL.md— user-installed skills (14 here)memories/— reserved for future memory feature (empty today)logs_2.sqlite,shell_snapshots/,cache/,tmp/,models_cache.json,auth.json,installation_id,version.json— telemetry/auth/ephemeral
Gaps
- Fields on
sessionsviastate_5.sqlite#threadsjoin (P0):title,archived,archived_at,tokens_used,first_user_message,sandbox_policy,approval_mode,agent_nickname,agent_role,memory_mode,git_origin_url. Users cannot reconcile their session list withouttitleandarchived. Either enrichsessionsor add athreadsstream — but the DB, not the rollout jsonl, is the truth for these fields. - Stream
prompts(P0):~/.codex/prompts/*.md. User-authored slash commands, same as Claude Code. - Stream
skills(P0):~/.codex/skills/<name>/SKILL.md. Same as Claude Code. - Stream
approval_rules(P1):~/.codex/rules/default.rules, accumulated prefix_rule decisions.{id, pattern, decision, updated_at}. - Stream
configsingleton (P1):config.toml— model default, effort, sandbox_mode, per-project trust_level map. - Stream
agent_jobs(P2): Codex background-agent state. Low priority for v0.1.
Deliberately omitted
auth.json,installation_id— auth/telemetry.logs_2.sqlite,shell_snapshots/,cache/,tmp/,.tmp/,models_cache.json— operational.- Encrypted reasoning traces — opaque by design.
AGENTS.mdwhen symlinked out to dotfiles (user manages externally).
Cross-cutting observations
-
User-authored customization is the systematic blind spot. All three connectors capture runtime transcripts but miss authored artifacts: Custom GPTs + custom instructions (ChatGPT), skills + commands + memory notes (Claude Code), prompts + skills + rules (Codex). These are the highest-leverage, lowest-volume pieces of user data and their absence would be the most embarrassing gap in a "true export" promise.
-
"Sessions" is two tables on disk for the local connectors. Codex
state_5.sqlite#threadsis the truth fortitle/archived/tokens_used; the rollout jsonl is subordinate. Claude Code hasprojects/<p>/memory/*.mdandprojects/<p>/<session>/subagents/*.jsonlthat single-file parsing never sees. Both manifests implicitly assume one-file-equals-one-session; the reality is hierarchical. -
Fan-out payloads are represented by IDs, not resolved. ChatGPT
attachment_ids, Codexoutput_preview, Claude Codecontent_previewall leave the real payload behind. Acceptable for v0.1 but the manifest should advertise it so Layer 1 doesn't treat previews as missing fields. -
Filesystem scope is unstated. Claude Code / Codex both honor env overrides (
CLAUDE_CODE_PROJECTS_DIR,CODEX_SESSIONS_DIR) but any new stream touching~/.claude/skills, per-repoCLAUDE.md, or~/.codex/rulesrequires a manifest-level declaration of filesystem footprint so reviewers can audit reach. -
Registry parity is the payoff. If the three manifests add customizations + authored memory, they converge on the same mental model — runtime transcripts + user-authored customization + user-curated memory — which is worth more than any single field.