Open question: authored artifacts vs activity streams

add-polyfill-connector-systemProject noteOpen questions
Created openspec/changes/add-polyfill-connector-system/design-notes/authored-artifacts-vs-activity-open-question.mdView on GitHub →

Status: open Raised: 2026-04-19 Trigger: Layer 2 audits across all 7 connectors reviewed to date (Gmail, YNAB, USAA, GitHub in layer-2-coverage-gmail-ynab-usaa-github.md; ChatGPT, Claude Code, Codex in layer-2-coverage-chatgpt-claude-codex.md) found the same top-ranked gap in every one: we capture activity streams (what the user did) but omit authored artifacts (what the user built to shape their tools).

The pattern

Activity streamsAuthored artifacts
VolumeHigh (thousands → millions)Low (tens → hundreds)
MutationAppend-only (sometimes tombstones)Mutable; edited over time
CursorTimestamp / monotonicRevision or content-hash
Consent weightSensitive by volume (bulk access)Sensitive by leverage (encodes user strategy)
Disclosure framing"Your Gmail messages""Your custom ChatGPT prompts"
RestorationServer retains source of truth; re-collectableLost forever if not preserved

Concrete instances flagged as P0 gaps in the Layer 2 audits:

  • ChatGPTcustom_gpts (/gizmos/mine), custom_instructions (/user_system_messages). Current manifest has conversations + memories but no authored-GPT surface.
  • Claude Codeskills/*/SKILL.md, commands/*.md, projects/<p>/memory/*.md. Current manifest captures session jsonl transcripts only.
  • Codexprompts/*.md, skills/*/SKILL.md, rules/default.rules. Current manifest captures rollout jsonl only.
  • Gmailusers.settings.filters, vacation responder, forwarding rules. Current manifest is message envelopes.
  • USAA — user-configured alert preferences, bill-pay payee list. Current manifest is posted activity.
  • GitHub — authored gists, notification subscription config. Current manifest is user / repositories / starred.
  • YNAB — categories + payee renaming rules are authored-ish (line blurs with state).

Why this is a spec-level question

  • Stream semantics today capture data shape, not authorship origin. spec-core.md §6 defines exactly two values on streams[].semantics: append_only and mutable_state. Authored artifacts and activity streams can both be mutable_state (settings, custom GPTs) or both be append_only (messages, gist revisions). The axis the audits surfaced is orthogonal to the one the spec models.
  • Consent presentation has no vocabulary for it. Manifests currently rely on display.detail prose to convey "this includes your authored work" vs "this includes your activity." There is no structured field a client UI could key off.
  • Restoration guarantees differ fundamentally. Activity can be re-collected from the source; authored artifacts are gone if the user loses the export. The spec does not currently distinguish.
  • Disclosure weight differs. The ethical frame for losing an email archive is not the same as losing a user's system prompts; the spec gives both the same surface.

Candidate resolutions

A. New manifest field origin: "activity" | "authored" | "computed"

Streams declare origin explicitly alongside semantics. Pro: cheap, additive, unblocks consent copy and future preservation policy. Con: another axis for authors to reason about; boundary cases (Gmail filters applied to an authored rule list) need guidance.

B. A new top-level concept: "Artifact Streams"

Parallel to Collection Profile streams, separately spec'd, with stronger preservation and disclosure guarantees. Pro: sharpest contract. Con: doubles the data-model surface; most runtimes would have to handle both identically anyway.

C. Leave origin implicit, rely on stream descriptions

Manifests describe their streams naturally; clients and UIs infer class from language. Pro: no spec change. Con: the Layer 2 audits just demonstrated that without a structured cue, every author forgets the authored class.

D. Conventional naming (e.g., <source>_settings, <source>_customizations)

Pattern without spec enforcement. Pro: zero ceremony. Con: no lever for the consent card or retention policy to act on.

Cross-cutting observations

  • Connects to layer-2-completeness-open-question.md — if a manifest omits authored artifacts, is it "complete"? Completeness and origin are separable but co-arrive.
  • Connects to credential-storage-open-question.md — authored artifacts and credentials are the two classes of payload a user can't re-collect from the source.
  • Connects to rs-storage-topology-open-question.md — authored artifacts may warrant different retention, backup, and partitioning than high-volume activity streams.

Action items

  • Decide on an origin-axis approach before adding authored-artifact streams to existing connectors.
  • If resolution A wins, audit existing manifests and add origin to every declared stream.
  • Update consent-card copy guidance to distinguish these two classes regardless of which resolution wins.