Add Reddit Pilot Real Shape Fixture
1. Capture
- Run the v0.2.0 Reddit connector with `PDPP_CAPTURE_FIXTURES=1` against the owner account; confirm all seven streams emit records.
- Verify the raw run lands under `fixtures/reddit/raw/<runId>/records/*.jsonl` and `http/*.json`.
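
The "all seven streams emit records" check can be approximated by counting non-blank lines in each `records/<stream>.jsonl`. This is a minimal sketch that assumes the file contents have already been read into memory; the helper names and the stream names in the usage are hypothetical and not part of the connector.

```typescript
// Counts records in a JSONL payload: one JSON value per non-blank line.
function countRecords(jsonl: string): number {
  return jsonl.split("\n").filter((line) => line.trim().length > 0).length;
}

// Returns every expected stream that has no file or an empty file.
// `payloads` maps a stream name to the text of its records/<stream>.jsonl.
function streamsWithoutRecords(
  expected: readonly string[],
  payloads: Record<string, string | undefined>,
): string[] {
  return expected.filter((stream) => countRecords(payloads[stream] ?? "") === 0);
}

// Made-up stream names for illustration; substitute the connector's real seven.
const silentStreams = streamsWithoutRecords(["stream_a", "stream_b", "stream_c"], {
  stream_a: '{"id":"t3_abc"}\n{"id":"t3_def"}\n',
  stream_b: "\n",
});
console.log(silentStreams); // stream_b (empty file) and stream_c (no file)
```

Passing the expected stream list explicitly means a stream whose file was never written is reported alongside one whose file is empty.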
2. Redaction plan
- For every raw `records/<stream>.jsonl`, author a `<path>.redactions.json` plan per the structured-redaction contract (`version`, `redactions[]`, each with `text`, `replacement`, `reason`).
- Replacements MUST use `[REDACTED_*]` placeholders; no free-form substitutions.
- Cover free-form `title`, `body`, `selftext`, `url` (when personal), and identifying permalink slugs. Leave stable IDs (`t3_*`, `t1_*`) and non-identifying subreddit names alone.
- Review the plan for false negatives: every span a human reader would consider identifying MUST have a redaction entry.
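
A plan can be linted before it reaches the scrubber. The sketch below checks only what the tasks above state (a `version`, a `redactions` array, and `text`, `replacement`, `reason` on each entry). The numeric `version` type and the exact placeholder grammar (`[REDACTED_` followed by an uppercase tag) are assumptions; the structured-redaction contract itself is authoritative.

```typescript
// Plan shape as described in the task list. `version` being numeric is an assumption.
interface Redaction {
  text: string;
  replacement: string;
  reason: string;
}
interface RedactionPlan {
  version: number;
  redactions: Redaction[];
}

// Assumed placeholder grammar, e.g. [REDACTED_USERNAME]. The real contract may differ.
const PLACEHOLDER = /^\[REDACTED_[A-Z0-9_]+\]$/;

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns a list of human-readable problems; an empty list means the plan passed.
function lintPlan(plan: unknown): string[] {
  if (!isPlainObject(plan)) return ["plan is not an object"];
  const problems: string[] = [];
  if (plan["version"] === undefined) problems.push("missing version");
  const redactions = plan["redactions"];
  if (!Array.isArray(redactions)) {
    problems.push("redactions is not an array");
    return problems;
  }
  redactions.forEach((entry: unknown, i: number) => {
    if (!isPlainObject(entry)) {
      problems.push(`redactions[${i}] is not an object`);
      return;
    }
    for (const key of ["text", "replacement", "reason"]) {
      const value = entry[key];
      if (typeof value !== "string" || value.length === 0) {
        problems.push(`redactions[${i}].${key} must be a non-empty string`);
      }
    }
    const replacement = entry["replacement"];
    if (typeof replacement === "string" && replacement.length > 0 && !PLACEHOLDER.test(replacement)) {
      problems.push(`redactions[${i}].replacement is not a [REDACTED_*] placeholder`);
    }
  });
  return problems;
}
```

A lint of this kind catches free-form substitutions mechanically; it cannot detect false negatives, which still need the human review described in the last task.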
3. Scrub
- Run `pnpm exec tsx bin/scrub-fixtures.ts reddit <runId> --llm-redactions-dir ./local-redactions/reddit`.
- Confirm the scrubber exits 0 with every raw file accounted for (fail-closed mode catches missing plans).
- Rename the output directory to `fixtures/reddit/scrubbed/pilot-real-shape/` (matching the Amazon/GitHub pilot convention).
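
To illustrate what "fail-closed" means here, the following sketch applies a plan by literal substitution and refuses to proceed if any raw file lacks a plan. It is not the implementation of `bin/scrub-fixtures.ts`, whose internals are not shown in this task list; the function names and the in-memory file maps are hypothetical.

```typescript
interface Redaction {
  text: string;
  replacement: string;
  reason: string;
}
interface RedactionPlan {
  version: number; // assumed numeric
  redactions: Redaction[];
}

// Applies every redaction as a literal (non-regex) substitution, longest text
// first so a shorter span never clobbers part of a longer one.
function applyPlan(raw: string, plan: RedactionPlan): string {
  const ordered = [...plan.redactions].sort((a, b) => b.text.length - a.text.length);
  let out = raw;
  for (const { text, replacement } of ordered) {
    if (text.length === 0) continue; // an empty needle would match everywhere
    out = out.split(text).join(replacement);
  }
  return out;
}

// Fail-closed wrapper: a raw file without a plan is an error, never a pass-through.
function scrubAll(
  rawFiles: Record<string, string>,
  plans: Record<string, RedactionPlan | undefined>,
): Record<string, string> {
  const missing = Object.keys(rawFiles).filter((file) => plans[file] === undefined);
  if (missing.length > 0) {
    throw new Error(`no redaction plan for: ${missing.join(", ")}`);
  }
  const scrubbed: Record<string, string> = {};
  for (const [file, raw] of Object.entries(rawFiles)) {
    scrubbed[file] = applyPlan(raw, plans[file] as RedactionPlan);
  }
  return scrubbed;
}
```

Checking for missing plans before writing anything means a partial run never leaves a mix of scrubbed and unscrubbed files behind. Substituting on raw JSONL text assumes the `text` values are written as they appear in the file, including any JSON escaping.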
4. Review
- Eyeball the scrubbed `records/*.jsonl` for residual PII. A reviewer other than the capture author SHOULD sign off.
- Confirm every record still parses as JSON and preserves record key + schema-critical fields (`id`, `created_utc`, `kind`, `subreddit` where non-identifying).
- If any residual PII is found, add a deterministic rule to `connectors/reddit/scrub-rules.ts` or extend the redaction plan; do not hand-edit scrubbed output.
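
A mechanical pass can support the manual review, though it does not replace it. The sketch below flags scrubbed lines that no longer parse as JSON or that still contain a string the plan was supposed to remove. The `reviewScrubbed` helper and the `Finding` type are hypothetical names introduced for illustration.

```typescript
interface Finding {
  line: number; // 1-based line number within the scrubbed JSONL file
  problem: string;
}

// `forbidden` is the list of `text` values from the redaction plan(s) for this file.
function reviewScrubbed(jsonl: string, forbidden: readonly string[]): Finding[] {
  const findings: Finding[] = [];
  jsonl.split("\n").forEach((raw, index) => {
    if (raw.trim().length === 0) return; // skip blank lines, e.g. a trailing newline
    const line = index + 1;
    try {
      JSON.parse(raw);
    } catch {
      findings.push({ line, problem: "not valid JSON" });
    }
    for (const text of forbidden) {
      if (text.length > 0 && raw.includes(text)) {
        // Report only the length so the finding itself does not leak the PII.
        findings.push({ line, problem: `still contains a redacted span (${text.length} chars)` });
      }
    }
  });
  return findings;
}
```

Any finding should lead back to a scrub rule or a plan entry rather than a manual edit of the output, in line with the last task above.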
5. Tests
- Extend `connectors/reddit/integration.test.ts` with a `pilot-real-shape` block that reads every committed `fixtures/reddit/scrubbed/pilot-real-shape/records/<stream>.jsonl` line and asserts `validateRecord(stream, row).ok === true`.
- Add a shape-drift guard: assert every emitted record in the pilot has `fetched_at`, `created_utc`, `id`, and the stream-specific required fields.
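
The shape-drift guard could look roughly like the sketch below, which is independent of any test runner and does not call `validateRecord`, since its signature is not shown here. The helper names are hypothetical, and the stream-specific field passed in the usage (`subreddit`) is only an example.

```typescript
// Fields the task list requires on every pilot record.
const COMMON_FIELDS = ["fetched_at", "created_utc", "id"] as const;

// Returns the names of required fields that are absent from a parsed record.
function missingFields(row: unknown, streamFields: readonly string[] = []): string[] {
  if (typeof row !== "object" || row === null || Array.isArray(row)) {
    return ["<record is not an object>"];
  }
  const record = row as Record<string, unknown>;
  return [...COMMON_FIELDS, ...streamFields].filter((field) => !(field in record));
}

// Throws one error that lists every drifting line in the given stream's JSONL.
function assertNoShapeDrift(
  stream: string,
  jsonl: string,
  streamFields: readonly string[] = [],
): void {
  const failures: string[] = [];
  jsonl.split("\n").forEach((raw, index) => {
    if (raw.trim().length === 0) return;
    const missing = missingFields(JSON.parse(raw), streamFields);
    if (missing.length > 0) {
      failures.push(`${stream}:${index + 1} missing ${missing.join(", ")}`);
    }
  });
  if (failures.length > 0) throw new Error(failures.join("\n"));
}

// Example call with a hypothetical stream name and one stream-specific field.
assertNoShapeDrift(
  "example_stream",
  '{"id":"t3_a","created_utc":1700000000,"fetched_at":"2024-01-01T00:00:00Z","subreddit":"typescript"}\n',
  ["subreddit"],
);
```

Collecting all failures before throwing gives one report per stream instead of stopping at the first bad line, which makes a drifted fixture easier to diagnose.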
6. Documentation
- Update `packages/polyfill-connectors/docs/connector-authoring-guide.md` §9.1 to reference the Reddit pilot as the records-stream shape example (alongside Amazon=DOM, GitHub=API JSON).
- Add a one-line note in the Reddit `index.ts` CHANGES section that the pilot fixture exists and where it lives.
7. Validation
- Run `pnpm --dir packages/polyfill-connectors run verify`.
- Run `pnpm --dir packages/polyfill-connectors test` and confirm the new pilot tests pass.
- Run `openspec validate add-reddit-pilot-real-shape-fixture --strict`.
- Run `openspec validate --all --strict`.