polyfill-runtime Specification
Purpose
TBD - created by archiving change add-connector-refresh-policy-controls. Update Purpose after archive.
Requirements
Requirement: Polyfill manifests MAY declare refresh policy hints
First-party polyfill connector manifests MAY declare capabilities.refresh_policy as reference/runtime metadata describing recommended scheduling posture. These hints SHALL NOT be treated as finalized PDPP core protocol semantics in this tranche.
Scenario: Connector declares a refresh policy
- WHEN a polyfill manifest includes capabilities.refresh_policy
- THEN the policy SHALL identify a recommended mode and an owner-readable rationale
- AND it MAY include recommended interval, minimum interval, maximum staleness, interaction posture, session lifetime, rate-limit sensitivity, bot-detection sensitivity, background-safety hints, and an assisted-after-owner-auth hint
Scenario: Connector has high human-interaction friction
- WHEN a connector commonly requires OTP, credentials, or manual browser action
- THEN its refresh policy SHOULD recommend manual refresh or conservative automatic scheduling with assisted-after-owner-auth posture
- AND the rationale SHOULD explain the human-attention cost
Scenario: Connector has low interaction cost
- WHEN a connector can refresh safely with durable credentials, local files, or low-friction API access
- THEN its refresh policy MAY recommend automatic refresh with an appropriate interval
Scenario: A future spec wants portable scheduling semantics
- WHEN refresh policy hints need to become interoperable across implementations
- THEN the vocabulary SHALL be promoted through a separate Collection Profile or companion-spec change
- AND this reference/polyfill metadata SHALL NOT be retroactively treated as normative PDPP core protocol
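As an illustration only, a capabilities.refresh_policy hint could be modelled as below. The requirement names the kinds of hint but not their keys, so every field name and enum value in this sketch (mode, rationale, recommended_interval_seconds, and so on) is a hypothetical chosen for the example, as is the checkRefreshPolicy helper.

```typescript
// Hypothetical shape: the requirement lists these hints but does not fix key names.
type RefreshMode = "manual" | "automatic";

interface RefreshPolicyHint {
  mode: RefreshMode; // required: recommended mode
  rationale: string; // required: owner-readable rationale
  recommended_interval_seconds?: number; // optional hints below
  minimum_interval_seconds?: number;
  maximum_staleness_seconds?: number;
  assisted_after_owner_auth?: boolean;
}

// Returns a list of problems; an empty list means both required parts are present.
function checkRefreshPolicy(policy: Partial<RefreshPolicyHint>): string[] {
  const problems: string[] = [];
  if (policy.mode !== "manual" && policy.mode !== "automatic") {
    problems.push("refresh_policy must identify a recommended mode");
  }
  if (typeof policy.rationale !== "string" || policy.rationale.trim() === "") {
    problems.push("refresh_policy must carry an owner-readable rationale");
  }
  return problems;
}

// A high-friction connector recommending manual refresh with assisted posture.
const highFriction: RefreshPolicyHint = {
  mode: "manual",
  rationale: "Login usually needs an OTP, so each refresh costs owner attention.",
  assisted_after_owner_auth: true,
};
```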
Requirement: Browser-backed connectors SHALL acquire browsers exclusively through the isolated patchright launcher
The polyfill-connector runtime SHALL provide exactly one browser-launch primitive (acquireIsolatedBrowser) that launches a per-connector patchright Chromium with an isolated profile directory. Browser-backed connectors and operator tools SHALL NOT use a long-lived shared Chromium daemon, CDP-attach, or a shared profile directory across connectors.
Scenario: A browser-backed connector run launches a browser
- WHEN the runtime begins a browser-backed connector run
- THEN it SHALL call the isolated patchright launcher with a per-connector profile name
- AND the launcher SHALL create or reuse a profile directory under ~/.pdpp/profiles/<profile-name>/
- AND it SHALL NOT read or write ~/.pdpp/browser-daemon.json
- AND it SHALL NOT call chromium.connectOverCDP.
Scenario: An operator tool needs a browser
- WHEN an operator-side script under bin/ needs a Chromium context
- THEN it SHALL acquire that context through the same isolated patchright launcher
- AND it SHALL NOT spawn or attach to a separate browser-daemon process.
Scenario: Two connectors run in parallel
- WHEN the runtime executes two different browser-backed connectors concurrently
- THEN each connector SHALL receive an independent profile directory
- AND neither connector SHALL share cookies, localStorage, or fingerprint state with the other.
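The per-connector isolation above depends on each profile name mapping to its own directory. A minimal sketch of that mapping follows; profileDirFor is a hypothetical helper, not the launcher's confirmed API, and the name-validation rule is an assumption added so that one profile name cannot resolve into another profile's directory.

```typescript
// Hypothetical helper: maps a profile name to its isolated directory.
// homeDir is passed in so the sketch needs no Node-specific imports.
function profileDirFor(homeDir: string, profileName: string): string {
  // Assumption: reject names with separators or dot segments so a name
  // cannot escape ~/.pdpp/profiles/ or alias another connector's profile.
  const safe = /^[A-Za-z0-9._-]+$/.test(profileName);
  if (!safe || profileName === "." || profileName === "..") {
    throw new Error(`invalid profile name: ${profileName}`);
  }
  return `${homeDir}/.pdpp/profiles/${profileName}/`;
}
```

Two connectors run in parallel would then call it with different names and receive different directories, which is what keeps cookies and localStorage apart.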
Requirement: The runtime SHALL NOT expose browser-daemon lifecycle commands
The polyfill operator surface SHALL NOT advertise or implement commands to start, stop, restart, query, or tail logs for a long-lived shared browser process. Operator-facing browser commands SHALL NOT exist as a documented or functional CLI surface.
Scenario: An operator inspects the polyfill CLI surface
- WHEN an operator views pdpp-connectors --help or any equivalent help output
- THEN there SHALL be no browser start, browser stop, browser status, browser restart, browser logs, browser bootstrap, or browser probe subcommand.
Scenario: A doc references the legacy daemon
- WHEN any user-facing doc, runbook, or design note in the active set references the daemon CLI
- THEN the reference SHALL be removed or marked superseded
- AND the recommended path SHALL point to per-connector auto-login plus INTERACTION kind=credentials for initial credentialing.
Requirement: Multi-account support SHALL be enabled by per-subject profile keys
The polyfill-runtime SHALL be extensible to support multiple owner accounts per platform without sharing browser profile state across accounts. The default profile-name derivation SHALL be replaceable with a per-subject derivation when multi-account support ships.
Scenario: Single-account default (current tranche)
- WHEN a browser-backed connector does not supply an explicit profileName
- THEN the runtime SHALL default to profileName = <connector-name>
- AND this is acknowledged as single-account by design.
Scenario: Multi-account derivation (future tranche)
- WHEN multi-account support is enabled in a later change
- THEN the default profileName derivation SHALL include a stable subject identifier
- AND two accounts on the same platform SHALL receive independent profile directories
- AND they SHALL be safe to run concurrently without collision on Chromium's per-profile SingletonLock.
Requirement: Runtime SHALL enforce the resources filter on every RECORD
The polyfill runtime SHALL reject any RECORD whose key is not in the grant's declared resources set for that stream, if the set is non-empty.
Scenario: Connector emits a record outside the declared resources set
- WHEN a connector emits a RECORD whose key is not present in START.scope.streams[].resources
- THEN the runtime SHALL raise a protocol violation and terminate the run
- AND the error SHALL name the offending stream and key
Scenario: Empty resources set is a no-op
- WHEN START.scope.streams[].resources is absent or empty
- THEN the runtime SHALL NOT filter records by key for that stream
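The two scenarios above can be sketched as a single check applied to every RECORD. The name field on a stream scope and the enforceResourcesFilter function are hypothetical; the sketch also assumes that a stream with no scope entry is left unfiltered, which the requirement does not state.

```typescript
interface StreamScope {
  name: string; // hypothetical: how a scope entry identifies its stream
  resources?: string[];
}

interface RecordMessage {
  type: "RECORD";
  stream: string;
  key: string;
}

// Throws when a RECORD's key falls outside a non-empty declared resources set.
function enforceResourcesFilter(streams: StreamScope[], record: RecordMessage): void {
  const scope = streams.find((s) => s.name === record.stream);
  const resources = scope?.resources ?? [];
  if (resources.length === 0) return; // absent or empty set: no key filtering
  if (resources.indexOf(record.key) === -1) {
    // The error names the offending stream and key, as the scenario requires.
    throw new Error(
      `protocol violation: stream=${record.stream} key=${record.key} is outside the declared resources set`,
    );
  }
}
```

A runtime would terminate the run when this throws; that wiring is left out of the sketch.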
Requirement: Runtime SHALL expose a filesystem binding for local-file connectors
The polyfill runtime SHALL include a filesystem binding in buildAvailableBindings so connectors that parse local files (e.g. Claude Code sessions, Codex rollouts, iMessage sqlite, WhatsApp exports) satisfy their runtime_requirements.bindings.filesystem.required: true declaration.
Scenario: File-based connector starts successfully
- WHEN a manifest declares runtime_requirements.bindings.filesystem.required: true and the runtime spawns the connector
- THEN the runtime SHALL treat filesystem as available
- AND the connector SHALL NOT fail with "Runtime cannot satisfy required binding: filesystem"
Requirement: Connectors SHALL emit tombstones for mutable_state streams that expose deletion
When a source platform exposes a "deleted" signal on a stream whose semantics is mutable_state, the connector SHALL emit a RECORD with op: "delete" for the tombstoned key.
Scenario: Mutable-state deletion
- WHEN the upstream reports that a record has been deleted (e.g. YNAB deleted: true, Notion archived page, Pocket status: 2, Gmail EXPUNGE)
- THEN the connector SHALL emit {type: "RECORD", stream, key, op: "delete"}
- AND the runtime SHALL persist the tombstone so downstream consumers can observe the deletion
Scenario: Append-only streams
- WHEN a stream's semantics is append_only
- THEN the connector SHALL NOT emit tombstones (there is no deletion on append-only data)
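A small sketch of the tombstone decision follows. The message shape mirrors the {type: "RECORD", stream, key, op: "delete"} form given above; the tombstoneFor helper and its parameters are hypothetical.

```typescript
type StreamSemantics = "mutable_state" | "append_only";

interface Tombstone {
  type: "RECORD";
  stream: string;
  key: string;
  op: "delete";
}

// Returns a tombstone for a deleted upstream row, or null when none applies.
function tombstoneFor(
  stream: string,
  semantics: StreamSemantics,
  key: string,
  upstreamDeleted: boolean,
): Tombstone | null {
  // Append-only streams never produce tombstones.
  if (semantics !== "mutable_state" || !upstreamDeleted) return null;
  return { type: "RECORD", stream, key, op: "delete" };
}
```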
Requirement: Connectors SHALL request credentials via INTERACTION when missing
When a connector starts and required credentials are absent from its environment, the connector SHALL emit INTERACTION kind: "missing_credentials" rather than failing silently.
Scenario: Missing credentials with interactive binding
- WHEN a connector is spawned with interactive: {} in its bindings and its required credentials env vars are unset
- THEN the connector SHALL emit an INTERACTION with kind: "missing_credentials" and a human-readable message explaining which env vars are needed
- AND the runtime SHALL park the run until the interaction is answered or the grant expires
Scenario: Missing credentials without interactive binding
- WHEN a connector is spawned without interactive: {} and credentials are missing
- THEN the connector SHALL emit DONE with status failed and an error message naming the missing credentials
- AND the run SHALL NOT hang waiting for an unavailable interaction channel
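The branch between the two scenarios can be sketched as a pure decision function. The onMissingCredentials name, the error field on DONE, and the exact message wording are assumptions; only the INTERACTION kind and the failed status come from the requirement.

```typescript
type StartupMessage =
  | { type: "INTERACTION"; kind: "missing_credentials"; message: string }
  | { type: "DONE"; status: "failed"; error: string };

// Decides what a connector emits at startup when credential env vars are unset.
// Returns null when nothing is missing.
function onMissingCredentials(
  missingEnvVars: string[],
  hasInteractiveBinding: boolean,
): StartupMessage | null {
  if (missingEnvVars.length === 0) return null;
  const names = missingEnvVars.join(", ");
  if (hasInteractiveBinding) {
    // An interaction channel exists: ask the owner rather than fail.
    return {
      type: "INTERACTION",
      kind: "missing_credentials",
      message: `Set these environment variables: ${names}`,
    };
  }
  // No channel to ask on: fail immediately instead of hanging.
  return { type: "DONE", status: "failed", error: `Missing credentials: ${names}` };
}
```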
Requirement: Connectors SHALL drain stdout before exiting
Connectors SHALL call a flushAndExit(code) helper (or equivalent) that waits for the Node stdout drain event before invoking process.exit, with a bounded safety timeout.
Scenario: Final DONE message on a pipe
- WHEN a connector emits its terminal DONE and then exits
- THEN the stdout pipe to the runtime SHALL NOT be closed before the final newline-delimited message is flushed
- AND the runtime SHALL observe a well-formed DONE (no truncation, no "Unterminated string in JSON" parser error)
Scenario: Safety timeout
- WHEN the stdout drain never fires (e.g. consumer died)
- THEN the connector SHALL exit after a bounded timeout (≤ 3 seconds) rather than hanging indefinitely
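One way to sketch such a helper is shown below. Note the substitution: instead of subscribing to the drain event directly, this version waits on the callback of a final empty write, because Node only emits drain after an earlier write has reported backpressure. That choice, the FlushableStream interface, and the injected exit function are assumptions of the sketch, not the runtime's confirmed helper.

```typescript
// Minimal structural view of a writable stream; process.stdout fits this shape.
interface FlushableStream {
  write(chunk: string, callback: () => void): unknown;
}

function flushAndExit(
  code: number,
  stream: FlushableStream,
  exit: (code: number) => void,
  timeoutMs: number = 3000, // bounded safety timeout
): void {
  let exited = false;
  let timer: ReturnType<typeof setTimeout> | undefined;
  const finish = (): void => {
    if (exited) return; // exit at most once, whichever path wins
    exited = true;
    if (timer !== undefined) clearTimeout(timer);
    exit(code);
  };
  // Safety net: exit anyway if the consumer died and the flush never completes.
  timer = setTimeout(finish, timeoutMs);
  // Writes are processed in order, so this callback runs after earlier chunks.
  stream.write("", finish);
}
```

In a connector this might be invoked as flushAndExit(0, process.stdout, (c) => process.exit(c)) once the terminal DONE line has been written.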
Requirement: Connectors declaring manifest streams SHALL validate emitted records or be on a justified schemaless allowlist
A first-party polyfill connector whose manifest declares one or more streams SHALL wire emit-time record validation into its runtime entrypoint (runConnector({ ..., validateRecord }), conventionally built with makeValidateRecord over a schemas.ts registry), OR SHALL appear on an explicit schemaless allowlist with a per-connector justification.
This requirement is reference-implementation authoring policy and CI tooling. It
SHALL NOT be treated as PDPP Core protocol semantics or as a Collection Profile
runtime requirement: the runtime entrypoint's validateRecord parameter remains
optional so the framework can still execute a zero-dependency connector. The
requirement constrains how first-party connectors are authored and how the
reference build verifies them, not what a conformant resource server or
Collection Profile implementation must do.
A build-time check SHALL enforce this invariant in the path CI already runs, and SHALL fail with the offending connector name when the invariant is violated.
Scenario: A connector declares manifest streams and wires validation
- WHEN a connector's manifest declares one or more streams
- AND the connector wires validateRecord into its runConnector entrypoint
- THEN the build-time check SHALL pass for that connector
- AND the connector SHALL NOT appear on the schemaless allowlist.
Scenario: A new connector declares streams but omits validation
- WHEN a connector's manifest declares one or more streams
- AND the connector does not wire validateRecord
- AND the connector is not on the schemaless allowlist
- THEN the build-time check SHALL fail and name that connector
- AND the failure message SHALL direct the author to either wire validation or add a justified allowlist entry.
Scenario: An allowlisted connector adds validation later
- WHEN a connector that is on the schemaless allowlist begins wiring validateRecord
- THEN the build-time check SHALL fail until the connector's allowlist entry is removed
- AND the allowlist SHALL therefore only ever shrink as connectors adopt validation.
Scenario: A connector declares no streams
- WHEN a connector's manifest declares zero streams
- THEN the build-time check SHALL NOT require validation wiring for that connector
- AND the connector SHALL NOT be required to appear on the allowlist.
Scenario: The schemaless allowlist carries justifications
- WHEN a connector is on the schemaless allowlist
- THEN its entry SHALL carry an owner-readable justification identifying why validation is not yet wired and the remediation path
- AND the allowlist SHALL be the authoritative, machine-checked census of connectors that emit records without emit-time shape validation.
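The scenarios above reduce to a small decision table, sketched here as a pure function. ConnectorInfo, checkValidationWiring, and the allowlist-as-map representation are hypothetical; how the real check discovers whether validateRecord is wired is not covered.

```typescript
interface ConnectorInfo {
  name: string;
  streamCount: number; // streams declared in the manifest
  wiresValidateRecord: boolean; // whether runConnector receives validateRecord
}

// allowlist maps connector name to its owner-readable justification.
function checkValidationWiring(
  connectors: ConnectorInfo[],
  allowlist: Record<string, string>,
): string[] {
  const failures: string[] = [];
  for (const c of connectors) {
    const justification: string | undefined = allowlist[c.name];
    if (justification !== undefined && justification.trim() === "") {
      failures.push(`${c.name}: allowlist entry needs a justification`);
    }
    if (c.streamCount === 0) continue; // no streams: nothing further required
    const listed = justification !== undefined;
    if (c.wiresValidateRecord && listed) {
      failures.push(`${c.name}: wires validateRecord; remove its allowlist entry`);
    }
    if (!c.wiresValidateRecord && !listed) {
      failures.push(`${c.name}: wire validateRecord or add a justified allowlist entry`);
    }
  }
  return failures; // an empty list means the build-time check passes
}
```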
Requirement: Connector manifest stream schema SHALL declare and validate coverage_policy
The packages/reference-contract manifest stream schema SHALL include
coverage_policy as an optional field with a closed enum of accepted values:
collect, deferred, inventory_only, unavailable, and unsupported.
The field SHALL be optional; absence is treated as collect (the default, "this
stream is intended to be fully collected"). A connector author declaring a stream
as unsupported or unavailable SHALL also set required: false to avoid a
contradictory manifest signal (required: true + accepted-coverage policy
degrades health rather than projecting accepted-coverage-green).
Scenario: manifest schema accepts all valid coverage_policy values
WHEN a manifest stream declares coverage_policy with one of collect,
deferred, inventory_only, unavailable, or unsupported
THEN the reference-contract schema validation SHALL accept the manifest
without error.
Scenario: manifest schema rejects unknown coverage_policy values
WHEN a manifest stream declares a coverage_policy value outside the
recognized enum
THEN the reference-contract schema validation SHALL reject the manifest with
a type error.
Scenario: absence of coverage_policy is valid
WHEN a manifest stream does not declare coverage_policy
THEN the schema SHALL accept the manifest
AND the server SHALL treat the stream as collect (fully collected by
default).
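The closed enum and its default can be sketched as follows. The five values come from the requirement; resolveCoveragePolicy and isContradictory are hypothetical helper names, and the real reference-contract schema may be expressed in a schema library rather than hand-written checks.

```typescript
const COVERAGE_POLICIES = [
  "collect",
  "deferred",
  "inventory_only",
  "unavailable",
  "unsupported",
] as const;
type CoveragePolicy = (typeof COVERAGE_POLICIES)[number];

// Absent means "collect"; anything outside the enum is rejected.
function resolveCoveragePolicy(declared: unknown): CoveragePolicy {
  if (declared === undefined) return "collect";
  const match = COVERAGE_POLICIES.find((p) => p === declared);
  if (match === undefined) {
    throw new TypeError(`unknown coverage_policy: ${String(declared)}`);
  }
  return match;
}

// Flags the contradictory signal: required: true on an unsupported/unavailable stream.
function isContradictory(required: boolean, policy: CoveragePolicy): boolean {
  return required && (policy === "unsupported" || policy === "unavailable");
}
```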
Requirement: Connectors with a detail lane SHALL emit DETAIL_COVERAGE once per run
A connector that runs a list+detail lane SHALL emit exactly one DETAIL_COVERAGE
message per run, after the detail lane completes. A list+detail lane is one that
fetches a list of records and then fetches per-record detail for at least a
subset of those records. The message SHALL carry:
- stream: the detail stream name.
- state_stream: the list/parent stream whose cursor anchors the detail pass.
- required_keys: the full set of record keys the connector considered for detail fetch in this run.
- hydrated_keys: the subset of required_keys for which detail was successfully fetched and emitted.
- gap_keys (optional): keys for which a DETAIL_GAP was emitted.
- optional_skip_keys (optional): keys skipped by explicit policy (e.g. rate-limited voluntarily, filtered by selection scope).
Connectors that emit only flat streams with no per-record detail fetch are exempt from this requirement.
Scenario: list+detail run emits DETAIL_COVERAGE after the detail lane
WHEN a connector completes a list+detail run
THEN the connector SHALL emit a DETAIL_COVERAGE message
AND the message SHALL appear after the last RECORD or DETAIL_GAP emitted by
the detail lane in the same run
AND required_keys SHALL equal the set of keys the connector scanned for
detail
Scenario: fully hydrated run emits DETAIL_COVERAGE with no gap_keys
WHEN a list+detail run completes with no DETAIL_GAP messages
THEN DETAIL_COVERAGE.hydrated_keys SHALL equal DETAIL_COVERAGE.required_keys
AND gap_keys SHALL be absent or empty
Scenario: partially hydrated run carries gap_keys matching emitted DETAIL_GAPs
WHEN a list+detail run emits N DETAIL_GAP messages
THEN DETAIL_COVERAGE.gap_keys SHALL contain those N keys
AND hydrated_keys SHALL NOT contain keys that also appear in gap_keys
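A sketch of assembling the message from per-key outcomes follows. The field names match the list in the requirement; the type discriminator, the KeyOutcome bookkeeping, and buildDetailCoverage are assumptions made for the example.

```typescript
type DetailOutcome = "hydrated" | "gap" | "skipped";

interface KeyOutcome {
  key: string;
  outcome: DetailOutcome;
}

interface DetailCoverage {
  type: "DETAIL_COVERAGE";
  stream: string;
  state_stream: string;
  required_keys: string[];
  hydrated_keys: string[];
  gap_keys?: string[];
  optional_skip_keys?: string[];
}

// Builds the single end-of-lane message from every key considered in the run.
function buildDetailCoverage(
  stream: string,
  stateStream: string,
  outcomes: KeyOutcome[],
): DetailCoverage {
  const keysWith = (wanted: DetailOutcome): string[] =>
    outcomes.filter((o) => o.outcome === wanted).map((o) => o.key);
  const message: DetailCoverage = {
    type: "DETAIL_COVERAGE",
    stream,
    state_stream: stateStream,
    required_keys: outcomes.map((o) => o.key),
    hydrated_keys: keysWith("hydrated"),
  };
  const gaps = keysWith("gap");
  const skipped = keysWith("skipped");
  if (gaps.length > 0) message.gap_keys = gaps; // absent when fully hydrated
  if (skipped.length > 0) message.optional_skip_keys = skipped;
  return message;
}
```

Because each key carries exactly one outcome, hydrated_keys and gap_keys cannot overlap in this construction.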
Requirement: Browser runtime SHALL bound manual-action page-metadata reads
When the browser handoff reads page metadata (e.g. page.title()) to attach to a manual-action interaction, the read SHALL be bounded by a local deadline so a wedged renderer cannot prevent the interaction from being emitted. The interaction SHALL still be registered and emitted with whatever metadata is available, and a metadata read that times out SHALL be surfaced as a compact diagnostic rather than swallowed.
Scenario: Page metadata read times out
- WHEN the browser handoff prepares a manual-action interaction
- AND the page-title read does not resolve within the bounded deadline
- THEN the runtime SHALL stop waiting on the title read at the deadline
- AND it SHALL still emit and register the interaction using the page URL and any metadata already available
- AND it SHALL write a compact diagnostic noting the metadata timeout
Scenario: Page metadata read succeeds quickly
- WHEN the browser handoff prepares a manual-action interaction
- AND the page-title read resolves within the bounded deadline
- THEN the runtime SHALL attach the resolved title to the interaction
- AND it SHALL NOT write a metadata-timeout diagnostic
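The bounded read can be sketched as a race between the title read and a local timer. readTitleBounded and its result shape are hypothetical; the readTitle argument stands in for something like a call to page.title(), and treating a rejected read as "no title, not timed out" is an assumption.

```typescript
interface TitleReadResult {
  title?: string;
  timedOut: boolean;
}

// Resolves with the title if it arrives before the deadline; otherwise reports
// a timeout so the caller can emit the interaction and write a diagnostic.
function readTitleBounded(
  readTitle: () => Promise<string>,
  deadlineMs: number,
): Promise<TitleReadResult> {
  return new Promise<TitleReadResult>((resolve) => {
    const timer = setTimeout(() => resolve({ timedOut: true }), deadlineMs);
    readTitle().then(
      (title) => {
        clearTimeout(timer);
        resolve({ title, timedOut: false });
      },
      () => {
        // The read failed outright; the interaction still proceeds without a title.
        clearTimeout(timer);
        resolve({ timedOut: false });
      },
    );
  });
}
```

The promise never rejects, so a wedged renderer can delay the interaction by at most deadlineMs.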
Requirement: Browser runtime SHALL checkpoint session-establishment phases with durable diagnostics
The browser runtime SHALL expose a session-establishment checkpoint hook to the connector's ensureSession flow and SHALL itself record framing checkpoints around session establishment. Each checkpoint SHALL update the run's last-establishment-progress marker and, when fixture/trace capture is active, SHALL trigger a best-effort durable diagnostic capture labelled for that phase, so a hang during establishment does not leave only an initial blank-page artifact.
Scenario: Connector marks an auth phase
- WHEN a connector's ensureSession calls the provided checkpoint hook with a phase label
- THEN the runtime SHALL record that label and the time it was reached as the last establishment-progress marker
- AND when capture is active it SHALL attempt a durable diagnostic capture for that phase
- AND a failure of the diagnostic capture SHALL NOT fail the run
Scenario: Runtime frames the establishment window
- WHEN the runtime begins session establishment for a browser-backed run
- THEN it SHALL record at least one framing checkpoint before delegating to the connector's session flow
- AND the connector SHALL be able to add phase checkpoints specific to its own auth state machine
Requirement: Browser runtime SHALL bound session establishment with a fail-closed watchdog
The browser runtime SHALL bound the session-establishment phase with a watchdog keyed on checkpoint progress. If session establishment makes no checkpoint progress within a bounded, configurable deadline, the runtime SHALL finalize diagnostics, fail the run fail-closed with a terminal failure, and release the browser so the run cannot remain active indefinitely. The watchdog SHALL be paused while an interaction is open so a run legitimately waiting on the owner is not killed.
Scenario: Establishment stalls with no checkpoint progress
- WHEN session establishment makes no checkpoint progress for longer than the configured watchdog deadline
- AND no interaction is currently open
- THEN the runtime SHALL finalize trace and capture diagnostics for the in-flight run
- AND it SHALL emit a terminal DONE with status failed and a *_session_establish_timeout error
- AND it SHALL release the browser so the run is not left active indefinitely
Scenario: Establishment is making checkpoint progress
- WHEN session establishment reaches successive checkpoints with no gap exceeding the watchdog deadline
- THEN the runtime SHALL NOT trip the watchdog
- AND the run SHALL be allowed to proceed even if total establishment time exceeds the deadline
Scenario: Establishment is blocked on an open interaction
- WHEN session establishment is blocked waiting for an owner interaction (e.g. CAPTCHA or OTP) to resolve
- THEN the watchdog SHALL be paused for the duration of the open interaction
- AND it SHALL resume with a reset deadline once the interaction resolves
Scenario: Watchdog deadline is configurable
- WHEN PDPP_SESSION_ESTABLISH_WATCHDOG_MS is set to a positive integer
- THEN the runtime SHALL use that value as the no-progress deadline
- AND when it is unset the runtime SHALL use a conservative default that clears the legitimate establishment envelope of proven runs
Scenario: Teardown diagnostic capture is bounded
- WHEN the runtime captures a diagnostic page snapshot during teardown of a wedged run
- AND the underlying DOM capture does not resolve within a bounded deadline
- THEN the runtime SHALL abandon that snapshot at the deadline and continue teardown
- AND the diagnostic capture SHALL NOT be able to re-hang the terminal failure or browser release
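The progress-keyed watchdog can be sketched with an injected clock, which keeps the logic testable without timers. EstablishmentWatchdog, its method names, and resolveWatchdogMs are hypothetical; the default deadline is passed in by the caller because the requirement does not state its value.

```typescript
class EstablishmentWatchdog {
  private readonly deadlineMs: number;
  private lastProgressMs: number;
  private openInteractions = 0;

  constructor(deadlineMs: number, nowMs: number) {
    this.deadlineMs = deadlineMs;
    this.lastProgressMs = nowMs; // framing checkpoint at establishment start
  }

  // Any checkpoint counts as progress and restarts the no-progress window.
  checkpoint(nowMs: number): void {
    this.lastProgressMs = nowMs;
  }

  interactionOpened(): void {
    this.openInteractions += 1;
  }

  interactionResolved(nowMs: number): void {
    this.openInteractions = Math.max(0, this.openInteractions - 1);
    if (this.openInteractions === 0) this.lastProgressMs = nowMs; // reset deadline
  }

  // True when the run should be failed closed.
  isExpired(nowMs: number): boolean {
    if (this.openInteractions > 0) return false; // paused while the owner acts
    return nowMs - this.lastProgressMs > this.deadlineMs;
  }
}

// Reads the PDPP_SESSION_ESTABLISH_WATCHDOG_MS value, falling back when unusable.
function resolveWatchdogMs(raw: string | undefined, fallbackMs: number): number {
  const parsed = raw === undefined ? NaN : Number(raw);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallbackMs;
}
```

Because expiry is measured from the last checkpoint, total establishment time may exceed the deadline as long as no single gap does.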
Requirement: Connectors SHALL support an owner-configured detail-lane run cap as an opt-in, default-off bound
A connector with a serial detail lane SHALL be able to bound a single run by an owner-configured size cap (number of detail fetches per run) and/or time cap (wall-clock the detail phase may spend), and this cap SHALL be opt-in via environment configuration and default off: an unset, empty, non-numeric, or non-positive value SHALL resolve to no cap, and with no cap configured a run SHALL behave exactly as it would without this feature (no cap branch is consulted). A configured cap SHALL only ever cause a run to stop earlier; it SHALL NOT increase concurrency, change pacing, raise a retry budget, or cause a run to fetch more than it otherwise would.
The cap SHALL be run-scoped and shared across every pass of a single run — in particular a detail-gap recovery pass and a forward-walk pass SHALL draw down one shared budget — so that a recovery backlog plus newly listed records are bounded together rather than each pass receiving a fresh budget. A wall-clock cap SHALL be measured from the first time the budget is consulted (the start of the detail phase), not from connector startup.
When a configured cap is reached, the connector SHALL stop launching new detail
fetches and SHALL defer the current and every remaining record as a resumable
DETAIL_GAP, using the same deferral, cursor-commit, and recovery machinery a
source-pressure deferral uses: the hydrated prefix's cursor SHALL commit, the
deferred keys SHALL appear in DETAIL_COVERAGE.gap_keys, and a later run SHALL
recover the deferred records (recovery selecting gaps by stream, not by reason)
and walk forward, so a large history fills in over several bounded runs.
Scenario: No cap configured leaves a run unbounded and unchanged
- WHEN neither the size knob nor the wall-clock knob is set (or both are empty / non-numeric / non-positive)
- THEN the run SHALL resolve to no cap
- AND no cap branch SHALL defer any record
- AND a large backlog SHALL run to completion exactly as it would without the cap feature
Scenario: A configured size cap defers the remaining tail as a resumable gap
- WHEN a detail run is configured with a maximum number of detail fetches per run
- AND the run has hydrated that many record details
- THEN the connector SHALL stop launching new detail fetches
- AND it SHALL defer the current and every remaining record as a resumable DETAIL_GAP
- AND the hydrated prefix's cursor SHALL commit
- AND the deferred keys SHALL appear in DETAIL_COVERAGE.gap_keys
Scenario: A configured wall-clock cap is bounded by at most one in-flight fetch
- WHEN a detail run is configured with a maximum detail-phase wall-clock
- AND the elapsed detail-phase wall-clock reaches that maximum
- THEN the connector SHALL check the cap between fetches, never interrupting a fetch already in flight
- AND the run MAY exceed the configured wall-clock by at most one in-flight fetch's processing time, itself bounded by the connector's per-fetch timeout
Scenario: One shared budget bounds the recovery pass and the forward pass together
- WHEN a single run performs a detail-gap recovery pass and then a forward-walk pass under a configured cap
- THEN both passes SHALL draw down one shared run-scoped budget
- AND a recovery backlog larger than the cap SHALL cause the forward pass to defer without starting a second budget
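A minimal sketch of a run-scoped budget shared by both passes follows. parseCap, RunBudget, and tryAdmit are hypothetical names, and the environment knob names are left out because the requirement does not give them; the clock is injected so the wall-clock cap can be exercised deterministically.

```typescript
// Unset, empty, non-numeric, or non-positive values all resolve to "no cap".
function parseCap(raw: string | undefined): number | null {
  if (raw === undefined || raw.trim() === "") return null;
  const n = Number(raw);
  return Number.isFinite(n) && n > 0 ? n : null;
}

class RunBudget {
  private readonly maxFetches: number | null;
  private readonly maxWallMs: number | null;
  private fetches = 0;
  private startedAtMs: number | null = null;

  constructor(maxFetches: number | null, maxWallMs: number | null) {
    this.maxFetches = maxFetches;
    this.maxWallMs = maxWallMs;
  }

  // Consulted between fetches; true means "launch the next detail fetch".
  // It never interrupts a fetch that is already in flight.
  tryAdmit(nowMs: number): boolean {
    if (this.maxFetches === null && this.maxWallMs === null) return true; // default off
    const started = this.startedAtMs ?? nowMs; // clock starts at first consult
    this.startedAtMs = started;
    if (this.maxFetches !== null && this.fetches >= this.maxFetches) return false;
    if (this.maxWallMs !== null && nowMs - started >= this.maxWallMs) return false;
    this.fetches += 1;
    return true;
  }
}
```

Passing the same RunBudget instance to the recovery pass and then the forward pass is what makes the two passes draw down one budget.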
Requirement: An owner-configured run-cap deferral SHALL NOT be treated as source pressure
A run-cap deferral SHALL be marked as a self-imposed bound, distinct from a
deferral caused by account/source pressure: a DETAIL_GAP deferred because a run
reached its owner-configured size or time cap is not a source-pressure signal. The
run-cap deferral SHALL carry a resumable wire reason
that is not in the source-pressure reason set (upstream_pressure,
rate_limited), so it SHALL NOT arm the cross-run source-pressure cooldown
governor and SHALL NOT be counted in the source-pressure detail-gap backlog
rollup. The deferral SHALL additionally carry a distinct error class identifying
the configured run cap, so an owner surface can render a self-imposed cap
separately from a busy-service deferral. The run-cap deferral SHALL NOT report an
HTTP failure status, because nothing failed — the run simply stopped at its
budget.
Scenario: A run-cap deferral does not arm the source-pressure cooldown
- WHEN a connector defers records because a run reached its owner-configured cap
- THEN the deferred DETAIL_GAP reason SHALL NOT be in the source-pressure reason set
- AND the deferral SHALL NOT arm the cross-run source-pressure cooldown governor
- AND the deferral SHALL NOT be counted in the source-pressure detail-gap backlog rollup
Scenario: A run-cap deferral is distinguishable from a source-pressure deferral
- WHEN a connector defers records because a run reached its owner-configured cap
- THEN the deferral SHALL carry an error class identifying the configured run cap
- AND that class SHALL be distinct from the class a source-pressure deferral carries
- AND the deferral SHALL NOT report an HTTP failure status
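The distinction can be sketched as a reason-set membership test. The two source-pressure reasons come from the requirement; the DetailGap field names, the retry_exhausted wire reason, and the run_cap_reached error class are hypothetical stand-ins, since the requirement does not name them.

```typescript
const SOURCE_PRESSURE_REASONS: readonly string[] = ["upstream_pressure", "rate_limited"];

interface DetailGap {
  type: "DETAIL_GAP";
  stream: string;
  key: string;
  reason: string;
  resumable: boolean;
  error_class?: string;
  http_status?: number;
}

// A self-imposed deferral: resumable, outside the pressure set, no HTTP status.
function runCapGap(stream: string, key: string): DetailGap {
  return {
    type: "DETAIL_GAP",
    stream,
    key,
    reason: "retry_exhausted", // hypothetical wire reason
    resumable: true,
    error_class: "run_cap_reached", // hypothetical error class
  };
}

// Only source-pressure reasons arm the cooldown and count in the backlog rollup.
function armsSourcePressureCooldown(gap: DetailGap): boolean {
  return SOURCE_PRESSURE_REASONS.indexOf(gap.reason) !== -1;
}
```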
Requirement: Run-cap and generic retry-exhausted deferrals SHALL have distinct, honest end-user copy
The end-user display copy SHALL be distinct for the generic retry-exhausted wire reason and for the configured run-cap error class, and neither SHALL imply that the source service was busy. The generic retry-exhausted reason SHALL read as a retry budget having been used up — applicable to any retry-exhaustion path, not only a configured cap. The run-cap error class SHALL read as a self-imposed per-run budget that saved what it collected and will continue on the next run. Copy that implies source pressure (for example "the service is busy") SHALL be reserved for the source-pressure reasons.
Scenario: Run-cap copy names a self-imposed budget without implying source pressure
- WHEN an owner surface renders the copy for a configured run-cap deferral
- THEN the copy SHALL describe a per-run budget that saved a batch and will continue next run
- AND the copy SHALL NOT imply that the source service was busy or pressured
Scenario: Generic retry-exhausted copy is not specific to a configured cap
- WHEN an owner surface renders the copy for the generic retry-exhausted reason
- THEN the copy SHALL describe a retry budget that was used up
- AND the copy SHALL NOT be byte-identical to the configured run-cap copy
- AND the copy SHALL NOT imply that the source service was busy or pressured
Requirement: A run-cap tail deferral SHALL bound its own foreground materialization
A run-cap trip SHALL bound the foreground work of materializing the deferral
itself when the remaining record tail is larger than an owner-configurable
finite chunk: the connector SHALL write at most the configured chunk of
per-record resumable DETAIL_GAP rows, then fold every older remaining record
into one durable backlog DETAIL_GAP carrying a content-derived list cursor
/ watermark (never a positional offset) for the un-materialized remainder. A run
SHALL NOT spend a long foreground stretch writing one gap row per remaining
record after it has already stopped fetching details.
This chunk SHALL be opt-in and default off: an unset chunk SHALL leave the per-record deferral behavior byte-for-byte unchanged. When only a fetch/time cap is configured (and no explicit chunk), the connector MAY derive a safe finite chunk so an owner who opts into a run cap also gets a bounded tail. The backlog gap SHALL reuse the run-cap deferral contract — a resumable reason outside the source-pressure set and the run-cap error class — so it never arms the source-pressure cooldown and is excluded from the source-pressure backlog rollup.
The deferral SHALL remain resumable and convergent: a later run's recovery SHALL expand the backlog gap by re-listing the parent list at-or-older than the stored inclusive watermark and materializing the next bounded chunk of that window, resolving or rewriting the backlog gap with a new content-derived watermark when remainder exists, and this expansion SHALL run before forward-walk work so the deferred tail recovers first. The inclusive bound SHALL be tie-safe: recovery MAY re-see an already-accounted record sharing the boundary timestamp, but SHALL NOT strand an un-materialized record with that timestamp. A history larger than the chunk SHALL drain over several bounded runs with no record lost and no offset reconstruction; the monotone forward cursor SHALL NOT advance past an unaccounted record (the backlog gap accounts for the older remainder).
Scenario: A cap trip over a large remaining tail writes a bounded number of gap rows
- WHEN a run-cap trips with a configured finite tail-deferral chunk
- AND the remaining record tail is larger than that chunk
- THEN the connector SHALL write at most the chunk of per-record DETAIL_GAP rows
- AND it SHALL write exactly one durable backlog DETAIL_GAP for the older remainder, carrying a content-derived watermark and not a positional offset
- AND the run SHALL NOT write one gap row per remaining record
Scenario: Default-off leaves the tail deferral unchanged
- WHEN no tail-deferral chunk is configured and no fetch/time cap derives one
- THEN a run-cap tail SHALL be materialized one resumable DETAIL_GAP per record exactly as it would without this bound (no backlog gap is written)
Scenario: A later run expands the backlog gap before forward work and converges
- WHEN a later run is served a backlog DETAIL_GAP
- THEN recovery SHALL re-list the parent list at-or-older than the backlog's inclusive watermark and materialize the next bounded chunk of that window before any forward-walk work
- AND it SHALL resolve the old backlog gap or rewrite it with a new content-derived watermark when remainder exists
- AND it SHALL NOT strand records that share the backlog watermark timestamp
- AND over several bounded runs the older history SHALL fully drain with no record lost and no positional-offset reconstruction
Scenario: A bounded tail deferral is not source pressure
- WHEN a connector folds a run-cap tail into per-record chunk gaps plus a backlog gap
- THEN every such gap SHALL carry a resumable reason outside the source-pressure reason set and the run-cap error class
- AND none of them SHALL arm the source-pressure cooldown governor or be counted in the source-pressure detail-gap backlog rollup
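The fold at cap-trip time can be sketched as below. It assumes the remaining tail is ordered newest first and that each record carries an ISO timestamp (updatedAt) usable as the content-derived watermark; foldTail and both interfaces are hypothetical.

```typescript
interface ListedRecord {
  key: string;
  updatedAt: string; // ISO timestamp; the tail is assumed ordered newest first
}

interface TailDeferral {
  perRecordGapKeys: string[]; // one resumable DETAIL_GAP each
  backlogWatermark: string | null; // inclusive at-or-older-than bound, or null
}

// Splits the un-fetched tail into a bounded chunk of per-record gaps plus at
// most one backlog gap described by a timestamp rather than an offset.
function foldTail(remaining: ListedRecord[], chunk: number | null): TailDeferral {
  if (chunk === null || remaining.length <= chunk) {
    // Default off, or the tail already fits: per-record gaps only.
    return { perRecordGapKeys: remaining.map((r) => r.key), backlogWatermark: null };
  }
  const head = remaining.slice(0, chunk);
  const rest = remaining.slice(chunk);
  // Newest timestamp among un-materialized records. Inclusive, so a record
  // sharing this timestamp is re-seen by recovery rather than stranded.
  return { perRecordGapKeys: head.map((r) => r.key), backlogWatermark: rest[0].updatedAt };
}
```

A later recovery pass would re-list records at or older than backlogWatermark and call the same fold on that window, so the backlog shrinks by one chunk per run.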
Requirement: A provider request path SHALL have exactly one pre-flight send governor
The polyfill-runtime SHALL gate the velocity of requests to a single provider through exactly ONE pre-flight send governor. The send governor is the only component permitted to wait (sleep) before a request is transmitted. Either a concurrency governor (AIMD lane) or a rate governor (GCRA/token-bucket) MAY be the send governor for a given provider, but NOT both as independent pre-flight gates. For unknown-quota providers the runtime SHALL prefer the self-calibrating concurrency governor; a GCRA rate signal, when present, SHALL be folded into the single governor's pre-flight wait as a delay input, NOT run as a second independent pre-flight wait.
Run-control decision layers — the run budget (request/wall-clock cap), the retry budget, and the circuit breaker — SHALL make synchronous admit/deny decisions and SHALL NOT perform a pre-flight wait. Retry backoff SHALL fire only after a failed send (post-failure), never inside the same pre-flight wait as the send governor.
Scenario: One pre-flight wait source per admitted request
- WHEN a request to a provider is admitted and transmitted
- THEN exactly one pre-flight wait source SHALL have governed it (the single send governor)
- AND no decision layer (run budget, retry budget, circuit breaker) SHALL have added a second pre-flight wait
Scenario: GCRA pacing contributes a signal, not a second gate
- WHEN a provider has both an AIMD concurrency send governor and a GCRA pacing bucket configured
- THEN the GCRA pacing SHALL contribute its computed inter-request delay to the single send governor's pre-flight wait
- AND the effective pre-flight wait SHALL be the maximum of the governor's own delay and the pacing delay, NEVER their sum
- AND the GCRA pacing SHALL NOT perform its own pre-flight wait
Scenario: Two independent pre-flight gates is a spec violation
- WHEN a request path is composed such that both a concurrency governor and a rate governor independently wait before the same provider send
- THEN the composition SHALL be treated as a defect
- AND the two pre-flight waits SHALL be detectable as more than one wait source on the request path
- AND the runtime SHALL NOT ship a default configuration in which two pre-flight waits gate the same provider request
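The max-not-sum rule and the single-wait-source check in the scenarios above can be illustrated with a minimal sketch. All names here (DelaySignals, effectivePreflightWaitMs, assertSingleWaitSource) are hypothetical and are not part of the runtime's API; the sketch only assumes that each delay is expressed in milliseconds.

```typescript
// Hypothetical names throughout; the spec does not define this API.
interface DelaySignals {
  governorDelayMs: number; // delay the send governor (e.g. AIMD lane) wants before the send
  pacingDelayMs?: number;  // optional GCRA inter-request delay, folded in as an input only
}

/** Effective pre-flight wait: the maximum of the inputs, never their sum. */
function effectivePreflightWaitMs(signals: DelaySignals): number {
  const pacing = signals.pacingDelayMs ?? 0;
  return Math.max(0, signals.governorDelayMs, pacing);
}

/** Fails when a request path carries more than one component that waits pre-flight. */
function assertSingleWaitSource(waitSources: readonly string[]): void {
  if (waitSources.length > 1) {
    throw new Error("more than one pre-flight wait source: " + waitSources.join(", "));
  }
}
```

With a governor delay of 200 ms and a pacing delay of 500 ms, the request waits 500 ms once, rather than 700 ms across two gates.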
Requirement: Retry-After SHALL be honored exactly without double-paying the wait
When a provider returns a throttle response carrying a Retry-After header, the
polyfill-runtime SHALL wait the specified interval exactly once before
retrying. The runtime SHALL NOT add jittered backoff on top of the Retry-After
interval for that retry, and SHALL NOT also queue the same interval as a
pre-flight pacing wait on the next request. A throttle response MAY decrease the
send governor's fill rate (multiplicative decrease signal), but the
Retry-After interval itself SHALL be slept exactly once, in the retry layer.
Scenario: Retry-After is slept once, not stacked on backoff
- WHEN a request receives a retryable response with a Retry-After header
- THEN the runtime SHALL wait exactly the Retry-After interval before the retry
- AND it SHALL NOT add jittered exponential backoff on top of that interval
- AND it SHALL NOT re-impose the same interval as a pre-flight pacing wait on the subsequent request
Scenario: Throttle still feeds the fill-rate decrease signal
- WHEN a Retry-After throttle is observed and slept in the retry layer
- THEN the send governor's pacing fill rate MAY be decreased (one-way error ratchet) as a signal
- BUT the decrease SHALL NOT cause the slept Retry-After interval to be paid a second time
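One way to keep the Retry-After interval from being double-paid is to compute the retry delay in a single place, where a Retry-After value replaces the backoff rather than adding to it. The sketch below assumes hypothetical helper names (parseRetryAfterMs, retryDelayMs) and a full-jitter exponential backoff as the fallback; the spec does not prescribe the fallback formula.

```typescript
// Hypothetical helpers; the fallback backoff formula is an assumption.

/** Parses Retry-After as delta-seconds or an HTTP-date; returns ms, or undefined if unusable. */
function parseRetryAfterMs(header: string | undefined, nowMs: number): number | undefined {
  if (header === undefined) return undefined;
  const trimmed = header.trim();
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
  const dateMs = Date.parse(trimmed);
  if (isNaN(dateMs)) return undefined;
  return Math.max(0, dateMs - nowMs);
}

/** Delay before the next retry. Retry-After wins outright; backoff is the fallback only. */
function retryDelayMs(
  attempt: number,                  // 1-based index of the attempt that just failed
  retryAfterMs: number | undefined, // parsed Retry-After, if the response carried one
  baseMs: number,                   // base for the fallback exponential backoff
  random: () => number = Math.random,
): number {
  if (retryAfterMs !== undefined) return retryAfterMs; // exactly the interval, no jitter on top
  const backoffCapMs = baseMs * 2 ** (attempt - 1);
  return random() * backoffCapMs; // full-jitter exponential backoff
}
```

The returned delay is slept once by the retry layer. A throttle may still be reported to the send governor as a rate-decrease signal, but the interval itself is not handed to the governor as a pending wait.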
Requirement: The retry layer SHALL bound retry volume with a ratio-based retry budget distinct from per-request attempts
The polyfill-runtime's shared retry helper SHALL accept an optional ratio-based retry budget (a Finagle-style token bucket) that bounds total retry volume across a run, distinct from and in addition to the per-request attempt count. When a retry budget is configured and its tokens are exhausted, the retry helper SHALL stop retrying immediately with the same terminal shape as exhausting the per-request attempt count, so the run defers rather than spins. When no retry budget is configured, only the per-request attempt count bounds retries (prior behavior preserved). A retry-budget-driven stop SHALL carry a reason that is NOT in the source-pressure reason set.
Scenario: Retry budget exhaustion stops retries before the attempt count
- WHEN a retry budget with capacity smaller than the per-request attempt count is configured
- AND a request keeps receiving retryable responses
- THEN the retry helper SHALL stop retrying once the retry budget is empty, before exhausting the per-request attempt count
- AND the terminal error SHALL be the same shape as attempt-count exhaustion
Scenario: No retry budget configured preserves attempt-count-only behavior
- WHEN no retry budget is configured on the retry helper
- THEN only the per-request attempt count SHALL bound retries
- AND the helper's behavior SHALL be unchanged from before a retry budget was available
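A ratio-based retry budget can be sketched as a token bucket in which each first-attempt request deposits a fraction of a token and each retry withdraws a whole one. The class and function names below (RetryBudget, decideRetry) and the decision labels are hypothetical, and the sketch omits the time-based expiry that a production budget would likely add.

```typescript
// Hypothetical sketch of a ratio-based retry budget; not the runtime's helper.
class RetryBudget {
  private tokens: number;
  private readonly retryRatio: number;

  constructor(retryRatio: number, initialTokens: number) {
    this.retryRatio = retryRatio; // e.g. 0.5 allows one retry per two requests
    this.tokens = initialTokens;  // reserve available before any request is recorded
  }

  /** Each first-attempt request earns a fraction of a retry. */
  recordRequest(): void {
    this.tokens += this.retryRatio;
  }

  /** Spends one token for a retry; returns false when the budget is exhausted. */
  tryWithdraw(): boolean {
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

type RetryDecision = "retry" | "stop_attempts_exhausted" | "stop_retry_budget_exhausted";

/** Attempt count always applies; the budget applies only when one is configured. */
function decideRetry(attemptsMade: number, maxAttempts: number, budget?: RetryBudget): RetryDecision {
  if (attemptsMade >= maxAttempts) return "stop_attempts_exhausted";
  if (budget !== undefined && !budget.tryWithdraw()) return "stop_retry_budget_exhausted";
  return "retry";
}
```

Both stop decisions would surface through the same terminal error shape; only the carried reason differs, which keeps the budget-driven stop out of the source-pressure reason set.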
Requirement: 429-prone connectors SHALL route provider requests through the shared send governor and retry layer
Provider connectors that previously hand-rolled if (status === 429) throw "<name>_rate_limited" SHALL route their provider requests through the shared
send-governor + retry helper instead of growing local rate-handling code. The
shared helper SHALL preserve each connector's terminal rate-limit error string
so the runtime retryablePattern cross-run source-pressure deferral and
cooldown contract is unchanged. A connector MAY configure the helper with a
single bounded attempt so that its immediate-throw behavior stays byte-identical;
the Retry-After-honor capability remains wired and becomes active once that
attempt count is raised.
Scenario: Terminal rate-limit preserves the cross-run cooldown contract
- WHEN a migrated connector exhausts its retries against a 429
- THEN the shared helper SHALL throw the connector's existing <name>_rate_limited terminal error
- AND that error SHALL match the connector's retryablePattern
- AND the cross-run source-pressure cooldown SHALL arm exactly as it did before the migration
Scenario: A single bounded attempt preserves immediate-throw behavior
- WHEN a migrated connector configures the shared helper with one bounded attempt
- AND a provider returns 429
- THEN the helper SHALL make exactly one provider call and throw the terminal rate-limit error immediately
- AND raising the attempt count SHALL activate inline Retry-After honor and bounded backoff without changing the terminal contract
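The terminal contract above can be modelled as a pure decision function: a 429 either triggers another attempt or yields the connector's existing error string, depending only on the configured attempt count. The names nextStep and HelperStep, and the example pattern, are hypothetical; non-429 failures are deliberately left out of this sketch.

```typescript
// Hypothetical decision function; only the 429 path is modelled here.
type HelperStep =
  | { kind: "done" }
  | { kind: "retry" }
  | { kind: "throw"; error: string };

function nextStep(
  connectorName: string,
  httpStatus: number,
  attemptsMade: number, // provider calls made so far, including the one just answered
  maxAttempts: number,  // 1 reproduces the old immediate-throw behavior
): HelperStep {
  if (httpStatus !== 429) return { kind: "done" };
  if (attemptsMade < maxAttempts) return { kind: "retry" };
  // The terminal string is unchanged, so an existing retryablePattern still matches.
  return { kind: "throw", error: connectorName + "_rate_limited" };
}

// Example pattern shape only; each connector keeps its own retryablePattern.
const exampleRetryablePattern = /_rate_limited$/;
```

With maxAttempts set to 1, the first 429 maps straight to the terminal error after exactly one provider call. Raising maxAttempts changes how many retries precede the throw, not the string that is thrown.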
Requirement: Budget-exhaustion defer reasons SHALL be disjoint from source-pressure reasons
Every reason with which the shared provider-budget controller defers a run (request-cap reached, wall-clock deadline, retry-budget exhausted, circuit open) SHALL be disjoint from the source-pressure reason set that arms the cross-run cooldown governor. Budget exhaustion is a planned stop, not a provider-driven rejection, and SHALL NOT be misread as source pressure by the scheduler.
Scenario: No budget-exhaustion reason arms the source-pressure cooldown
- WHEN a run defers because a provider-budget axis is exhausted (request cap, wall-clock, retry budget, or open circuit)
- THEN the defer reason SHALL NOT be a member of the source-pressure reason set
- AND the cross-run source-pressure cooldown governor SHALL NOT be armed by that deferral
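The disjointness requirement lends itself to a small invariant check that could run in a unit test. The reason strings below are placeholders invented for illustration; the spec names the budget axes but does not give the actual reason identifiers.

```typescript
// Placeholder reason strings; the real identifiers are not given in this spec.
const SOURCE_PRESSURE_REASONS: readonly string[] = ["rate_limited", "upstream_pressure"];
const BUDGET_DEFER_REASONS: readonly string[] = [
  "request_cap_reached",
  "wall_clock_deadline",
  "retry_budget_exhausted",
  "circuit_open",
];

/** Returns the reasons present in both sets; an empty result means the sets are disjoint. */
function overlappingReasons(a: readonly string[], b: readonly string[]): string[] {
  return a.filter((reason) => b.indexOf(reason) !== -1);
}

/** Only source-pressure reasons arm the cross-run cooldown governor. */
function armsSourcePressureCooldown(reason: string): boolean {
  return SOURCE_PRESSURE_REASONS.indexOf(reason) !== -1;
}
```

Asserting that overlappingReasons returns an empty array guards against a later change that adds a shared reason to both sets.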
Requirement: The shared connector HTTP governor SHALL provide adaptive, fastest-safe collection by default
The shared API-connector HTTP governor (createConnectorHttpGovernor) SHALL,
when constructed with only a connector name, yield an adaptive rate controller:
it SHALL enter from a conservative slow-start discovery interval, accelerate
under sustained success (AIMD additive increase toward the rate ceiling), and
back off multiplicatively on a throttle signal — never crossing the
owner-authored rate ceiling. A connector author SHALL obtain this behavior with
no per-connector rate code beyond the bare factory call. The factory SHALL also
provide an explicit opt-out (a zero discovery interval) that disables pacing
entirely and preserves the pre-convergence byte-identical no-wait path.
Scenario: A bare governor cold-starts adaptive
- WHEN a connector constructs the governor with only its name
- THEN the governor SHALL cold-start at the shared conservative discovery interval
- AND its live rate snapshot SHALL be available (pacing is on by default)
Scenario: Sustained success accelerates the rate toward the ceiling
- WHEN the governor records a sequence of successful responses
- THEN the inter-request interval SHALL monotonically shrink (the rate rises)
- AND it SHALL never shrink below the interval that corresponds to the rate ceiling
Scenario: A throttle backs the rate off and the back-off is legible
- WHEN the governor records a throttle signal
- THEN the inter-request interval SHALL increase (the rate slows)
- AND the back-off SHALL be visible in the governor's rate snapshot as a legible event with its reason
Scenario: A connector opts out of pacing
- WHEN a connector constructs the governor with a zero discovery interval
- THEN the governor SHALL perform no pre-flight pacing wait
- AND its rate snapshot SHALL be absent (no adaptive controller exists)
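The adaptive behavior described by these scenarios can be sketched as an AIMD controller that tracks an inter-request interval. This is not the createConnectorHttpGovernor implementation: the class, the factory name createAdaptiveController, the step of 6 requests per minute, and the back-off factor of 2 are all assumptions chosen for illustration.

```typescript
// Hypothetical AIMD sketch; step size and back-off factor are assumed values.
interface RateSnapshot {
  currentIntervalMs: number;
  ceilingIntervalMs: number;
  lastBackoffReason: string | undefined;
}

class AdaptiveRateController {
  private intervalMs: number;
  private lastBackoffReason: string | undefined;
  private readonly ceilingIntervalMs: number;
  private readonly stepPerMin: number;
  private readonly backoffFactor: number;

  constructor(discoveryIntervalMs: number, ceilingIntervalMs: number, stepPerMin: number = 6, backoffFactor: number = 2) {
    this.ceilingIntervalMs = ceilingIntervalMs;
    this.stepPerMin = stepPerMin;
    this.backoffFactor = backoffFactor;
    this.lastBackoffReason = undefined;
    // Never start faster than the owner-authored ceiling.
    this.intervalMs = Math.max(discoveryIntervalMs, ceilingIntervalMs);
  }

  /** Additive increase of the rate, clamped so the interval never drops below the ceiling interval. */
  recordSuccess(): void {
    const ratePerMin = 60000 / this.intervalMs + this.stepPerMin;
    this.intervalMs = Math.max(this.ceilingIntervalMs, 60000 / ratePerMin);
  }

  /** Multiplicative decrease of the rate: the interval grows by the back-off factor. */
  recordThrottle(reason: string): void {
    this.intervalMs = this.intervalMs * this.backoffFactor;
    this.lastBackoffReason = reason;
  }

  snapshot(): RateSnapshot {
    return {
      currentIntervalMs: this.intervalMs,
      ceilingIntervalMs: this.ceilingIntervalMs,
      lastBackoffReason: this.lastBackoffReason,
    };
  }
}

/** A zero discovery interval is the opt-out: no controller exists, so no snapshot exists. */
function createAdaptiveController(discoveryIntervalMs: number, ceilingIntervalMs: number): AdaptiveRateController | undefined {
  if (discoveryIntervalMs === 0) return undefined;
  return new AdaptiveRateController(discoveryIntervalMs, ceilingIntervalMs);
}
```

Starting at a 2000 ms discovery interval with a 500 ms ceiling interval, repeated successes shrink the interval until it settles at 500 ms, and a single throttle then doubles it to 1000 ms and records the reason.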
Requirement: The shared governor SHALL expose a warm-start runtime seam so the learned rate compounds across runs
The shared governor SHALL accept a restored learned interval at construction (seeding the controller warm-started, clamped to never be faster than the rate ceiling) and SHALL expose a snapshot of its learned interval for persistence. The runtime SHALL provide framework-owned helpers — restore (applying a staleness guard), persist (durable state fields), and observability — so a connector author threads only its durable state location and never hand-rolls the read/write or the staleness logic. Warm-start state SHALL be persisted onto a declared stream cursor (the runtime gates connector STATE on declared streams); a connector SHALL NOT persist warm-start state under a synthetic, undeclared stream.
Scenario: A fresh resume restores the prior run's learned interval
- WHEN a run persists its learned interval and the next run restores it within the staleness window
- THEN the next run's controller SHALL warm-start FROM the restored interval, not the cold discovery seed
Scenario: A stale resume cold-starts conservatively
- WHEN a persisted learned interval is older than the staleness guard, or is absent or malformed
- THEN the restore SHALL yield nothing and the controller SHALL cold-start at the conservative discovery interval
Scenario: Warm-start state rides a declared stream cursor
- WHEN a connector persists its learned interval for warm-start
- THEN it SHALL merge the pacing fields onto an already-declared stream's cursor
- AND it SHALL NOT emit STATE for a synthetic stream the run never declared
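The restore and persist halves of the warm-start seam can be sketched as two pure functions. The field names (learnedIntervalMs, savedAtMs), the pacing key on the cursor, and both function names are hypothetical; the spec only requires a staleness guard, a clamp against the rate ceiling, and persistence onto an already-declared stream's cursor.

```typescript
// Hypothetical field and function names; the persisted layout is an assumption.
interface PersistedPacing {
  learnedIntervalMs: number;
  savedAtMs: number;
}

/** Returns a warm-start interval, or undefined when the state is absent, malformed, or stale. */
function restoreLearnedInterval(
  raw: unknown,
  nowMs: number,
  maxAgeMs: number,
  ceilingIntervalMs: number,
): number | undefined {
  if (typeof raw !== "object" || raw === null) return undefined;
  const rec = raw as Record<string, unknown>;
  const interval = rec["learnedIntervalMs"];
  const savedAt = rec["savedAtMs"];
  if (typeof interval !== "number" || typeof savedAt !== "number") return undefined;
  if (!isFinite(interval) || interval <= 0 || !isFinite(savedAt)) return undefined;
  if (nowMs - savedAt > maxAgeMs) return undefined; // stale: caller cold-starts
  return Math.max(interval, ceilingIntervalMs);     // never faster than the ceiling
}

/** Merges pacing fields onto an existing declared stream's cursor, leaving its other fields intact. */
function mergePacingIntoCursor(
  cursor: Record<string, unknown>,
  pacing: PersistedPacing,
): Record<string, unknown> {
  return { ...cursor, pacing: { learnedIntervalMs: pacing.learnedIntervalMs, savedAtMs: pacing.savedAtMs } };
}
```

An undefined result from restoreLearnedInterval means the controller is constructed at the conservative discovery interval, exactly as on a first run.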
Requirement: The adaptive controller's live rate SHALL be legible for every governor-using connector
Any connector using the shared governor SHALL be able to emit its controller's
live rate as the redacted collection_rate run-trace progress via a single
framework-owned helper, so an operator can watch the controller speed up and back
off. The emitted rate state SHALL carry no account or content data — only rate
numbers (current and ceiling interval / effective rate) and the last back-off
reason. When pacing is opted out, the helper SHALL yield an explicit absence
rather than a false zero rate.
Scenario: Rate state is emitted as redacted progress
- WHEN a governor-using connector surfaces its controller state
- THEN the emitted collection_rate SHALL carry the current and ceiling interval, the corresponding rates per minute, and the last back-off reason
- AND it SHALL carry no account/content fields
Scenario: Absent controller reads as honest unknown
- WHEN the connector has opted out of pacing
- THEN the observability helper SHALL yield an explicit absence
- AND it SHALL NOT emit a false zero rate
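A redacted progress payload of this kind can be produced by mapping the controller snapshot onto a fixed set of rate-only fields, with an absent controller mapping to an absent payload. The function name toCollectionRate and the payload field names are hypothetical; only the collection_rate label and the listed contents come from the requirement above.

```typescript
// Hypothetical payload shape; field names are illustrative only.
interface ControllerSnapshot {
  currentIntervalMs: number;
  ceilingIntervalMs: number;
  lastBackoffReason: string | undefined;
}

interface CollectionRate {
  current_interval_ms: number;
  ceiling_interval_ms: number;
  current_rate_per_min: number;
  ceiling_rate_per_min: number;
  last_backoff_reason: string | null;
}

/** Maps a snapshot to rate-only fields; an opted-out governor yields undefined, not zeros. */
function toCollectionRate(snapshot: ControllerSnapshot | undefined): CollectionRate | undefined {
  if (snapshot === undefined) return undefined; // explicit absence
  return {
    current_interval_ms: snapshot.currentIntervalMs,
    ceiling_interval_ms: snapshot.ceilingIntervalMs,
    current_rate_per_min: 60000 / snapshot.currentIntervalMs,
    ceiling_rate_per_min: 60000 / snapshot.ceilingIntervalMs,
    last_backoff_reason: snapshot.lastBackoffReason === undefined ? null : snapshot.lastBackoffReason,
  };
}
```

Because the payload is built field by field from the snapshot, nothing outside the five rate fields can leak into the trace.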