Reference Implementation Notes
Current implementation behavior for the forkable PDPP reference stack. Not normative protocol documentation.
These are implementation notes for the current reference-implementation/ package. The public explainer and run/deploy entrypoint is /reference. For protocol semantics, use the protocol docs under /docs.
The reference-implementation/ package is the forkable PDPP reference substrate in this repo. It is where the current authorization server, resource server, runtime, CLI, and black-box tests exercise the protocol.
For repo-level orientation, start with the root README.md. For runnable package details, see reference-implementation/README.md.
Current topology
The live reference implementation is organized around five first-class actors:
- Northstar HR: the native PDPP provider path
- Personal-server polyfill path: the connector/runtime realization for collected sources
- Longview: the reference client application
- Reference operator CLI: the repo-local owner/debug consumer
- Public PDPP CLI: the installable client/agent connect consumer
Those actors share one engine substrate but expose two different source-realization models:
- Native provider: public requests identify the source with source: { kind: "provider_native", id }
- Polyfill source: public requests identify the source with source: { kind: "connector", id }
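The two public source shapes can be modeled as a discriminated union. This is an illustrative sketch rather than the reference implementation's own types: PublicSource and isPolyfillSource are hypothetical names, and only the kind values and the id field come from these notes.

```typescript
// Hypothetical type name; the `kind` values and `id` field are from the notes.
type PublicSource =
  | { kind: "provider_native"; id: string }
  | { kind: "connector"; id: string };

// True when the source is realized through the connector/runtime (polyfill) path.
function isPolyfillSource(source: PublicSource): boolean {
  return source.kind === "connector";
}

// Example values matching the two realization models.
const northstar: PublicSource = { kind: "provider_native", id: "northstar_hr" };
const spotify: PublicSource = {
  kind: "connector",
  id: "https://registry.pdpp.org/connectors/spotify",
};
```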
Primary surfaces
Provider discovery
The current provider-connect story starts with standards-based discovery:
- GET /.well-known/oauth-protected-resource
- GET /.well-known/oauth-authorization-server
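As a rough sketch, a client could derive both discovery URLs from a provider origin. The helper name discoveryUrls is hypothetical, and the sketch assumes both documents are served from the same bare origin. That assumption may not hold for every deployment, since protected-resource metadata can name a separate authorization server.

```typescript
// Hypothetical helper: derive the two discovery URLs from a provider origin.
// Assumes `providerOrigin` is a bare origin with no path component.
function discoveryUrls(providerOrigin: string): {
  protectedResource: string;
  authorizationServer: string;
} {
  const origin = providerOrigin.replace(/\/+$/, ""); // drop trailing slashes
  return {
    protectedResource: `${origin}/.well-known/oauth-protected-resource`,
    authorizationServer: `${origin}/.well-known/oauth-authorization-server`,
  };
}
```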
The protected-resource metadata also includes advisory agent discovery at
pdpp_agent_discovery. Its CLI command is generated from the published package
metadata:
npx -y @pdpp/cli connect <provider-url>
This remains prelaunch 0.x software, but pdpp_agent_discovery.cli.no_owner_token is
true when the reference AS token-completion path supports owner-approved
scoped handoff without an owner bearer token. Treat the command as the
no-owner-token connect flow while that metadata flag is true.
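A caller that wants to gate on that flag could check it defensively. The sketch below models only the pdpp_agent_discovery.cli.no_owner_token path named above; the interface and function names are hypothetical, and any other members of the metadata document are deliberately left out.

```typescript
// Minimal shape for the part of the protected-resource metadata discussed above.
// Only `pdpp_agent_discovery.cli.no_owner_token` is taken from these notes.
type ProtectedResourceMetadata = {
  pdpp_agent_discovery?: { cli?: { no_owner_token?: boolean } };
};

// Hypothetical helper: true only when the flag is present and exactly `true`.
function supportsNoOwnerTokenConnect(metadata: ProtectedResourceMetadata): boolean {
  return metadata.pdpp_agent_discovery?.cli?.no_owner_token === true;
}
```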
The reference dashboard also shows reference operator diagnostics such as
pdpp ref run timeline <run-id>, pdpp ref grant timeline <grant-id>, and
pdpp ref trace show <trace-id>. These use the pdpp ref namespace from the
same @pdpp/cli package (npx -y @pdpp/cli --help) and inspect _ref
operator routes for a running reference deployment. When placeholder owner auth
is enabled, set PDPP_OWNER_SESSION_COOKIE to a valid pdpp_owner_session
cookie before using those commands.
The authorization-server metadata truthfully advertises:
- pushed_authorization_request_endpoint
- registration_endpoint
- device_authorization_endpoint
- token_endpoint
- introspection_endpoint
- agent_connect_endpoint
- pdpp_registration_modes_supported
- pdpp_pre_registered_public_clients
- pdpp_authorization_details_types_supported
The same metadata intentionally does not advertise a full generic OAuth authorization-code client-connect surface yet. In the live reference today, there is still no published authorization_endpoint, no published response_types_supported, and no published PKCE/browser redirect flow.
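One way to sanity-check a fetched authorization-server metadata document against this description is sketched below. The field names are the ones listed in these notes; checkAsMetadata and its return shape are hypothetical, and the check only looks at key presence, not at values.

```typescript
type AsMetadata = Record<string, unknown>;

// Field names the notes list as advertised by the reference AS.
const EXPECTED_FIELDS = [
  "pushed_authorization_request_endpoint",
  "registration_endpoint",
  "device_authorization_endpoint",
  "token_endpoint",
  "introspection_endpoint",
  "agent_connect_endpoint",
  "pdpp_registration_modes_supported",
  "pdpp_pre_registered_public_clients",
  "pdpp_authorization_details_types_supported",
] as const;

// Fields the notes say are not published yet.
const UNPUBLISHED_FIELDS = ["authorization_endpoint", "response_types_supported"] as const;

// Hypothetical helper: report expected keys that are absent and
// authorization-code keys that are unexpectedly present.
function checkAsMetadata(metadata: AsMetadata): { missing: string[]; unexpected: string[] } {
  const missing: string[] = EXPECTED_FIELDS.filter((field) => !(field in metadata));
  const unexpected: string[] = UNPUBLISHED_FIELDS.filter((field) => field in metadata);
  return { missing, unexpected };
}
```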
Client request start
Client requests are staged through:
POST /oauth/par
The live reference uses PAR to persist the RFC 9396 authorization_details request, then sends the user through the reference consent shell. Approval returns the grant and client bearer token directly. That direct-token return is a reference shortcut; it is not a generic OAuth authorization-code redirect profile.
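A minimal sketch of building a PAR request body follows. It assumes the form encoding that RFC 9126 specifies (application/x-www-form-urlencoded) and the data-access type named elsewhere in these notes. buildParBody, clientId, and extraDetails are hypothetical, and the members an authorization_details entry needs beyond type are defined by the protocol docs, not shown here.

```typescript
// Only the `type` value is taken from the notes; further members of the
// authorization_details entry are left to the PDPP protocol docs.
const DATA_ACCESS_TYPE = "https://pdpp.org/data-access";

// Hypothetical helper: encode a PAR body as application/x-www-form-urlencoded.
// Whether the reference expects exactly this encoding is an assumption.
function buildParBody(clientId: string, extraDetails: Record<string, unknown> = {}): string {
  // The reference contract expects a single authorization_details entry.
  const authorizationDetails = [{ ...extraDetails, type: DATA_ACCESS_TYPE }];
  const fields: Array<[string, string]> = [
    ["client_id", clientId],
    ["authorization_details", JSON.stringify(authorizationDetails)],
  ];
  return fields
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
}
```

The resulting string would be sent as the body of POST /oauth/par.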
Client registration
The current reference also supports public-client self-registration:
POST /oauth/register
It is intentionally narrow:
- public-client metadata only (token_endpoint_auth_method: "none")
- no initial access token required for the public path
- optional initial-access tokens remain available for operator/bootstrap use
- registration creates a public client_id only; data access still requires owner-approved consent
- meant to coexist with the pre-registered client path as fallback and examples
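The narrow registration contract above might look like this from the client side. token_endpoint_auth_method: "none" comes from these notes; client_name is a standard RFC 7591 metadata field used here for illustration, and both function names are hypothetical.

```typescript
// Hypothetical request-body shape for POST /oauth/register.
type PublicClientRegistration = {
  token_endpoint_auth_method: "none";
  client_name: string; // standard RFC 7591 field, illustrative here
};

function buildPublicRegistration(clientName: string): PublicClientRegistration {
  return { token_endpoint_auth_method: "none", client_name: clientName };
}

// Rejects metadata that would describe a confidential client.
function isPublicClientMetadata(metadata: Record<string, unknown>): boolean {
  return metadata["token_endpoint_auth_method"] === "none" && !("client_secret" in metadata);
}
```

Registration only yields a client_id; the data-access request still goes through PAR and owner consent.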
The current reference contract expects a single RFC 9396 authorization_details entry of type https://pdpp.org/data-access.
Consent and grant issuance
The staged request is reviewed through:
- GET /consent?request_uri=...
- POST /consent/approve
- POST /consent/deny
The current reference approval surface returns the issued grant and client bearer token directly. It is a deliberate reference shortcut, not a full generic authorization-code profile.
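For illustration, the consent review path can be built from the request_uri that a PAR response returns (request_uri is the RFC 9126 response field). consentPath is a hypothetical helper, and the URN-style value in the usage note is only an example format.

```typescript
// Hypothetical helper: build the consent review path for a staged request.
function consentPath(requestUri: string): string {
  return `/consent?request_uri=${encodeURIComponent(requestUri)}`;
}
```

For example, consentPath("urn:ietf:params:oauth:request_uri:abc") percent-encodes the colons so that the value survives as a single query parameter.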
Owner self-export
Owner login is a separate device flow:
- POST /oauth/device_authorization
- GET /device
- POST /device/approve
- POST /oauth/token
That flow yields an owner bearer token for self-export and direct owner queries.
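The polling half of a device flow can be sketched as a pure decision function. This assumes the reference token endpoint follows the standard RFC 8628 polling errors (authorization_pending, and slow_down adding 5 seconds to the interval), which these notes do not confirm. All type and function names are hypothetical.

```typescript
// Simplified token-endpoint poll result: success or an OAuth error code.
type TokenPollResult = { access_token: string } | { error: string };

type PollDecision =
  | { action: "done"; ownerToken: string }
  | { action: "wait"; intervalSeconds: number }
  | { action: "fail"; reason: string };

// Hypothetical helper: decide what to do after one POST /oauth/token poll.
function decideNextPoll(result: TokenPollResult, intervalSeconds: number): PollDecision {
  if ("access_token" in result) {
    return { action: "done", ownerToken: result.access_token };
  }
  switch (result.error) {
    case "authorization_pending":
      // Owner has not approved yet: keep polling at the same interval.
      return { action: "wait", intervalSeconds };
    case "slow_down":
      // RFC 8628: increase the polling interval by 5 seconds.
      return { action: "wait", intervalSeconds: intervalSeconds + 5 };
    default:
      // access_denied, expired_token, and anything else end the flow.
      return { action: "fail", reason: result.error };
  }
}
```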
Trusted local owner agents use a separate reference onboarding path rather than
copying raw owner bearers into chat. Start from the resource-server entrypoint
with pdpp owner-agent onboard <entrypoint> --credential-file ~/applications/daisy/.pi/agent/pdpp-owner-agent.json,
approve in the browser, and verify with pdpp owner-agent status. This is
reference REST/control-plane behavior; ordinary agents should use scoped grants
or grant-scoped MCP.
Error envelopes
OAuth authorization-server endpoints keep RFC-shaped error bodies:
{
  "error": "invalid_request",
  "error_description": "client_id is required",
  "request_id": "req_..."
}
The reference adds request_id and a matching Request-Id header for debugging. Resource-server and PDPP-native endpoints continue to use the nested PDPP error envelope with error.type, error.code, error.message, and error.request_id.
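Because callers see two envelope shapes, a client may want to normalize them. The sketch below is one possible approach, not the reference's own code: NormalizedError and normalizeErrorBody are hypothetical names, and the nested error.type member is ignored for brevity.

```typescript
// Normalized view over both error shapes described above.
type NormalizedError = {
  shape: "oauth" | "pdpp";
  code: string;
  message: string;
  requestId: string | undefined;
};

function asString(value: unknown): string | undefined {
  return typeof value === "string" ? value : undefined;
}

// Hypothetical helper: returns undefined when the body matches neither shape.
function normalizeErrorBody(body: unknown): NormalizedError | undefined {
  if (typeof body !== "object" || body === null) return undefined;
  const top = body as Record<string, unknown>;
  const error = top["error"];

  // OAuth shape: `error` is a string code at the top level.
  if (typeof error === "string") {
    return {
      shape: "oauth",
      code: error,
      message: asString(top["error_description"]) ?? "",
      requestId: asString(top["request_id"]),
    };
  }

  // PDPP shape: `error` is a nested object carrying code, message, request_id.
  if (typeof error === "object" && error !== null) {
    const nested = error as Record<string, unknown>;
    const code = asString(nested["code"]);
    if (code === undefined) return undefined;
    return {
      shape: "pdpp",
      code,
      message: asString(nested["message"]) ?? "",
      requestId: asString(nested["request_id"]),
    };
  }
  return undefined;
}
```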
Semantic retrieval diagnostics
The reference implements experimental semantic retrieval as a reference feature,
not as core PDPP. Local development uses a server-owned embedding profile; the
default operational profile is minilm, backed by Xenova/all-MiniLM-L6-v2
through Transformers.js. Operators can switch to multilingual-minilm for
Italian or mixed-language data without adding public model= or embedding=
request parameters.
The operator surface at /dashboard/deployment shows the active semantic
backend, model, dimensions, distance metric, language bias, vector-index kind,
index state, model-cache state, and every participating
(connector, stream, field) tuple. It is the first place to check when semantic
search returns no hits: zero participation, disabled downloads, stale indexes,
and background rebuilds are visible there without reading logs.
Resource server queries
Clients and owners both query the resource server through /v1.
The main distinction is source realization:
- Native provider mode: owner reads and client grant reads use source.kind = "provider_native"
- Polyfill mode: owner reads and client grant reads use source.kind = "connector", because the source identity is connector-scoped
In the current reference, successful and route-level rejected /v1/streams, /v1/streams/:stream, /v1/streams/:stream/records, and /v1/streams/:stream/records/:id responses also expose:
- Request-Id
- PDPP-Reference-Trace-Id
- PDPP-Reference-Revision
Request-Id and PDPP-Reference-Trace-Id are reference-only correlation aids. They let a caller jump from a live read response to the existing GET /_ref/traces/:traceId reader without adding a broader trace-listing surface.
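The jump from a read response to the trace reader could be sketched as follows. The header name and the /_ref/traces/:traceId route are from these notes; getHeader and traceReaderPath are hypothetical helpers that operate on a plain header map rather than any particular HTTP client's response type.

```typescript
// Hypothetical helper: look up a header case-insensitively in a plain map.
function getHeader(headers: Record<string, string>, name: string): string | undefined {
  const wanted = name.toLowerCase();
  for (const [key, value] of Object.entries(headers)) {
    if (key.toLowerCase() === wanted) return value;
  }
  return undefined;
}

// Build the trace-reader path from a /v1 response's headers, if a trace id is present.
function traceReaderPath(headers: Record<string, string>): string | undefined {
  const traceId = getHeader(headers, "PDPP-Reference-Trace-Id");
  return traceId ? `/_ref/traces/${encodeURIComponent(traceId)}` : undefined;
}
```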
PDPP-Reference-Revision is reference implementation metadata, not protocol negotiation. It is emitted by the authorization server, resource server, composed proxy-visible routes, and _ref surfaces so operators can tell which reference build is running without overloading the protocol PDPP-Version header. The value uses PDPP_REFERENCE_REVISION when set, otherwise the package version plus git revision when available, and falls back to an unknown revision when build metadata is not available.
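The fallback order for the revision value can be expressed as a small function. This is a sketch under assumptions: the "+" separator, the version-only fallback, and the literal "unknown" string are illustrative guesses, since these notes describe the precedence but not the exact output format.

```typescript
// Inputs are passed explicitly so the precedence is easy to see and test.
type RevisionInputs = {
  envOverride?: string; // value of PDPP_REFERENCE_REVISION, if set
  packageVersion?: string;
  gitRevision?: string;
};

// Hypothetical helper mirroring the precedence described above.
function resolveReferenceRevision(inputs: RevisionInputs): string {
  if (inputs.envOverride) return inputs.envOverride;
  if (inputs.packageVersion && inputs.gitRevision) {
    return `${inputs.packageVersion}+${inputs.gitRevision}`; // separator is an assumption
  }
  if (inputs.packageVersion) return inputs.packageVersion; // assumed fallback without git metadata
  return "unknown"; // placeholder for the unknown-revision fallback
}
```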
Reference-only introspection and traces
The implementation also exposes narrow reference-only surfaces for debugging and replay:
- GET /_ref/traces/:traceId
- GET /_ref/grants/:grantId/timeline
- GET /_ref/runs/:runId/timeline
These are intentionally reference artifacts. They are not part of the core PDPP protocol, and the dashboard that renders them is an operator surface for a running local or self-hosted instance.
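A small path builder for the three _ref readers is sketched below. The route templates are from the list above; RefTarget and refPath are hypothetical names, and the sketch does not cover authentication for operator routes.

```typescript
type RefTarget = "trace" | "grant-timeline" | "run-timeline";

// Hypothetical helper: fill in the id segment of a reference-only route.
function refPath(target: RefTarget, id: string): string {
  const safeId = encodeURIComponent(id);
  switch (target) {
    case "trace":
      return `/_ref/traces/${safeId}`;
    case "grant-timeline":
      return `/_ref/grants/${safeId}/timeline`;
    case "run-timeline":
      return `/_ref/runs/${safeId}/timeline`;
  }
}
```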
What has been intentionally removed
The current reference no longer relies on these older helper seams:
- POST /grants/initiate
- GET /consent/:deviceCode
- POST /consent/:deviceCode/approve
- POST /consent/:deviceCode/deny
- POST /owner-token
- POST /grants/:grantId/tokens
If you see those mentioned in archival notes, treat them as historical context, not live contract.
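When auditing older scripts or notes, a simple matcher can flag calls to the removed routes. The route list is taken from above; matchesPattern and isRemovedRoute are hypothetical helpers, and the segment-by-segment matching is a simplification rather than how any real router behaves.

```typescript
const REMOVED_ROUTES: ReadonlyArray<{ method: string; pattern: string }> = [
  { method: "POST", pattern: "/grants/initiate" },
  { method: "GET", pattern: "/consent/:deviceCode" },
  { method: "POST", pattern: "/consent/:deviceCode/approve" },
  { method: "POST", pattern: "/consent/:deviceCode/deny" },
  { method: "POST", pattern: "/owner-token" },
  { method: "POST", pattern: "/grants/:grantId/tokens" },
];

// Compare segment by segment; a ":name" segment matches any non-empty value.
function matchesPattern(pattern: string, path: string): boolean {
  const queryStart = path.indexOf("?");
  const cleanPath = queryStart === -1 ? path : path.slice(0, queryStart);
  const patternParts = pattern.split("/");
  const pathParts = cleanPath.split("/");
  if (patternParts.length !== pathParts.length) return false;
  return patternParts.every((part, i) =>
    part.startsWith(":") ? pathParts[i] !== "" : part === pathParts[i],
  );
}

function isRemovedRoute(method: string, path: string): boolean {
  const upper = method.toUpperCase();
  return REMOVED_ROUTES.some((r) => r.method === upper && matchesPattern(r.pattern, path));
}
```

Note that the live POST /consent/approve and GET /consent?request_uri=... routes do not match, because their segment counts differ from the removed device-code variants.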
Why this split exists
The reference is trying to prove one specific architectural point:
- PDPP core should not care whether data arrived from a native provider, a browser-automation connector, a file import, or some later collection mechanism.
- Public source identity still needs to be honest.
That is why the same engine supports both:
- source: { kind: "provider_native", id: "northstar_hr" } for native sources such as Northstar HR
- source: { kind: "connector", id: "https://registry.pdpp.org/connectors/spotify" } for collected/polyfill sources such as Spotify
What is still intentionally thin
The current reference is strong enough to fork and evaluate today. The remaining deliberate gaps are about scope control, not about whether a real substrate exists.
Notably:
- the provider-connect profile is still thin and intentionally conservative
- the current metadata proves request staging, protected DCR, and owner self-export, not a complete third-party authorization-code ecosystem profile
- the public website explains the reference but does not define its primary contract
- the dashboard is a live-instance operator surface, not a hosted canonical PDPP demo
The most trustworthy description of the live system remains:
- root PDPP specs for protocol semantics
- reference-implementation/ code and tests for current implementation behavior
- OpenSpec change/spec artifacts for project-level planning and boundaries