Reference Implementation Notes
Current implementation behavior for the forkable PDPP reference stack. Not normative protocol documentation.
These are implementation notes for the current reference-implementation/ package. The public explainer and run/deploy entrypoint is /reference. For protocol semantics, use the protocol docs under /docs.
The reference-implementation/ package is the forkable PDPP reference substrate in this repo. It is where the current authorization server, resource server, runtime, CLI, and black-box tests exercise the protocol.
For repo-level orientation, start with the root README.md. For runnable package details, see reference-implementation/README.md.
Current topology
The live reference implementation is organized around four first-class actors:
- Northstar HR: the native PDPP provider path
- Personal-server polyfill path: the connector/runtime realization for collected sources
- Longview: the reference client application
- PDPP CLI: the owner/debug consumer
Those actors share one engine substrate but expose two different source-realization models:
- Native provider: public requests identify the source with provider_id
- Polyfill source: public requests identify the source with connector_id
Primary surfaces
Provider discovery
The current provider-connect story starts with standards-based discovery:
- GET /.well-known/oauth-protected-resource
- GET /.well-known/oauth-authorization-server
The authorization-server metadata truthfully advertises:
- pushed_authorization_request_endpoint
- registration_endpoint
- device_authorization_endpoint
- token_endpoint
- introspection_endpoint
- pdpp_registration_modes_supported
- pdpp_authorization_details_types_supported
The same metadata intentionally does not advertise a full generic OAuth authorization-code client-connect surface yet. In the live reference today, there is still no published authorization_endpoint, no published response_types_supported, and no published PKCE/browser redirect flow.
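As a rough illustration of the advertised and withheld fields described above, a client could sanity-check a fetched authorization-server metadata document along these lines. The function name check_as_metadata and the sample URL are hypothetical; only the field names come from these notes.

```python
# Field names come from the notes above; the helper itself is illustrative.
ADVERTISED = (
    "pushed_authorization_request_endpoint",
    "registration_endpoint",
    "device_authorization_endpoint",
    "token_endpoint",
    "introspection_endpoint",
    "pdpp_registration_modes_supported",
    "pdpp_authorization_details_types_supported",
)
# Fields the live reference deliberately does not publish yet.
WITHHELD = ("authorization_endpoint", "response_types_supported")


def check_as_metadata(metadata: dict) -> dict:
    """Report advertised fields that are missing and withheld fields that are present."""
    missing = [name for name in ADVERTISED if name not in metadata]
    unexpected = [name for name in WITHHELD if name in metadata]
    return {"missing": missing, "unexpected": unexpected}
```

An empty result for both keys suggests the document matches the shape described here; it says nothing about whether the endpoints behave correctly.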
Client request start
Client requests are staged through:
POST /oauth/par
Client registration
The current reference also supports a protected dynamic registration path:
POST /oauth/register
It is intentionally narrow:
- public-client metadata only (token_endpoint_auth_method: "none")
- protected by an initial access token
- meant to coexist with the pre-registered client path, not replace it
The current reference contract expects a single RFC 9396 authorization_details entry of type https://pdpp.org/data-access.
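The two narrow expectations above (public-client metadata and a single authorization_details entry) can be sketched as pre-flight checks. This is a minimal sketch, not the reference's own validation code; both function names are hypothetical, and any fields beyond type and token_endpoint_auth_method are left unchecked.

```python
PDPP_DATA_ACCESS_TYPE = "https://pdpp.org/data-access"


def is_public_client_metadata(metadata: dict) -> bool:
    """True when the registration metadata declares a public client."""
    return metadata.get("token_endpoint_auth_method") == "none"


def validate_authorization_details(details: list) -> dict:
    """Return the single entry, or raise ValueError if the shape is unexpected."""
    if not isinstance(details, list) or len(details) != 1:
        raise ValueError("expected exactly one authorization_details entry")
    entry = details[0]
    if not isinstance(entry, dict) or entry.get("type") != PDPP_DATA_ACCESS_TYPE:
        raise ValueError("expected type " + PDPP_DATA_ACCESS_TYPE)
    return entry
```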
Consent and grant issuance
The staged request is reviewed through:
- GET /consent?request_uri=...
- POST /consent/approve
- POST /consent/deny
The current reference approval surface returns the issued grant and client bearer token directly. It is a deliberate reference shortcut, not a full generic authorization-code profile.
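To make the hand-off from request staging to review concrete, the consent URL can be assembled from the request_uri value. This is a small sketch: the helper name and base URL are hypothetical, and the only detail taken from these notes is the /consent path with its request_uri query parameter.

```python
from urllib.parse import urlencode


def consent_url(base: str, request_uri: str) -> str:
    """Build the review URL for a staged request."""
    # urlencode percent-escapes characters such as ":" in URN-style values.
    return base.rstrip("/") + "/consent?" + urlencode({"request_uri": request_uri})
```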
Owner self-export
Owner login is a separate device flow:
- POST /oauth/device_authorization
- GET /device
- POST /device/approve
- POST /oauth/token
That flow yields an owner bearer token for self-export and direct owner queries.
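The final step of the device flow, polling the token endpoint, might look like the sketch below. It assumes the reference follows the standard RFC 8628 parameter names (grant_type, device_code, client_id), which these notes do not confirm; the helper name is hypothetical.

```python
from urllib.parse import urlencode

# Grant type URN defined by RFC 8628 (OAuth 2.0 Device Authorization Grant).
DEVICE_GRANT_TYPE = "urn:ietf:params:oauth:grant-type:device_code"


def device_token_request_body(device_code: str, client_id: str) -> str:
    """Form-encoded body for a POST /oauth/token poll (assumed RFC 8628 shape)."""
    return urlencode(
        {
            "grant_type": DEVICE_GRANT_TYPE,
            "device_code": device_code,
            "client_id": client_id,
        }
    )
```

The body would be sent with Content-Type: application/x-www-form-urlencoded and repeated until the owner approves the request at /device.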
Semantic retrieval diagnostics
The reference implements experimental semantic retrieval as a reference feature,
not as core PDPP. Local development uses a server-owned embedding profile; the
default operational profile is minilm, backed by Xenova/all-MiniLM-L6-v2
through Transformers.js. Operators can switch to multilingual-minilm for
Italian or mixed-language data without adding public model= or embedding=
request parameters.
The operator surface at /dashboard/deployment shows the active semantic
backend, model, dimensions, distance metric, language bias, vector-index kind,
index state, model-cache state, and every participating
(connector, stream, field) tuple. It is the first place to check when semantic
search returns no hits: zero participation, disabled downloads, stale indexes,
and background rebuilds are visible there without reading logs.
Resource server queries
Clients and owners both query the resource server through /v1.
The main distinction is source realization:
- Native provider mode: no public connector_id is required for owner reads or client grant reads
- Polyfill mode: owner reads still require connector_id, because the source identity is connector-scoped
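The difference between the two modes shows up in how a read URL is built. The sketch below assumes connector_id travels as a query parameter, which these notes do not state explicitly; the helper name, base URL, and stream names are hypothetical.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def records_url(base: str, stream: str, connector_id: Optional[str] = None) -> str:
    """Build a /v1 records URL; pass connector_id only for polyfill sources."""
    url = f"{base.rstrip('/')}/v1/streams/{quote(stream, safe='')}/records"
    if connector_id is not None:
        # Assumption: the connector scope is expressed as a query parameter.
        url += "?" + urlencode({"connector_id": connector_id})
    return url
```

For example, records_url("https://rs.example", "top_tracks", connector_id="spotify") targets a polyfill source, while omitting connector_id matches the native provider case.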
In the current reference, successful and route-level rejected /v1/streams, /v1/streams/:stream, /v1/streams/:stream/records, and /v1/streams/:stream/records/:id responses also expose:
- Request-Id
- PDPP-Reference-Trace-Id
- PDPP-Reference-Revision
Request-Id and PDPP-Reference-Trace-Id are reference-only correlation aids. They let a caller jump from a live read response to the existing GET /_ref/traces/:traceId reader without adding a broader trace-listing surface.
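The jump from a read response to its trace can be sketched as a header lookup followed by URL construction. The helper name and base URL are hypothetical; the header name and the /_ref/traces/:traceId path come from these notes.

```python
from typing import Mapping, Optional
from urllib.parse import quote


def trace_url(base: str, headers: Mapping[str, str]) -> Optional[str]:
    """Return the trace reader URL for a response, or None without a trace id."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    trace_id = lowered.get("pdpp-reference-trace-id")
    if not trace_id:
        return None
    return f"{base.rstrip('/')}/_ref/traces/{quote(trace_id, safe='')}"
```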
PDPP-Reference-Revision is reference implementation metadata, not protocol negotiation. It is emitted by the authorization server, resource server, composed proxy-visible routes, and _ref surfaces so operators can tell which reference build is running without overloading the protocol PDPP-Version header. The value uses PDPP_REFERENCE_REVISION when set, otherwise the package version plus git revision when available, and falls back to an unknown revision when build metadata is not available.
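The fallback order for the revision value can be written out as a small function. This is a sketch of the described precedence only: the "+" separator, the version-only case, and the literal "unknown" string are assumptions, and the function name is hypothetical.

```python
from typing import Mapping, Optional


def resolve_reference_revision(
    env: Mapping[str, str],
    package_version: Optional[str] = None,
    git_revision: Optional[str] = None,
) -> str:
    """Pick the revision string using the precedence described above."""
    override = env.get("PDPP_REFERENCE_REVISION")
    if override:
        return override  # an explicit operator setting wins
    if package_version and git_revision:
        # Assumption: version and git revision are joined with "+".
        return f"{package_version}+{git_revision}"
    if package_version:
        return package_version  # assumption: version alone when git is unavailable
    return "unknown"  # assumption: literal placeholder for missing build metadata
```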
Reference-only introspection and traces
The implementation also exposes narrow reference-only surfaces for debugging and replay:
- GET /_ref/traces/:traceId
- GET /_ref/grants/:grantId/timeline
- GET /_ref/runs/:runId/timeline
These are intentionally reference artifacts. They are not part of the core PDPP protocol, and the dashboard that renders them is an operator surface for a running local or self-hosted instance.
What has been intentionally removed
The current reference no longer relies on these older helper seams:
- POST /grants/initiate
- GET /consent/:deviceCode
- POST /consent/:deviceCode/approve
- POST /consent/:deviceCode/deny
- POST /owner-token
- POST /grants/:grantId/tokens
If you see those mentioned in archival notes, treat them as historical context, not as part of the live contract.
Why this split exists
The reference is trying to prove one specific architectural point:
- PDPP core should not care whether data arrived from a native provider, a browser-automation connector, a file import, or some later collection mechanism.
- Public source identity still needs to be honest.
That is why the same engine supports both:
- provider_id for native sources such as Northstar HR
- connector_id for collected/polyfill sources such as Spotify
What is still intentionally thin
The current reference is strong enough to fork and evaluate today. The remaining deliberate gaps are about scope control, not about whether a real substrate exists.
Notably:
- the provider-connect profile is still thin and intentionally conservative
- the current metadata proves request staging, protected DCR, and owner self-export, not a complete third-party authorization-code ecosystem profile
- the public website explains the reference but does not define its primary contract
- the dashboard is a live-instance operator surface, not a hosted canonical PDPP demo
The most trustworthy descriptions of the live system remain:
- root PDPP specs for protocol semantics
- reference-implementation/ code and tests for current implementation behavior
- OpenSpec change/spec artifacts for project-level planning and boundaries