Lexical Retrieval Extension
Optional PDPP extension defining the public lexical retrieval surface at GET /v1/search.
Overview
The lexical retrieval extension defines a small, optional, discoverable, grant-safe public surface that lets applications and agents search records by text across the streams a caller is authorized to read. It is not part of core PDPP: implementations MAY expose it, and clients MUST NOT assume it exists unless the resource server explicitly advertises it (see Discovery).
The extension is intentionally lexical-only in v1. It does not expose semantic / vector retrieval, embeddings, body-DSL POST /v1/search, portable relevance calibration, or connector-specific search semantics — those are out of scope. See Non-goals.
For the long-form contract, see the canonical spec at openspec/changes/add-lexical-retrieval-extension/specs/lexical-retrieval/spec.md in the repo. This page is the developer-facing companion.
Authentication and Versioning
Same as the Data Query API:
Authorization: Bearer <access_token>
PDPP-Version: 2026-03-28
Both client tokens (third-party apps holding a grant) and owner tokens (the resource owner performing self-export) are accepted. Per-mode behavior differs (see Owner-mode semantics) but the request shape is identical.
Request-Id is echoed in the response.
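As a rough illustration of the headers above, the following sketch builds (but does not send) a search request with the Python standard library. The host `https://rs.example.com`, the token value, and the helper name `build_search_request` are assumptions made for the example, not part of the extension.

```python
from urllib.request import Request

PDPP_VERSION = "2026-03-28"  # version string shown in the headers above


def build_search_request(base_url: str, access_token: str, query_string: str) -> Request:
    """Build (without sending) a GET /v1/search request carrying the required headers."""
    url = f"{base_url.rstrip('/')}/v1/search?{query_string}"
    return Request(
        url,
        method="GET",
        headers={
            "Authorization": f"Bearer {access_token}",
            "PDPP-Version": PDPP_VERSION,
        },
    )


# Hypothetical host and token, for illustration only.
req = build_search_request("https://rs.example.com", "example-token", "q=overdraft+fee")
```

The same request shape applies to client tokens and owner tokens; only the token passed in differs.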
Endpoint
GET /v1/search
A dedicated cross-stream search endpoint. The reference's _ref/search is a separate, reference-only operator-jump helper for traces / grants / runs and is not the public lexical retrieval surface; the two share neither shape nor backing.
Query parameters
| Parameter | Type | Description |
|---|---|---|
| q | string | Required. The lexical query. |
| limit | integer | Page size. Default 25, max 100. |
| cursor | string | Opaque pagination cursor from a previous response's next_cursor. Search cursors are not interchangeable with record-list cursors and not with changes_since values. |
| streams[] | string (repeated) | Optional stream-scope narrowing. Omit to search every authorized stream that participates in the extension. See Owner-mode semantics for the per-mode meaning. |
Anything else is rejected with invalid_request_error. In particular:
- connector_id is not a public parameter on this surface in v1. Owner-mode search fans out across all owner-visible connectors internally; each result carries the originating connector_id so clients can hydrate.
- filter[…], fields, expand[], expand_limit[…], order=, rank=, boost=, embedding/vector/semantic params, and connector-specific semantics are explicitly out of scope.
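A client can mirror these rules locally so that unsupported parameters fail before a request is sent. The sketch below is one possible way to do that; the function name `build_search_query` and the lower bound of 1 on `limit` are assumptions (the text above states only the default and the maximum).

```python
from urllib.parse import urlencode

DEFAULT_LIMIT, MAX_LIMIT = 25, 100  # values from the parameter table


def build_search_query(q, limit=DEFAULT_LIMIT, cursor=None, streams=None, **extra):
    """Return the query string for GET /v1/search, refusing unsupported parameters."""
    if extra:
        # e.g. connector_id, filter[...], rank: the server would reject these anyway.
        raise ValueError(f"unsupported v1 parameters: {sorted(extra)}")
    if not q:
        raise ValueError("q is required")
    if not 1 <= limit <= MAX_LIMIT:  # lower bound of 1 is an assumption
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    pairs = [("q", q), ("limit", str(limit))]
    if cursor is not None:
        pairs.append(("cursor", cursor))
    # streams[] is a repeated parameter: one key/value pair per stream name.
    pairs.extend(("streams[]", name) for name in (streams or []))
    return urlencode(pairs)


query = build_search_query("overdraft fee", limit=10, streams=["messages", "posts"])
```

Note that `urlencode` percent-encodes the brackets in `streams[]` as `%5B%5D`, which is standard URL encoding rather than anything specific to this extension.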
Result shape
{
  "object": "list",
  "url": "/v1/search",
  "has_more": true,
  "next_cursor": "<opaque>",
  "data": [
    {
      "object": "search_result",
      "stream": "messages",
      "record_key": "msg_123",
      "connector_id": "https://registry.pdpp.org/connectors/messaging-app",
      "record_url": "/v1/streams/messages/records/msg_123",
      "emitted_at": "2026-04-23T12:34:56Z",
      "score": { "kind": "bm25", "value": -0.42, "order": "lower_is_better" },
      "matched_fields": ["text"],
      "snippet": { "field": "text", "text": "…overdraft fee…" }
    }
  ]
}
Required fields on every result
- object: "search_result"
- stream
- record_key
- connector_id: required because the resource server scopes owner reads per connector. Even client-token callers receive connector_id (it mirrors the connector identity already encoded in the grant).
- emitted_at
- score: when advertised, a typed implementation-relative lexical score. The reference emits { "kind": "bm25", "order": "lower_is_better" } using SQLite FTS5 BM25 values.
- matched_fields: a non-empty subset of the stream's declared query.search.lexical_fields intersected with the caller's authorized fields.
Optional fields
- record_url: when present, resolves to the canonical GET /v1/streams/{stream}/records/{record_key} endpoint. For owner-token callers on a per-connector resource server (the reference today), the URL includes ?connector_id=<canonical>.
- snippet: a { field, text } pair drawn from a matched_fields entry. Implementations MAY omit snippet per result. Snippet text never quotes ungranted field content; see Grant safety.
What is intentionally limited
- No portable relevance calibration. Scores are implementation-relative. Clients may use the advertised kind and order, but MUST NOT compare values across servers or implementation changes unless a later capability advertises stronger calibration.
- No hydrated record payload. The extension returns candidate references; clients use the existing single-record read endpoint (or the record_url) to hydrate.
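The field rules in this section can be checked on the client as a sanity test of each result. The following is a minimal sketch, assuming results arrive as already-parsed JSON dictionaries; the helper name `check_search_result` and its `score_advertised` flag are illustrative, not defined by the extension.

```python
REQUIRED = ("object", "stream", "record_key", "connector_id", "emitted_at", "matched_fields")


def check_search_result(result: dict, score_advertised: bool = True) -> list:
    """Return a list of problems found in one search_result (empty means none found)."""
    problems = [f"missing {key}" for key in REQUIRED if key not in result]
    if score_advertised and "score" not in result:
        # score is only expected when the server advertises score support.
        problems.append("missing score")
    if result.get("object") != "search_result":
        problems.append("object is not search_result")
    matched = result.get("matched_fields") or []
    if not matched:
        problems.append("matched_fields is empty")
    snippet = result.get("snippet")  # optional, may be omitted per result
    if snippet is not None and snippet.get("field") not in matched:
        problems.append("snippet.field is not in matched_fields")
    return problems
```

A check like this cannot verify grant safety, which is enforced by the server; it only catches results that are structurally inconsistent with the shape described above.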
Grant safety
For caller C and grant G, the extension searches only over (stream, field) pairs where:
- stream is in G,
- field is readable under G's effective field projection for stream, AND
- stream declares field in its query.search.lexical_fields.
Concretely:
- Streams outside the grant contribute zero hits.
- Fields outside the grant projection are never searched for the caller (no "filter-later" pattern).
- matched_fields is a non-empty subset of the searchable ∩ authorized intersection.
- snippet.text contains only substrings drawn from that intersection.
- A stream whose searchable ∩ authorized intersection is empty contributes zero hits, and the response does not signal a per-stream error for that case.
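The searchable ∩ authorized intersection can be modeled as plain set arithmetic. The sketch below is only a model of the rule, not the reference implementation; the input shapes (a mapping of stream to granted fields, and a mapping of stream to declared lexical fields) and the name `searchable_pairs` are assumptions.

```python
def searchable_pairs(granted: dict, lexical_fields: dict) -> dict:
    """Compute, per stream, the fields that are both authorized and declared searchable.

    granted:        stream name -> set of fields readable under the grant projection
    lexical_fields: stream name -> list from that stream's query.search.lexical_fields
    """
    result = {}
    for stream, readable in granted.items():  # streams outside the grant never appear
        fields = set(readable) & set(lexical_fields.get(stream, []))
        if fields:  # an empty intersection contributes zero hits, not an error
            result[stream] = fields
    return result


pairs = searchable_pairs(
    granted={"messages": {"text", "sender"}, "posts": {"created_at"}},
    lexical_fields={"messages": ["text", "subject"], "posts": ["title", "selftext"], "notes": ["body"]},
)
```

In this example only `messages.text` is searched: `subject` is not granted, `posts` has an empty intersection, and `notes` is outside the grant entirely.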
Errors
Same error envelope as the Data Query API.
| Code | HTTP | When |
|---|---|---|
| invalid_request | 400 | Missing q, unsupported v1 parameter (e.g. connector_id, filter[…], rank), or streams[] required because the server's advertisement reports cross_stream: false. |
| grant_stream_not_allowed | 403 | Client tokens only. A streams[] entry names a stream not in the grant. |
| invalid_cursor | 410 | Cursor refers to an expired or unknown snapshot. |
Owner-token streams[] is not a hard authorization check — naming a stream that no owner-visible connector exposes simply yields zero hits.
Owner-mode semantics
The reference implementation (and other resource servers that scope owner reads per connector) handles owner-token search as follows:
- The request shape is identical to client-token search. There is no public connector_id parameter in v1.
- The server fans out across every owner-visible connector internally and merges results.
- streams[] is a soft filter: it narrows to a stream name shared across owner-visible connectors. Naming a stream that no owner-visible connector exposes yields zero hits, not an error.
- Each search_result carries connector_id so the caller can hydrate each hit through the correct per-connector owner read scope.
- record_url, when emitted, includes ?connector_id=<canonical> so a plain GET against the URL hits the correct per-connector scope.
For client tokens, search is naturally scoped to the connector encoded in the grant; connector_id on results mirrors that grant identity.
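For owner-token callers, hydration therefore needs the connector_id from each hit. A minimal sketch of deriving the hydration path follows; it prefers record_url when the server emits one and otherwise assembles the path from the documented pattern. The helper name `owner_record_path` and the percent-encoding of connector_id are assumptions.

```python
from urllib.parse import quote, urlencode


def owner_record_path(result: dict) -> str:
    """Return the path an owner-token caller could GET to hydrate one search_result."""
    if result.get("record_url"):
        # Per the text above, an emitted record_url already carries ?connector_id=...
        return result["record_url"]
    stream = quote(result["stream"], safe="")
    key = quote(result["record_key"], safe="")
    # Assumption: connector_id is passed as an ordinary percent-encoded query value.
    return f"/v1/streams/{stream}/records/{key}?{urlencode({'connector_id': result['connector_id']})}"


path = owner_record_path({
    "stream": "messages",
    "record_key": "msg_123",
    "connector_id": "https://registry.pdpp.org/connectors/messaging-app",
})
```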
Discovery
Server-level: extension advertisement
The extension advertises itself in the resource-server metadata document (RFC 9728) under a capabilities.lexical_retrieval block:
{
  "resource": "https://example.com",
  "...": "...",
  "capabilities": {
    "lexical_retrieval": {
      "supported": true,
      "endpoint": "/v1/search",
      "cross_stream": true,
      "snippets": true,
      "default_limit": 25,
      "max_limit": 100,
      "score": {
        "supported": true,
        "kind": "bm25",
        "order": "lower_is_better",
        "value_semantics": "implementation_relative"
      }
    }
  }
}
When supported: true, all six base keys (supported, endpoint, cross_stream, snippets, default_limit, max_limit) are required. When score.supported: true, each result includes the typed score object. The advertisement is reachable without a bearer token.
A resource server that does not expose the extension SHALL omit capabilities.lexical_retrieval entirely or set supported: false. Clients MUST NOT assume /v1/search is available unless the advertisement says so.
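A client-side discovery check might look like the sketch below. It takes the already-parsed metadata document as a dictionary and leaves fetching it to the caller; the function name `lexical_retrieval_config` is hypothetical.

```python
from typing import Optional

BASE_KEYS = ("supported", "endpoint", "cross_stream", "snippets", "default_limit", "max_limit")


def lexical_retrieval_config(metadata: dict) -> Optional[dict]:
    """Return the lexical_retrieval block if the extension is advertised, else None."""
    capabilities = metadata.get("capabilities")
    block = capabilities.get("lexical_retrieval") if isinstance(capabilities, dict) else None
    if not isinstance(block, dict) or block.get("supported") is not True:
        return None  # block omitted or supported: false, so do not call /v1/search
    missing = [key for key in BASE_KEYS if key not in block]
    if missing:
        raise ValueError(f"malformed advertisement, missing keys: {missing}")
    return block
```

Returning None for both the omitted block and supported: false keeps the two "not available" forms indistinguishable to callers, which matches how the text above treats them.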
Stream-level: query.search.lexical_fields
Each participating stream declares its searchable fields in its existing per-stream metadata (GET /v1/streams/{stream}):
{
  "object": "stream_metadata",
  "name": "posts",
  "query": {
    "search": {
      "lexical_fields": ["title", "selftext"]
    }
  }
}
v1 accepts only top-level scalar string fields declared in the stream's schema.properties. Nested paths, arrays, blob references, and unknown fields are rejected by the manifest validator. A stream that does not participate in lexical retrieval SHALL omit query.search entirely (there is no "search-aware but searches nothing" form).
The advertisement does not enumerate per-stream fields; clients discover them through the existing stream-metadata endpoint.
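The field constraint above can be approximated with a small check over a stream's schema. This is a sketch under the assumption that schema.properties follows JSON Schema conventions with a "type" keyword; it is not the reference manifest validator, and it does not attempt to detect blob references, whose representation is not described here.

```python
def invalid_lexical_fields(schema: dict, lexical_fields: list) -> list:
    """Return the declared lexical fields that are not top-level scalar string properties."""
    properties = schema.get("properties", {})
    rejected = []
    for name in lexical_fields:
        prop = properties.get(name)  # unknown fields and nested paths are not found here
        if not isinstance(prop, dict) or prop.get("type") != "string":
            rejected.append(name)  # arrays, objects, and non-string scalars land here too
    return rejected


schema = {
    "properties": {
        "title": {"type": "string"},
        "selftext": {"type": "string"},
        "tags": {"type": "array", "items": {"type": "string"}},
        "author": {"type": "object", "properties": {"name": {"type": "string"}}},
    }
}
rejected = invalid_lexical_fields(schema, ["title", "tags", "author.name", "missing"])
```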
Pagination
?cursor=<opaque>
Pagination is opaque. Cursors are not interchangeable with record-list (/v1/streams/.../records?cursor=…) or changes_since cursors. Within a single search session (same q, same streams[], same grant) cursoring is stable enough to avoid duplication and infinite loops. Across server restart, snapshot expiry, or grant change, the server MAY reject the cursor with invalid_cursor; the client recovers by issuing a fresh search.
The cursor format is implementation-defined — clients MUST treat it as opaque.
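The recovery behaviour can be sketched as a generator that follows next_cursor and restarts from the first page when the cursor is rejected. The transport is injected as a `fetch_page` callable so the sketch stays independent of any HTTP library; `InvalidCursor`, `iter_results`, the restart limit, and the de-duplication after a restart are all assumptions of this example rather than requirements of the extension.

```python
from typing import Callable, Iterator, Optional


class InvalidCursor(Exception):
    """Raised by the caller-supplied fetcher when the server reports invalid_cursor."""


def iter_results(fetch_page: Callable[[Optional[str]], dict], max_restarts: int = 1) -> Iterator[dict]:
    """Yield search results across pages, restarting once if the cursor is rejected."""
    cursor, restarts, seen = None, 0, set()
    while True:
        try:
            page = fetch_page(cursor)  # cursor is passed through untouched (opaque)
        except InvalidCursor:
            if restarts >= max_restarts:
                raise
            restarts += 1
            cursor = None  # recover by issuing a fresh search
            continue
        for result in page["data"]:
            key = (result["connector_id"], result["stream"], result["record_key"])
            if key not in seen:  # skip hits already yielded before a restart
                seen.add(key)
                yield result
        if not page.get("has_more"):
            return
        cursor = page["next_cursor"]
```

A caller would wrap its own HTTP request in `fetch_page`, passing `None` for the first page and raising `InvalidCursor` when the response carries the invalid_cursor code.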
Ranking
Results are returned in relevance-oriented order. Higher-positioned results SHOULD generally be more relevant than lower-positioned results. The advertised BM25 score is implementation-relative and uses order: "lower_is_better" in the reference. The extension intentionally does not define portable score calibration, semantic reranking, recency blending, or per-connector custom weighting in v1.
Non-goals
Out of scope for v1; future extensions or revisions may address them separately:
- Semantic / vector retrieval.
- Embeddings or embedding versioning.
- Cross-connector entity resolution.
- Generic boolean / predicate query DSL.
- Connector-specific search semantics on the public surface.
- Portable score calibration.
- A POST /v1/search body-DSL surface (reserved as a possible future extension).
- Mandatory promotion of this extension to core PDPP.
See also
- Data Query API: the core record-read contract this extension complements.
- Semantic Retrieval Extension (Experimental): a sibling experimental extension at GET /v1/search/semantic. Unstable; use lexical retrieval when stability matters.
- Approved spec: openspec/changes/add-lexical-retrieval-extension/specs/lexical-retrieval/spec.md.
- Implementation tranche: openspec/changes/implement-lexical-retrieval-extension/.