feat(gateway): make resources/prompts reachable end-to-end (#669, #670, #672, #673) by dgenio · Pull Request #701 · dgenio/contextweaver

dgenio · 2026-06-15T07:52:34Z

Summary

PR #700 landed the resource/prompt gateway runtime (PrimitiveGatewayRuntime), the four meta-tools, the converters, and the mixed-primitive benchmark — but the feature was library-only: there was no shipped PrimitiveUpstream adapter, and contextweaver mcp serve never constructed a primitive runtime, so resources/prompts were unreachable from the CLI. This PR closes the remaining gaps so the #555 "shape all three MCP primitives" capability is usable end-to-end.

Advances #555. Addresses #669, #670, #672, #673.

Changes

adapters/mcp_primitive_upstream.py (new) — ships the concrete PrimitiveUpstream trio mirroring the tool mcp_upstream adapters: StubPrimitiveUpstream (in-process; tests/CLI/air-gapped CI), McpClientPrimitiveUpstream (wraps a connected MCP ClientSession), MultiplexPrimitiveUpstream (multi-server fan-out). Per the PrimitiveUpstream contract these raise transport errors so the runtime classifies them via classify_upstream_exception (unlike the tool path, which returns isError).
_mcp_cli.py — mcp serve --gateway now builds a PrimitiveGatewayRuntime from a snapshot catalog's optional resources / prompts lists (sharing the tool runtime's ContextManager) and passes it to McpGatewayServer(primitive_runtime=…). Tools-only catalogs are unchanged (primitives off); proxy mode stays a transparent passthrough. Factored a shared _parse_catalog_file helper; the serve summary reports primitive counts.
adapters/gateway_primitives.py — added resource_ids() / prompt_ids() accessors mirroring ProxyRuntime.list_tool_ids().
Makefile — new benchmark-primitives target for benchmarks/primitive_gateway_benchmark.py (Regression benchmark for tools+resources+prompts context shaping #673).
docs/gateway_spec.md — added §9.4 (request flows for the four verbs, firewall/error-taxonomy semantics) and §9.5 (serve + snapshot-catalog wiring) (Gateway spec updates for resources/prompts coverage #672).
Tests — test_mcp_primitive_upstream.py (all three adapters, incl. timeout-propagation and multiplex routing/collision) + serve-CLI wiring tests in test_mcp_serve_cli.py.
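
The error contract in the first bullet (primitive upstreams raise, the runtime classifies) can be sketched roughly as follows. This is an illustration only: `SketchStubUpstream`, `classify_sketch`, `read_via_runtime`, and the three error labels are hypothetical names, and the real `StubPrimitiveUpstream` and `classify_upstream_exception` signatures are not shown in this description.

```python
from dataclasses import dataclass, field


@dataclass
class SketchStubUpstream:
    """Hypothetical in-process upstream: raises instead of returning isError."""

    resources: dict[str, str] = field(default_factory=dict)

    def list_resources(self) -> list[dict[str, str]]:
        # Sorted so repeated listings are deterministic.
        return [{"uri": uri} for uri in sorted(self.resources)]

    def read_resource(self, uri: str) -> str:
        if uri not in self.resources:
            raise KeyError(uri)  # surfaced to the caller, not swallowed
        return self.resources[uri]


def classify_sketch(exc: Exception) -> str:
    """Stand-in for classify_upstream_exception; the labels are invented."""
    if isinstance(exc, TimeoutError):
        return "timeout"
    if isinstance(exc, KeyError):
        return "not_found"
    return "upstream_error"


def read_via_runtime(upstream: SketchStubUpstream, uri: str) -> dict[str, object]:
    """Runtime-side wrapper: the one place exceptions become error results."""
    try:
        return {"ok": True, "text": upstream.read_resource(uri)}
    except Exception as exc:
        return {"ok": False, "error": classify_sketch(exc)}


stub = SketchStubUpstream({"file:///notes.md": "hello"})
hit = read_via_runtime(stub, "file:///notes.md")
miss = read_via_runtime(stub, "file:///missing.md")
```

Keeping classification in the runtime means every adapter (stub, client, multiplex) can stay a thin pass-through and still produce consistent error results.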

Checklist

Tests added or updated for every new/changed public function
make ci passes locally — fmt, lint, type, drift-check, module-size-check, doc-snippets-check, readme-version-check, example, demo all green; full test suite 2669 passed / 31 skipped / 1 xfailed. (See note below on the one failure.)
CHANGELOG.md updated under ## [Unreleased]
Docstrings added for all new public APIs (Google-style)
Public-API change? make api-check clean; the new primitive surface is reached by module path (matching the treatment in PR #700, "feat(routing): unified cross-primitive identity & collision policy (#671)"), so api/public_api.txt is unchanged.
Every modified module stays ≤ 300 lines (make module-size-check OK; the new upstream adapters live in their own module rather than growing mcp_upstream.py).
Related issues linked in the summary above
Agent-facing docs updated — AGENTS.md module map gains the new module + the resource_ids()/prompt_ids() note.

Notes for reviewers

Pre-existing test failure (not from this PR): test_mcp_serve_cli.py::test_serve_dry_run_writes_catalog_diagnostic_event asserts empty stderr, but the sandbox has no network, so tiktoken logs a cl100k_base 403 fallback warning. Verified identical failure on a clean tree (git stash); it passes with network or a pre-cached tiktoken. This is the same known environmental case documented in PR #567, "feat(routing): surface routing diagnostics and validation (#519, #521, #523, #524, #538)".
Catalog format: resources/prompts are optional siblings of tools in a snapshot object ({"tools": […], "resources": […], "prompts": […]}) — documented in §9.5. A bare-list (tools-only) catalog is unaffected.
McpClientPrimitiveUpstream is a thin pass-through over a ClientSession; it's exercised here with a fake session. Live-session integration is covered by the existing record/replay direction, not added here.
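
For readers unfamiliar with the two catalog shapes in the "Catalog format" note, a minimal sketch of the distinction follows. It is not the `_parse_catalog_file` helper from this PR; `split_catalog` and `primitives_enabled` are hypothetical names, and the validation rules are assumptions.

```python
import json
from typing import Any

KINDS = ("tools", "resources", "prompts")


def split_catalog(raw: str) -> dict[str, list[Any]]:
    """Normalise a bare-list or snapshot-object catalog into three lists."""
    data = json.loads(raw)
    if isinstance(data, list):
        # Bare list: tools only, primitives stay off.
        return {"tools": data, "resources": [], "prompts": []}
    if not isinstance(data, dict):
        raise ValueError("catalog must be a JSON list or object")
    catalog: dict[str, list[Any]] = {}
    for kind in KINDS:
        value = data.get(kind, [])  # resources/prompts are optional siblings
        if not isinstance(value, list):
            raise ValueError(f"{kind!r} must be a list")
        catalog[kind] = value
    return catalog


def primitives_enabled(catalog: dict[str, list[Any]]) -> bool:
    """Assumed rule: build a primitive runtime only if something is listed."""
    return bool(catalog["resources"] or catalog["prompts"])


bare = split_catalog('[{"name": "search"}]')
snapshot = split_catalog(
    '{"tools": [{"name": "search"}],'
    ' "resources": [{"uri": "file:///a"}],'
    ' "prompts": [{"name": "summarise"}]}'
)
```

Under these assumptions `primitives_enabled(bare)` is false and `primitives_enabled(snapshot)` is true, which matches the "tools-only catalogs are unchanged" behaviour described above.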

Reproducibility

make benchmark-primitives → mixed-primitive gateway benchmark: overall savings 84.1% (240/1508 tokens); resource ×60 = 86.5% savings, recall@k=1.0; prompt ×40 = 80.1% savings, recall@k=1.0. No routing/scoring/tokenisation core was changed.
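
As a sanity check on the headline figure, the overall savings can be recomputed from the two token counts, assuming 240 is the shaped token count and 1508 the unshaped baseline (the benchmark output does not label them explicitly here). `savings_percent` is a hypothetical helper, not part of the benchmark script.

```python
def savings_percent(shaped_tokens: int, baseline_tokens: int) -> float:
    """Percentage of baseline tokens removed by shaping, to one decimal."""
    return round((1 - shaped_tokens / baseline_tokens) * 100, 1)


overall = savings_percent(240, 1508)  # 1 - 240/1508 = 0.8408..., i.e. 84.1
```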

https://claude.ai/code/session_01FSXGXXiPxXckah5iFwa7ng

Generated by Claude Code

… #672, #673)

PR #700 landed the resource/prompt gateway runtime, meta-tools, converters, and benchmark, but the feature was library-only: there was no shipped PrimitiveUpstream adapter and `contextweaver mcp serve` never wired a primitive runtime. This closes those gaps.

- adapters/mcp_primitive_upstream.py: ship StubPrimitiveUpstream, McpClientPrimitiveUpstream, and MultiplexPrimitiveUpstream, mirroring the tool mcp_upstream trio. Per the PrimitiveUpstream contract these raise transport errors so the runtime classifies them via classify_upstream_exception.
- _mcp_cli.py: `mcp serve --gateway` now builds a PrimitiveGatewayRuntime from a snapshot catalog's optional `resources` / `prompts` lists (sharing the tool runtime's ContextManager) and passes it to McpGatewayServer; tools-only catalogs are unchanged. Factored a shared `_parse_catalog_file` helper. The serve summary reports primitive counts.
- gateway_primitives.py: add resource_ids() / prompt_ids() accessors mirroring ProxyRuntime.list_tool_ids().
- Makefile: add `benchmark-primitives` target for the mixed-primitive benchmark.
- docs/gateway_spec.md: add §9.4 request flows and §9.5 serve/catalog wiring.
- Tests: test_mcp_primitive_upstream.py + serve-CLI wiring tests.

https://claude.ai/code/session_01FSXGXXiPxXckah5iFwa7ng

Copilot

Pull request overview

This PR completes end-to-end MCP resource and prompt support through the gateway by shipping concrete PrimitiveUpstream adapters and wiring contextweaver mcp serve --gateway to construct and expose the primitive runtime when a snapshot catalog includes resources/prompts. This makes the resources/prompts shaping surface usable from the CLI (not just library-only).

Changes:

Add mcp_primitive_upstream.py with StubPrimitiveUpstream, McpClientPrimitiveUpstream, and MultiplexPrimitiveUpstream to back the primitive runtime.
Wire _mcp_cli.py to parse snapshot catalogs for resources/prompts, build a shared-context PrimitiveGatewayRuntime, and pass it into McpGatewayServer.
Add tests, docs/spec updates, a benchmark target, and changelog/agent-doc updates for the new primitive pathway.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

File	Description
tests/test_mcp_serve_cli.py	Adds CLI/unit tests verifying snapshot parsing and `mcp serve` primitive-wiring + summary output.
tests/test_mcp_primitive_upstream.py	Adds coverage for the three `PrimitiveUpstream` adapters and basic runtime integration.
src/contextweaver/adapters/mcp_primitive_upstream.py	Introduces concrete primitive upstream adapters (stub, client-session wrapper, multiplex fan-out).
src/contextweaver/adapters/gateway_primitives.py	Adds `resource_ids()` / `prompt_ids()` accessors for primitive runtime introspection.
src/contextweaver/_mcp_cli.py	Adds shared catalog parsing and constructs/passes primitive runtime in gateway serve mode; reports primitive counts.
Makefile	Adds `benchmark-primitives` target.
docs/gateway_spec.md	Documents primitive request flows and CLI snapshot-catalog wiring (§9.4–§9.5).
CHANGELOG.md	Documents the new end-to-end gateway primitive support.
AGENTS.md	Updates module map to include `mcp_primitive_upstream.py` and the new accessors.

… dict-shaped results

Address Copilot review on #701:

- MultiplexPrimitiveUpstream.list_resources/list_prompts now clear their ownership index at the start of each build, so repeated listings (e.g. successive PrimitiveGatewayRuntime.refresh() calls) are idempotent rather than returning an empty union on the second call and erasing the catalog.
- McpClientPrimitiveUpstream.list_resources/list_prompts route through a new _unwrap_listing helper that handles pydantic-result, dict-shaped ({"resources": [...]}), and bare-list payloads, so a dict listing no longer iterates string keys into _model_to_dict.
- Tests: multiplex idempotent-listing + repeated-refresh, and client dict-shaped listing unwrap.

https://claude.ai/code/session_01FSXGXXiPxXckah5iFwa7ng

dgenio · 2026-06-15T07:57:32Z

Thanks — both issues were real. Fixed in 390de96:

Multiplex idempotency (lines 181/192): list_resources()/list_prompts() now clear() their ownership index at the start of each build and repopulate it, so a second call (e.g. a repeated PrimitiveGatewayRuntime.refresh()) returns the full union instead of []. Added test_multiplex_listings_are_idempotent and test_multiplex_repeated_refresh_keeps_catalog.
Dict-shaped listings (lines 135/146): both client list_* methods now go through a new _unwrap_listing helper that handles pydantic-result, dict ({"resources": [...]}), and bare-list payloads, so a dict listing no longer iterates string keys into _model_to_dict. Added test_client_unwraps_dict_shaped_listings.
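
The multiplex idempotency fix can be illustrated with a small sketch. It assumes the bug came from a stale ownership index making every entry look like a collision on the second pass; `SketchMultiplex`, its constructor shape, and the first-server-wins collision policy are all hypothetical and may differ from MultiplexPrimitiveUpstream.

```python
class SketchMultiplex:
    """Hypothetical fan-out over several servers' resource URIs."""

    def __init__(self, upstreams: dict[str, list[str]]) -> None:
        self._upstreams = upstreams
        self._owner: dict[str, str] = {}  # uri -> owning server

    def list_resources(self) -> list[str]:
        # Rebuild the index from scratch; without this clear(), every URI
        # would already be "owned" on the second call and be skipped.
        self._owner.clear()
        union: list[str] = []
        for server in sorted(self._upstreams):
            for uri in self._upstreams[server]:
                if uri in self._owner:
                    continue  # assumed policy: first server wins
                self._owner[uri] = server
                union.append(uri)
        return union

    def owner_of(self, uri: str) -> "str | None":
        return self._owner.get(uri)


mux = SketchMultiplex({"a": ["file:///x", "file:///y"], "b": ["file:///y", "file:///z"]})
first = mux.list_resources()
second = mux.list_resources()
```

With the `clear()` in place, `first` and `second` are the same three-URI union, which is the property the two new multiplex tests are described as checking.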

make fmt/lint/type/module-size-check clean; the new module's tests pass.
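
For the dict-shaped listing fix, the three payload shapes can be handled along these lines. This is a sketch under assumptions, not the `_unwrap_listing` helper itself: `unwrap_listing_sketch` is a hypothetical name, and a `SimpleNamespace` stands in for a pydantic result object that exposes the list as an attribute.

```python
from types import SimpleNamespace
from typing import Any


def unwrap_listing_sketch(result: Any, key: str) -> list[Any]:
    """Return the list of entries from a bare list, dict, or result object."""
    if isinstance(result, list):
        return list(result)
    if isinstance(result, dict):
        return list(result.get(key, []))
    # Result-object case: the entries live on an attribute named `key`.
    return list(getattr(result, key, []))


items = [{"uri": "file:///a"}]
from_list = unwrap_listing_sketch(items, "resources")
from_dict = unwrap_listing_sketch({"resources": items}, "resources")
from_model = unwrap_listing_sketch(SimpleNamespace(resources=items), "resources")

# The original failure mode: iterating a dict directly yields its keys.
buggy = list({"resources": items})
```

Here `buggy` is `["resources"]`, a list of key strings rather than entries, which is what would previously have been passed on to `_model_to_dict`.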

Generated by Claude Code

github-actions · 2026-06-15T08:06:23Z

Benchmark delta (vs `main`)

Soft regression feedback only — this comment never blocks the PR.
Latency budget: ⚠️ when head > base × 1.3. Accuracy budget: ⚠️ when head < base - 1pp.
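
The two budget rules above amount to the following checks. This is a sketch, not the workflow's actual code: the function names are hypothetical, and it assumes recall@k and MRR are fractions in [0, 1], so 1pp corresponds to 0.01.

```python
def latency_within_budget(head_ms: float, base_ms: float, factor: float = 1.3) -> bool:
    """True unless head exceeds base * factor (the warning condition)."""
    return head_ms <= base_ms * factor


def accuracy_within_budget(head: float, base: float, margin: float = 0.01) -> bool:
    """True unless head drops more than one percentage point below base."""
    return head >= base - margin


# embedding_hashing at size 1000 in the matrix below: 127.009 ms vs base 98.277 ms.
# The threshold is 98.277 * 1.3 = 127.7601 ms, so the row stays within budget.
slowest_row_ok = latency_within_budget(127.009, 98.277)
```

That row is the closest to the threshold in this run, which is worth keeping in mind given the note that latency is hardware-dependent.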

Routing summary (single backend × catalog sizes)

size	recall@k (head Δ vs base)	MRR (head Δ vs base)	p99 (ms)
50	✅ 0.5649 (+0.0000)	✅ 0.4978 (+0.0000)	✅ 0.501 (base 0.759)
83	✅ 0.3825 (+0.0000)	✅ 0.3242 (+0.0000)	✅ 0.755 (base 1.134)
1000	✅ 0.1475 (+0.0000)	✅ 0.1456 (+0.0000)	✅ 44.303 (base 41.711)

Per-backend × per-size matrix

backend	size	recall@k (Δ)	MRR (Δ)	p99 (ms)
bm25	100	✅ 0.3825 (+0.0000)	✅ 0.3399 (+0.0000)	✅ 6.318 (base 8.140)
bm25	500	✅ 0.2250 (+0.0000)	✅ 0.2165 (+0.0000)	✅ 29.414 (base 38.989)
bm25	1000	✅ 0.1575 (+0.0000)	✅ 0.1525 (+0.0000)	✅ 86.938 (base 111.716)
embedding_hashing	100	✅ 0.5175 (+0.0000)	✅ 0.4360 (+0.0000)	✅ 7.678 (base 7.225)
embedding_hashing	500	✅ 0.2700 (+0.0000)	✅ 0.2674 (+0.0000)	✅ 42.549 (base 44.182)
embedding_hashing	1000	✅ 0.2000 (+0.0000)	✅ 0.1931 (+0.0000)	✅ 127.009 (base 98.277)
embedding_st	100	skipped (skipped: missing sentence-transformers)	—	—
embedding_st	500	skipped (skipped: missing sentence-transformers)	—	—
embedding_st	1000	skipped (skipped: missing sentence-transformers)	—	—
fuzzy	100	skipped (skipped: missing rapidfuzz)	—	—
fuzzy	500	skipped (skipped: missing rapidfuzz)	—	—
fuzzy	1000	skipped (skipped: missing rapidfuzz)	—	—
tfidf	100	✅ 0.3825 (+0.0000)	✅ 0.3220 (+0.0000)	✅ 1.053 (base 1.102)
tfidf	500	✅ 0.2325 (+0.0000)	✅ 0.2314 (+0.0000)	✅ 10.023 (base 11.492)
tfidf	1000	✅ 0.1475 (+0.0000)	✅ 0.1456 (+0.0000)	✅ 36.567 (base 50.755)

Context pipeline (per scenario)

scenario	tokens	dropped	dedup
large_catalog	1480 (base 1514, Δ-34)	0 (base 0, Δ+0)	0 (base 0, Δ+0)
long_conversation	2500 (base 2548, Δ-48)	0 (base 0, Δ+0)	0 (base 0, Δ+0)
mixed_payload	488 (base 497, Δ-9)	0 (base 0, Δ+0)	0 (base 0, Δ+0)
short_conversation	487 (base 496, Δ-9)	0 (base 0, Δ+0)	0 (base 0, Δ+0)
stress_conversation	6590 (base 6651, Δ-61)	11 (base 7, Δ+4)	4 (base 4, Δ+0)
tiny_payload	256 (base 267, Δ-11)	0 (base 0, Δ+0)	0 (base 0, Δ+0)

Numbers come from make benchmark / make benchmark-matrix.
Latency is hardware-dependent — treat the markers as a rough guide.
See benchmarks/scorecard.md for the full picture.

…are skipped

_load_primitive_defs_from_catalog silently filtered out non-dict resources/prompts entries, so a mistyped catalog entry vanished without a trace. Factor the per-kind filtering into _collect_primitive_defs, which now also drops dict entries lacking the required identity field (uri for resources, name for prompts) and logs a warning for every skipped entry.

Adds test_load_primitive_defs_skips_malformed_entries covering non-dict and identity-less entries plus the warning count.

https://claude.ai/code/session_01J3qykQ9umrpbdy4n5gKq6c
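
The filtering described in this commit can be sketched as below. It is an approximation under assumptions: `collect_defs_sketch`, the logger name, and the warning wording are hypothetical, and the real `_collect_primitive_defs` signature is not shown on this page.

```python
import logging
from typing import Any

logger = logging.getLogger("catalog_sketch")  # hypothetical logger name

# Required identity field per primitive kind, as stated in the commit message.
IDENTITY_FIELD = {"resources": "uri", "prompts": "name"}


class CountingHandler(logging.Handler):
    """Counts emitted records so the warning count can be checked."""

    def __init__(self) -> None:
        super().__init__()
        self.count = 0

    def emit(self, record: logging.LogRecord) -> None:
        self.count += 1


def collect_defs_sketch(entries: list[Any], kind: str) -> list[dict[str, Any]]:
    """Keep well-formed entries; warn once for every entry that is skipped."""
    key = IDENTITY_FIELD[kind]
    kept: list[dict[str, Any]] = []
    for index, entry in enumerate(entries):
        if not isinstance(entry, dict):
            logger.warning("skipping %s[%d]: not an object", kind, index)
            continue
        if key not in entry:
            logger.warning("skipping %s[%d]: missing %r", kind, index, key)
            continue
        kept.append(entry)
    return kept


counter = CountingHandler()
logger.addHandler(counter)
kept = collect_defs_sketch(["oops", {"name": "no-uri"}, {"uri": "file:///a"}], "resources")
```

In this run one entry survives and two warnings are counted, mirroring the non-dict and identity-less cases the new test is described as covering.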

Copilot AI review requested due to automatic review settings June 15, 2026 07:52

Copilot started reviewing on behalf of dgenio June 15, 2026 07:53

Copilot AI reviewed Jun 15, 2026

dgenio merged commit b5f5020 into main Jun 15, 2026
9 checks passed

dgenio deleted the claude/github-issues-triage-ejay6a branch June 15, 2026 19:32

This was referenced Jun 15, 2026

Gateway resources support: list/read modeling parity #669

Closed

Gateway prompts support: list/get parity #670

Closed

Gateway spec updates for resources/prompts coverage #672

Closed

Regression benchmark for tools+resources+prompts context shaping #673

Closed

dgenio merged 3 commits into main from claude/github-issues-triage-ejay6a
