feat(gateway): make resources/prompts reachable end-to-end (#669, #670, #672, #673) #701
#672, #673)

PR #700 landed the resource/prompt gateway runtime, meta-tools, converters, and benchmark, but the feature was library-only: there was no shipped PrimitiveUpstream adapter and `contextweaver mcp serve` never wired a primitive runtime. This closes those gaps.

- adapters/mcp_primitive_upstream.py: ship StubPrimitiveUpstream, McpClientPrimitiveUpstream, and MultiplexPrimitiveUpstream, mirroring the tool mcp_upstream trio. Per the PrimitiveUpstream contract these raise transport errors so the runtime classifies them via classify_upstream_exception.
- _mcp_cli.py: `mcp serve --gateway` now builds a PrimitiveGatewayRuntime from a snapshot catalog's optional `resources` / `prompts` lists (sharing the tool runtime's ContextManager) and passes it to McpGatewayServer; tools-only catalogs are unchanged. Factored a shared `_parse_catalog_file` helper. The serve summary reports primitive counts.
- gateway_primitives.py: add resource_ids() / prompt_ids() accessors mirroring ProxyRuntime.list_tool_ids().
- Makefile: add `benchmark-primitives` target for the mixed-primitive benchmark.
- docs/gateway_spec.md: add §9.4 request flows and §9.5 serve/catalog wiring.
- Tests: test_mcp_primitive_upstream.py + serve-CLI wiring tests.

https://claude.ai/code/session_01FSXGXXiPxXckah5iFwa7ng
Pull request overview
This PR completes end-to-end MCP resource and prompt support through the gateway by shipping concrete PrimitiveUpstream adapters and wiring contextweaver mcp serve --gateway to construct and expose the primitive runtime when a snapshot catalog includes resources/prompts. This makes the resources/prompts shaping surface usable from the CLI (not just library-only).
Changes:
- Add `mcp_primitive_upstream.py` with `StubPrimitiveUpstream`, `McpClientPrimitiveUpstream`, and `MultiplexPrimitiveUpstream` to back the primitive runtime.
- Wire `_mcp_cli.py` to parse snapshot catalogs for `resources`/`prompts`, build a shared-context `PrimitiveGatewayRuntime`, and pass it into `McpGatewayServer`.
- Add tests, docs/spec updates, a benchmark target, and changelog/agent-doc updates for the new primitive pathway.
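To make the catalog handling concrete, here is a minimal sketch of how a snapshot catalog could be split into its three lists. It is not the PR's `_parse_catalog_file`; `split_catalog` and the sample entries are hypothetical, and the only behaviour taken from the PR is that a bare list means tools-only while an object may carry optional `resources` / `prompts` lists.

```python
import json
from typing import Any

KINDS = ("tools", "resources", "prompts")


def split_catalog(payload: Any) -> dict[str, list[Any]]:
    """Split a parsed catalog into tools / resources / prompts lists.

    Hypothetical helper: a bare list is treated as a tools-only catalog;
    an object may carry optional "resources" and "prompts" next to "tools".
    """
    if isinstance(payload, list):
        return {"tools": list(payload), "resources": [], "prompts": []}
    if isinstance(payload, dict):
        parts: dict[str, list[Any]] = {}
        for kind in KINDS:
            value = payload.get(kind)
            # Anything that is not a list (missing, null, wrong type) counts as empty.
            parts[kind] = list(value) if isinstance(value, list) else []
        return parts
    raise ValueError("catalog must be a JSON list or object")


snapshot = json.loads(
    '{"tools": [{"name": "search"}],'
    ' "resources": [{"uri": "file:///notes.md"}],'
    ' "prompts": [{"name": "summarize"}]}'
)
parts = split_catalog(snapshot)

# Assumption: a primitive runtime is only worth building when either list is non-empty.
primitives_enabled = bool(parts["resources"] or parts["prompts"])
```

Under this sketch a tools-only catalog yields empty `resources` and `prompts`, which is how the "tools-only catalogs are unchanged" behaviour could fall out naturally.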
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tests/test_mcp_serve_cli.py | Adds CLI/unit tests verifying snapshot parsing and mcp serve primitive-wiring + summary output. |
| tests/test_mcp_primitive_upstream.py | Adds coverage for the three PrimitiveUpstream adapters and basic runtime integration. |
| src/contextweaver/adapters/mcp_primitive_upstream.py | Introduces concrete primitive upstream adapters (stub, client-session wrapper, multiplex fan-out). |
| src/contextweaver/adapters/gateway_primitives.py | Adds resource_ids() / prompt_ids() accessors for primitive runtime introspection. |
| src/contextweaver/_mcp_cli.py | Adds shared catalog parsing and constructs/passes primitive runtime in gateway serve mode; reports primitive counts. |
| Makefile | Adds benchmark-primitives target. |
| docs/gateway_spec.md | Documents primitive request flows and CLI snapshot-catalog wiring (§9.4–§9.5). |
| CHANGELOG.md | Documents the new end-to-end gateway primitive support. |
| AGENTS.md | Updates module map to include mcp_primitive_upstream.py and the new accessors. |
… dict-shaped results

Address Copilot review on #701:

- MultiplexPrimitiveUpstream.list_resources/list_prompts now clear their ownership index at the start of each build, so repeated listings (e.g. successive PrimitiveGatewayRuntime.refresh() calls) are idempotent rather than returning an empty union on the second call and erasing the catalog.
- McpClientPrimitiveUpstream.list_resources/list_prompts route through a new _unwrap_listing helper that handles pydantic-result, dict-shaped ({"resources": [...]}), and bare-list payloads, so a dict listing no longer iterates string keys into _model_to_dict.
- Tests: multiplex idempotent-listing + repeated-refresh, and client dict-shaped listing unwrap.

https://claude.ai/code/session_01FSXGXXiPxXckah5iFwa7ng
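The listing-unwrap fix can be illustrated with a small sketch. This is not the PR's `_unwrap_listing`; `unwrap_listing` below is a hypothetical stand-in, and a `SimpleNamespace` plays the role of a pydantic-style result object. The point it shows is why the dict case must be checked before anything iterates the payload.

```python
from types import SimpleNamespace
from typing import Any


def unwrap_listing(payload: Any, key: str) -> list[Any]:
    """Return the item list from a listing payload of unknown shape."""
    if isinstance(payload, dict):
        # Dict-shaped: {"resources": [...]}. Handle this first so the dict
        # itself is never iterated (that would yield its string keys).
        items = payload.get(key, [])
    elif isinstance(payload, (list, tuple)):
        # Bare list: the payload already is the item list.
        items = payload
    else:
        # Result object: items live on an attribute named after the key.
        items = getattr(payload, key, [])
    return list(items or [])


entries = [{"uri": "file:///a.md"}, {"uri": "file:///b.md"}]

from_object = unwrap_listing(SimpleNamespace(resources=entries), "resources")
from_dict = unwrap_listing({"resources": entries}, "resources")
from_list = unwrap_listing(entries, "resources")
```

All three calls return the same two entries; without the dict branch, iterating `{"resources": [...]}` directly would have produced the single string `"resources"`.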
Thanks — both issues were real. Fixed in 390de96:
Generated by Claude Code
Benchmark delta (vs
| size | recall@k (head Δ vs base) | MRR (head Δ vs base) | p99 (ms) |
|---|---|---|---|
| 50 | ✅ 0.5649 (+0.0000) | ✅ 0.4978 (+0.0000) | ✅ 0.501 (base 0.759) |
| 83 | ✅ 0.3825 (+0.0000) | ✅ 0.3242 (+0.0000) | ✅ 0.755 (base 1.134) |
| 1000 | ✅ 0.1475 (+0.0000) | ✅ 0.1456 (+0.0000) | ✅ 44.303 (base 41.711) |
Per-backend × per-size matrix
| backend | size | recall@k (Δ) | MRR (Δ) | p99 (ms) |
|---|---|---|---|---|
| bm25 | 100 | ✅ 0.3825 (+0.0000) | ✅ 0.3399 (+0.0000) | ✅ 6.318 (base 8.140) |
| bm25 | 500 | ✅ 0.2250 (+0.0000) | ✅ 0.2165 (+0.0000) | ✅ 29.414 (base 38.989) |
| bm25 | 1000 | ✅ 0.1575 (+0.0000) | ✅ 0.1525 (+0.0000) | ✅ 86.938 (base 111.716) |
| embedding_hashing | 100 | ✅ 0.5175 (+0.0000) | ✅ 0.4360 (+0.0000) | ✅ 7.678 (base 7.225) |
| embedding_hashing | 500 | ✅ 0.2700 (+0.0000) | ✅ 0.2674 (+0.0000) | ✅ 42.549 (base 44.182) |
| embedding_hashing | 1000 | ✅ 0.2000 (+0.0000) | ✅ 0.1931 (+0.0000) | ✅ 127.009 (base 98.277) |
| embedding_st | 100 | skipped (skipped: missing sentence-transformers) | — | — |
| embedding_st | 500 | skipped (skipped: missing sentence-transformers) | — | — |
| embedding_st | 1000 | skipped (skipped: missing sentence-transformers) | — | — |
| fuzzy | 100 | skipped (skipped: missing rapidfuzz) | — | — |
| fuzzy | 500 | skipped (skipped: missing rapidfuzz) | — | — |
| fuzzy | 1000 | skipped (skipped: missing rapidfuzz) | — | — |
| tfidf | 100 | ✅ 0.3825 (+0.0000) | ✅ 0.3220 (+0.0000) | ✅ 1.053 (base 1.102) |
| tfidf | 500 | ✅ 0.2325 (+0.0000) | ✅ 0.2314 (+0.0000) | ✅ 10.023 (base 11.492) |
| tfidf | 1000 | ✅ 0.1475 (+0.0000) | ✅ 0.1456 (+0.0000) | ✅ 36.567 (base 50.755) |
Context pipeline (per scenario)
| scenario | tokens | dropped | dedup |
|---|---|---|---|
| large_catalog | 1480 (base 1514, Δ-34) | 0 (base 0, Δ+0) | 0 (base 0, Δ+0) |
| long_conversation | 2500 (base 2548, Δ-48) | 0 (base 0, Δ+0) | 0 (base 0, Δ+0) |
| mixed_payload | 488 (base 497, Δ-9) | 0 (base 0, Δ+0) | 0 (base 0, Δ+0) |
| short_conversation | 487 (base 496, Δ-9) | 0 (base 0, Δ+0) | 0 (base 0, Δ+0) |
| stress_conversation | 6590 (base 6651, Δ-61) | 11 (base 7, Δ+4) | 4 (base 4, Δ+0) |
| tiny_payload | 256 (base 267, Δ-11) | 0 (base 0, Δ+0) | 0 (base 0, Δ+0) |
Numbers come from make benchmark / make benchmark-matrix.
Latency is hardware-dependent — treat the markers as a rough guide.
See benchmarks/scorecard.md for the full picture.
…are skipped

_load_primitive_defs_from_catalog silently filtered out non-dict resources/prompts entries, so a mistyped catalog entry vanished without a trace. Factor the per-kind filtering into _collect_primitive_defs, which now also drops dict entries lacking the required identity field (uri for resources, name for prompts) and logs a warning for every skipped entry. Adds test_load_primitive_defs_skips_malformed_entries covering non-dict and identity-less entries plus the warning count.

https://claude.ai/code/session_01J3qykQ9umrpbdy4n5gKq6c
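A rough sketch of the filtering described in this commit, under stated assumptions: `collect_defs`, the logger name, and the message wording are hypothetical, and only the rules themselves (drop non-dict entries, drop entries without `uri` / `name`, warn once per skipped entry) come from the commit message.

```python
import logging
from typing import Any

logger = logging.getLogger("catalog_sketch")  # hypothetical logger name

# Identity field each primitive kind must carry (per the commit message).
IDENTITY_FIELD = {"resources": "uri", "prompts": "name"}


def collect_defs(entries: list[Any], kind: str) -> list[dict[str, Any]]:
    """Keep well-formed entries; log one warning per skipped entry."""
    field = IDENTITY_FIELD[kind]
    kept: list[dict[str, Any]] = []
    for index, entry in enumerate(entries):
        if not isinstance(entry, dict):
            logger.warning(
                "skipping %s[%d]: expected an object, got %s",
                kind, index, type(entry).__name__,
            )
            continue
        # Assumption: an empty identity value is treated the same as a missing one.
        if not entry.get(field):
            logger.warning(
                "skipping %s[%d]: missing required field %r", kind, index, field
            )
            continue
        kept.append(entry)
    return kept


raw = [{"uri": "file:///a.md"}, "oops", {"name": "no-uri-here"}]
resources = collect_defs(raw, "resources")
```

Here only the first entry survives, and two warnings are logged, so a mistyped catalog entry leaves a trace instead of vanishing.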
Summary
PR #700 landed the resource/prompt gateway runtime (`PrimitiveGatewayRuntime`), the four meta-tools, the converters, and the mixed-primitive benchmark, but the feature was library-only: there was no shipped `PrimitiveUpstream` adapter, and `contextweaver mcp serve` never constructed a primitive runtime, so resources/prompts were unreachable from the CLI. This PR closes the remaining gaps so the #555 "shape all three MCP primitives" capability is usable end-to-end.

Advances #555. Addresses #669, #670, #672, #673.
Changes
- `adapters/mcp_primitive_upstream.py` (new): ships the concrete `PrimitiveUpstream` trio mirroring the tool `mcp_upstream` adapters: `StubPrimitiveUpstream` (in-process; tests/CLI/air-gapped CI), `McpClientPrimitiveUpstream` (wraps a connected MCP `ClientSession`), `MultiplexPrimitiveUpstream` (multi-server fan-out). Per the `PrimitiveUpstream` contract these raise transport errors so the runtime classifies them via `classify_upstream_exception` (unlike the tool path, which returns `isError`).
- `_mcp_cli.py`: `mcp serve --gateway` now builds a `PrimitiveGatewayRuntime` from a snapshot catalog's optional `resources` / `prompts` lists (sharing the tool runtime's `ContextManager`) and passes it to `McpGatewayServer(primitive_runtime=…)`. Tools-only catalogs are unchanged (primitives off); proxy mode stays a transparent passthrough. Factored a shared `_parse_catalog_file` helper; the serve summary reports primitive counts.
- `adapters/gateway_primitives.py`: added `resource_ids()` / `prompt_ids()` accessors mirroring `ProxyRuntime.list_tool_ids()`.
- `Makefile`: new `benchmark-primitives` target for `benchmarks/primitive_gateway_benchmark.py` (Regression benchmark for tools+resources+prompts context shaping #673).
- `docs/gateway_spec.md`: added §9.4 (request flows for the four verbs, firewall/error-taxonomy semantics) and §9.5 (serve + snapshot-catalog wiring) (Gateway spec updates for resources/prompts coverage #672).
- Tests: `test_mcp_primitive_upstream.py` (all three adapters, incl. timeout-propagation and multiplex routing/collision) + serve-CLI wiring tests in `test_mcp_serve_cli.py`.

Checklist
- `make ci` passes locally: fmt, lint, type, drift-check, module-size-check, doc-snippets-check, readme-version-check, example, demo all green; full test suite 2669 passed / 31 skipped / 1 xfailed. (See note below on the one failure.)
- `CHANGELOG.md` updated under `## [Unreleased]`.
- `make api-check` clean: the new primitive surface is reached by module path (matching feat(routing): unified cross-primitive identity & collision policy (#671) #700's treatment), so `api/public_api.txt` is unchanged.
- `make module-size-check` OK; the new upstream adapters live in their own module rather than growing `mcp_upstream.py`.
- `AGENTS.md` module map gains the new module + the `resource_ids()` / `prompt_ids()` note.

Notes for reviewers
- `test_mcp_serve_cli.py::test_serve_dry_run_writes_catalog_diagnostic_event` asserts empty stderr, but the sandbox has no network, so `tiktoken` logs a `cl100k_base` 403 fallback warning. Verified identical failure on a clean tree (`git stash`); it passes with network / pre-cached tiktoken. This is the same known-environmental case documented in PR feat(routing): surface routing diagnostics and validation (#519, #521, #523, #524, #538) #567.
- `resources` / `prompts` sit alongside `tools` in a snapshot object (`{"tools": […], "resources": […], "prompts": […]}`), as documented in §9.5. A bare-list (tools-only) catalog is unaffected.
- `McpClientPrimitiveUpstream` is a thin pass-through over a `ClientSession`; it's exercised here with a fake session. Live-session integration is covered by the existing record/replay direction, not added here.

Reproducibility
`make benchmark-primitives` → mixed-primitive gateway benchmark: overall savings 84.1% (240/1508 tokens); resource ×60 = 86.5% savings, recall@k=1.0; prompt ×40 = 80.1% savings, recall@k=1.0. No routing/scoring/tokenisation core was changed.

https://claude.ai/code/session_01FSXGXXiPxXckah5iFwa7ng
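For readers checking the headline figure, the overall savings number is consistent with reading "240/1508" as shaped tokens over raw tokens. That reading is an assumption (the benchmark script itself is not shown here), and `savings_pct` is a hypothetical helper:

```python
def savings_pct(shaped_tokens: int, raw_tokens: int) -> float:
    """Percentage of tokens saved, rounded to one decimal place."""
    return round(100.0 * (1.0 - shaped_tokens / raw_tokens), 1)


# Assumption: 240 is the shaped token count and 1508 the unshaped baseline.
overall = savings_pct(240, 1508)
```

With these inputs `overall` evaluates to 84.1, matching the reported overall savings.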
Generated by Claude Code