feat(gateway): make resources/prompts reachable end-to-end (#669, #670, #672, #673)#701

Merged: dgenio merged 3 commits into main from claude/github-issues-triage-ejay6a on Jun 15, 2026

@dgenio dgenio commented Jun 15, 2026

Summary

PR #700 landed the resource/prompt gateway runtime (PrimitiveGatewayRuntime), the four meta-tools, the converters, and the mixed-primitive benchmark — but the feature was library-only: there was no shipped PrimitiveUpstream adapter, and contextweaver mcp serve never constructed a primitive runtime, so resources/prompts were unreachable from the CLI. This PR closes the remaining gaps so the #555 "shape all three MCP primitives" capability is usable end-to-end.

Advances #555. Addresses #669, #670, #672, #673.

Changes

  • adapters/mcp_primitive_upstream.py (new) — ships the concrete PrimitiveUpstream trio mirroring the tool mcp_upstream adapters: StubPrimitiveUpstream (in-process; tests/CLI/air-gapped CI), McpClientPrimitiveUpstream (wraps a connected MCP ClientSession), MultiplexPrimitiveUpstream (multi-server fan-out). Per the PrimitiveUpstream contract these raise transport errors so the runtime classifies them via classify_upstream_exception (unlike the tool path, which returns isError).
  • _mcp_cli.py: mcp serve --gateway now builds a PrimitiveGatewayRuntime from a snapshot catalog's optional resources / prompts lists (sharing the tool runtime's ContextManager) and passes it to McpGatewayServer(primitive_runtime=…). Tools-only catalogs are unchanged (primitives off); proxy mode stays a transparent passthrough. Factored a shared _parse_catalog_file helper; the serve summary reports primitive counts.
  • adapters/gateway_primitives.py — added resource_ids() / prompt_ids() accessors mirroring ProxyRuntime.list_tool_ids().
  • Makefile — new benchmark-primitives target for benchmarks/primitive_gateway_benchmark.py (Regression benchmark for tools+resources+prompts context shaping #673).
  • docs/gateway_spec.md — added §9.4 (request flows for the four verbs, firewall/error-taxonomy semantics) and §9.5 (serve + snapshot-catalog wiring) (Gateway spec updates for resources/prompts coverage #672).
  • Tests: test_mcp_primitive_upstream.py (all three adapters, incl. timeout-propagation and multiplex routing/collision) + serve-CLI wiring tests in test_mcp_serve_cli.py.
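
The raise-versus-return contract described in the first bullet can be sketched roughly as below. This is an illustration only, not the shipped adapter: `SketchStubUpstream`, `classify_error`, and `read_via_runtime` are hypothetical names, and the real `classify_upstream_exception` signature and error taxonomy are not shown in this PR.

```python
from __future__ import annotations


class SketchStubUpstream:
    """Hypothetical in-process upstream that serves resources from a dict."""

    def __init__(self, resources: dict[str, str], fail: bool = False) -> None:
        self._resources = resources
        self._fail = fail

    def read_resource(self, uri: str) -> str:
        if self._fail:
            # Transport-level failures are raised, not folded into a result.
            raise ConnectionError("upstream unreachable")
        return self._resources[uri]  # KeyError for an unknown URI


def classify_error(exc: Exception) -> str:
    """Hypothetical stand-in for the runtime's exception classifier."""
    if isinstance(exc, TimeoutError):
        return "timeout"
    if isinstance(exc, ConnectionError):
        return "transport"
    if isinstance(exc, KeyError):
        return "not_found"
    return "internal"


def read_via_runtime(upstream: SketchStubUpstream, uri: str) -> dict[str, str]:
    """The runtime, not the adapter, converts exceptions into error results."""
    try:
        return {"status": "ok", "text": upstream.read_resource(uri)}
    except Exception as exc:
        return {"status": "error", "kind": classify_error(exc)}
```

Keeping classification in one place means every adapter can stay a thin wrapper, at the cost of adapters needing to let exceptions propagate rather than swallowing them.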

Checklist

  • Tests added or updated for every new/changed public function
  • make ci passes locally — fmt, lint, type, drift-check, module-size-check, doc-snippets-check, readme-version-check, example, demo all green; full test suite 2669 passed / 31 skipped / 1 xfailed. (See note below on the one failure.)
  • CHANGELOG.md updated under ## [Unreleased]
  • Docstrings added for all new public APIs (Google-style)
  • Public-API change? make api-check clean: the new primitive surface is reached by module path (matching the treatment in #700, feat(routing): unified cross-primitive identity & collision policy (#671)), so api/public_api.txt is unchanged.
  • Every modified module stays ≤ 300 lines (make module-size-check OK; the new upstream adapters live in their own module rather than growing mcp_upstream.py).
  • Related issues linked in the summary above
  • Agent-facing docs updated — AGENTS.md module map gains the new module + the resource_ids()/prompt_ids() note.

Notes for reviewers

  • Pre-existing test failure (not from this PR): test_mcp_serve_cli.py::test_serve_dry_run_writes_catalog_diagnostic_event asserts empty stderr, but the sandbox has no network, so tiktoken logs a cl100k_base 403 fallback warning. Verified identical failure on a clean tree (git stash); it passes with network / pre-cached tiktoken. This is the same known-environmental case documented in PR #567 (feat(routing): surface routing diagnostics and validation (#519, #521, #523, #524, #538)).
  • Catalog format: resources/prompts are optional siblings of tools in a snapshot object ({"tools": […], "resources": […], "prompts": […]}) — documented in §9.5. A bare-list (tools-only) catalog is unaffected.
  • McpClientPrimitiveUpstream is a thin pass-through over a ClientSession; it's exercised here with a fake session. Live-session integration is covered by the existing record/replay direction, not added here.
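
The two catalog shapes from the "Catalog format" note can be told apart with a few lines of parsing. The sketch below is a guess at the general approach, not the actual `_parse_catalog_file` helper: `split_catalog` and `primitives_enabled` are hypothetical names, and treating "no resources and no prompts" as "primitives off" is an assumption drawn from the tools-only behaviour described above.

```python
from __future__ import annotations

import json
from typing import Any


def split_catalog(raw: str) -> dict[str, list[Any]]:
    """Split a catalog document into tools, resources, and prompts.

    Accepts either a bare list (tools only) or a snapshot object with
    optional "resources" / "prompts" siblings of "tools".
    """
    data = json.loads(raw)
    if isinstance(data, list):
        # Bare-list catalog: tools only, no primitives.
        return {"tools": data, "resources": [], "prompts": []}
    if not isinstance(data, dict):
        raise ValueError("catalog must be a JSON list or object")
    return {
        "tools": list(data.get("tools", [])),
        "resources": list(data.get("resources", [])),
        "prompts": list(data.get("prompts", [])),
    }


def primitives_enabled(catalog: dict[str, list[Any]]) -> bool:
    """Assumed rule: primitives are wired only if any resource or prompt exists."""
    return bool(catalog["resources"] or catalog["prompts"])
```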

Reproducibility

make benchmark-primitives → mixed-primitive gateway benchmark: overall savings 84.1% (240/1508 tokens); resource ×60 = 86.5% savings, recall@k=1.0; prompt ×40 = 80.1% savings, recall@k=1.0. No routing/scoring/tokenisation core was changed.
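
As a quick sanity check on the headline figure, the overall percentage follows from the two token counts if 240 is read as the shaped token total and 1508 as the unshaped baseline (that reading is an assumption; the benchmark output does not label them here). `savings_pct` is a hypothetical helper name.

```python
def savings_pct(shaped_tokens: int, baseline_tokens: int) -> float:
    """Percentage of baseline tokens avoided by shaping."""
    return 100.0 * (1.0 - shaped_tokens / baseline_tokens)


overall = savings_pct(240, 1508)
print(f"{overall:.1f}%")  # prints "84.1%"
```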

https://claude.ai/code/session_01FSXGXXiPxXckah5iFwa7ng


Generated by Claude Code

#672, #673)

PR #700 landed the resource/prompt gateway runtime, meta-tools, converters, and
benchmark, but the feature was library-only: there was no shipped PrimitiveUpstream
adapter and `contextweaver mcp serve` never wired a primitive runtime. This closes
those gaps.

- adapters/mcp_primitive_upstream.py: ship StubPrimitiveUpstream,
  McpClientPrimitiveUpstream, and MultiplexPrimitiveUpstream, mirroring the tool
  mcp_upstream trio. Per the PrimitiveUpstream contract these raise transport
  errors so the runtime classifies them via classify_upstream_exception.
- _mcp_cli.py: `mcp serve --gateway` now builds a PrimitiveGatewayRuntime from a
  snapshot catalog's optional `resources` / `prompts` lists (sharing the tool
  runtime's ContextManager) and passes it to McpGatewayServer; tools-only catalogs
  are unchanged. Factored a shared `_parse_catalog_file` helper. The serve summary
  reports primitive counts.
- gateway_primitives.py: add resource_ids() / prompt_ids() accessors mirroring
  ProxyRuntime.list_tool_ids().
- Makefile: add `benchmark-primitives` target for the mixed-primitive benchmark.
- docs/gateway_spec.md: add §9.4 request flows and §9.5 serve/catalog wiring.
- Tests: test_mcp_primitive_upstream.py + serve-CLI wiring tests.

https://claude.ai/code/session_01FSXGXXiPxXckah5iFwa7ng
Copilot AI review requested due to automatic review settings June 15, 2026 07:52

Copilot AI left a comment

Pull request overview

This PR completes end-to-end MCP resource and prompt support through the gateway by shipping concrete PrimitiveUpstream adapters and wiring contextweaver mcp serve --gateway to construct and expose the primitive runtime when a snapshot catalog includes resources/prompts. This makes the resources/prompts shaping surface usable from the CLI (not just library-only).

Changes:

  • Add mcp_primitive_upstream.py with StubPrimitiveUpstream, McpClientPrimitiveUpstream, and MultiplexPrimitiveUpstream to back the primitive runtime.
  • Wire _mcp_cli.py to parse snapshot catalogs for resources/prompts, build a shared-context PrimitiveGatewayRuntime, and pass it into McpGatewayServer.
  • Add tests, docs/spec updates, a benchmark target, and changelog/agent-doc updates for the new primitive pathway.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| tests/test_mcp_serve_cli.py | Adds CLI/unit tests verifying snapshot parsing and mcp serve primitive-wiring + summary output. |
| tests/test_mcp_primitive_upstream.py | Adds coverage for the three PrimitiveUpstream adapters and basic runtime integration. |
| src/contextweaver/adapters/mcp_primitive_upstream.py | Introduces concrete primitive upstream adapters (stub, client-session wrapper, multiplex fan-out). |
| src/contextweaver/adapters/gateway_primitives.py | Adds resource_ids() / prompt_ids() accessors for primitive runtime introspection. |
| src/contextweaver/_mcp_cli.py | Adds shared catalog parsing and constructs/passes primitive runtime in gateway serve mode; reports primitive counts. |
| Makefile | Adds benchmark-primitives target. |
| docs/gateway_spec.md | Documents primitive request flows and CLI snapshot-catalog wiring (§9.4–§9.5). |
| CHANGELOG.md | Documents the new end-to-end gateway primitive support. |
| AGENTS.md | Updates module map to include mcp_primitive_upstream.py and the new accessors. |

Comment thread src/contextweaver/adapters/mcp_primitive_upstream.py
Comment thread src/contextweaver/adapters/mcp_primitive_upstream.py
Comment thread src/contextweaver/adapters/mcp_primitive_upstream.py Outdated
Comment thread src/contextweaver/adapters/mcp_primitive_upstream.py Outdated
… dict-shaped results

Address Copilot review on #701:

- MultiplexPrimitiveUpstream.list_resources/list_prompts now clear their
  ownership index at the start of each build, so repeated listings (e.g.
  successive PrimitiveGatewayRuntime.refresh() calls) are idempotent rather
  than returning an empty union on the second call and erasing the catalog.
- McpClientPrimitiveUpstream.list_resources/list_prompts route through a new
  _unwrap_listing helper that handles pydantic-result, dict-shaped
  ({"resources": [...]}), and bare-list payloads, so a dict listing no longer
  iterates string keys into _model_to_dict.
- Tests: multiplex idempotent-listing + repeated-refresh, and client
  dict-shaped listing unwrap.
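
The idempotency fix can be illustrated with a toy multiplexer. The failure mode shown here (a stale ownership index causing every item to be treated as a collision on the second listing) is an assumption about why the union came back empty; `SketchMultiplex` and its `reset_index` switch are hypothetical and exist only to contrast the two behaviours.

```python
from __future__ import annotations


class SketchMultiplex:
    """Hypothetical fan-out over named upstreams; first owner wins on collision."""

    def __init__(self, upstreams: dict[str, list[str]], reset_index: bool = True) -> None:
        self._upstreams = upstreams  # server name -> resource URIs it lists
        self._owner: dict[str, str] = {}  # URI -> owning server name
        self._reset_index = reset_index

    def list_resources(self) -> list[str]:
        if self._reset_index:
            # Rebuild ownership from scratch so entries left over from a
            # previous listing are not mistaken for collisions.
            self._owner.clear()
        union: list[str] = []
        for server, uris in self._upstreams.items():
            for uri in uris:
                if uri in self._owner:
                    continue  # already owned: treated as a collision and skipped
                self._owner[uri] = server
                union.append(uri)
        return union

    def owner_of(self, uri: str) -> str:
        return self._owner[uri]
```

With `reset_index=False` the second call to `list_resources()` returns an empty list, which is the behaviour the commit describes as erasing the catalog on refresh.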

https://claude.ai/code/session_01FSXGXXiPxXckah5iFwa7ng

dgenio commented Jun 15, 2026

Thanks — both issues were real. Fixed in 390de96:

  • Multiplex idempotency (lines 181/192): list_resources()/list_prompts() now clear() their ownership index at the start of each build and repopulate it, so a second call (e.g. a repeated PrimitiveGatewayRuntime.refresh()) returns the full union instead of []. Added test_multiplex_listings_are_idempotent and test_multiplex_repeated_refresh_keeps_catalog.
  • Dict-shaped listings (lines 135/146): both client list_* methods now go through a new _unwrap_listing helper that handles pydantic-result, dict ({"resources": [...]}), and bare-list payloads, so a dict listing no longer iterates string keys into _model_to_dict. Added test_client_unwraps_dict_shaped_listings.

make fmt/lint/type/module-size-check clean; the new module's tests pass.
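
For readers following the dict-shaped listing fix, a helper of this kind might look like the sketch below. It is not the actual `_unwrap_listing`: the name `unwrap_listing`, the `key` parameter, and the attribute-based handling of pydantic-style results are assumptions made for illustration.

```python
from __future__ import annotations

from typing import Any


def unwrap_listing(payload: Any, key: str) -> list[Any]:
    """Return the item list from a listing payload of any supported shape.

    `key` is the field holding the items, e.g. "resources" or "prompts".
    """
    if isinstance(payload, dict):
        # Dict-shaped result such as {"resources": [...]}. Iterating the dict
        # itself would yield its string keys, which is the bug being avoided.
        return list(payload.get(key, []))
    if isinstance(payload, (list, tuple)):
        return list(payload)  # bare list of items
    # Object-shaped result (e.g. a pydantic model) exposing the list as an attribute.
    return list(getattr(payload, key, None) or [])
```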


Generated by Claude Code

github-actions Bot commented Jun 15, 2026

Benchmark delta (vs main)

Soft regression feedback only — this comment never blocks the PR.
Latency budget: ⚠️ when head > base × 1.3. Accuracy budget: ⚠️ when head < base - 1pp.
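
The two budget rules above reduce to simple comparisons. The sketch below assumes accuracy metrics are fractions in [0, 1], so one percentage point is 0.01; the function and constant names are hypothetical and this is not the benchmark workflow's own code.

```python
LATENCY_FACTOR = 1.3    # warn when head p99 exceeds base p99 by more than 30%
ACCURACY_MARGIN = 0.01  # one percentage point, assuming metrics are fractions


def latency_regressed(head_ms: float, base_ms: float) -> bool:
    """True when head latency is over the soft latency budget."""
    return head_ms > base_ms * LATENCY_FACTOR


def accuracy_regressed(head: float, base: float) -> bool:
    """True when head accuracy fell more than the margin below base."""
    return head < base - ACCURACY_MARGIN
```

Under these rules a head p99 of 127.009 ms against a base of 98.277 ms stays within budget, since the threshold is 98.277 × 1.3 ≈ 127.76 ms.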

Routing summary (single backend × catalog sizes)

| size | recall@k (head, Δ vs base) | MRR (head, Δ vs base) | p99 (ms) |
| --- | --- | --- | --- |
| 50 | ✅ 0.5649 (+0.0000) | ✅ 0.4978 (+0.0000) | ✅ 0.501 (base 0.759) |
| 83 | ✅ 0.3825 (+0.0000) | ✅ 0.3242 (+0.0000) | ✅ 0.755 (base 1.134) |
| 1000 | ✅ 0.1475 (+0.0000) | ✅ 0.1456 (+0.0000) | ✅ 44.303 (base 41.711) |

Per-backend × per-size matrix

| backend | size | recall@k (Δ) | MRR (Δ) | p99 (ms) |
| --- | --- | --- | --- | --- |
| bm25 | 100 | ✅ 0.3825 (+0.0000) | ✅ 0.3399 (+0.0000) | ✅ 6.318 (base 8.140) |
| bm25 | 500 | ✅ 0.2250 (+0.0000) | ✅ 0.2165 (+0.0000) | ✅ 29.414 (base 38.989) |
| bm25 | 1000 | ✅ 0.1575 (+0.0000) | ✅ 0.1525 (+0.0000) | ✅ 86.938 (base 111.716) |
| embedding_hashing | 100 | ✅ 0.5175 (+0.0000) | ✅ 0.4360 (+0.0000) | ✅ 7.678 (base 7.225) |
| embedding_hashing | 500 | ✅ 0.2700 (+0.0000) | ✅ 0.2674 (+0.0000) | ✅ 42.549 (base 44.182) |
| embedding_hashing | 1000 | ✅ 0.2000 (+0.0000) | ✅ 0.1931 (+0.0000) | ✅ 127.009 (base 98.277) |
| embedding_st | 100 | skipped (missing sentence-transformers) | | |
| embedding_st | 500 | skipped (missing sentence-transformers) | | |
| embedding_st | 1000 | skipped (missing sentence-transformers) | | |
| fuzzy | 100 | skipped (missing rapidfuzz) | | |
| fuzzy | 500 | skipped (missing rapidfuzz) | | |
| fuzzy | 1000 | skipped (missing rapidfuzz) | | |
| tfidf | 100 | ✅ 0.3825 (+0.0000) | ✅ 0.3220 (+0.0000) | ✅ 1.053 (base 1.102) |
| tfidf | 500 | ✅ 0.2325 (+0.0000) | ✅ 0.2314 (+0.0000) | ✅ 10.023 (base 11.492) |
| tfidf | 1000 | ✅ 0.1475 (+0.0000) | ✅ 0.1456 (+0.0000) | ✅ 36.567 (base 50.755) |

Context pipeline (per scenario)

| scenario | tokens | dropped | dedup |
| --- | --- | --- | --- |
| large_catalog | 1480 (base 1514, Δ-34) | 0 (base 0, Δ+0) | 0 (base 0, Δ+0) |
| long_conversation | 2500 (base 2548, Δ-48) | 0 (base 0, Δ+0) | 0 (base 0, Δ+0) |
| mixed_payload | 488 (base 497, Δ-9) | 0 (base 0, Δ+0) | 0 (base 0, Δ+0) |
| short_conversation | 487 (base 496, Δ-9) | 0 (base 0, Δ+0) | 0 (base 0, Δ+0) |
| stress_conversation | 6590 (base 6651, Δ-61) | 11 (base 7, Δ+4) | 4 (base 4, Δ+0) |
| tiny_payload | 256 (base 267, Δ-11) | 0 (base 0, Δ+0) | 0 (base 0, Δ+0) |

Numbers come from make benchmark / make benchmark-matrix.
Latency is hardware-dependent — treat the markers as a rough guide.
See benchmarks/scorecard.md for the full picture.

…are skipped

_load_primitive_defs_from_catalog silently filtered out non-dict
resources/prompts entries, so a mistyped catalog entry vanished without a
trace. Factor the per-kind filtering into _collect_primitive_defs, which now
also drops dict entries lacking the required identity field (uri for
resources, name for prompts) and logs a warning for every skipped entry.

Adds test_load_primitive_defs_skips_malformed_entries covering non-dict and
identity-less entries plus the warning count.
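
The filtering behaviour described in this commit could be sketched along these lines. This is not the real `_collect_primitive_defs`: `collect_defs`, the logger name, and the message wording are hypothetical, and treating an empty identity value as missing is an added assumption.

```python
from __future__ import annotations

import logging
from typing import Any

logger = logging.getLogger("catalog_sketch")

# Required identity field per primitive kind, as stated in the commit message.
IDENTITY_FIELD = {"resources": "uri", "prompts": "name"}


def collect_defs(entries: list[Any], kind: str) -> list[dict[str, Any]]:
    """Keep well-formed entries of one kind and warn once per skipped entry."""
    field = IDENTITY_FIELD[kind]
    kept: list[dict[str, Any]] = []
    for index, entry in enumerate(entries):
        if not isinstance(entry, dict):
            logger.warning(
                "skipping %s[%d]: expected an object, got %s",
                kind, index, type(entry).__name__,
            )
            continue
        if not entry.get(field):  # assumption: empty values count as missing
            logger.warning("skipping %s[%d]: missing required %r", kind, index, field)
            continue
        kept.append(entry)
    return kept
```

Logging one warning per dropped entry, with its index, gives a catalog author enough to find the mistyped entry without failing the whole load.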

https://claude.ai/code/session_01J3qykQ9umrpbdy4n5gKq6c
@dgenio dgenio merged commit b5f5020 into main Jun 15, 2026
9 checks passed
@dgenio dgenio deleted the claude/github-issues-triage-ejay6a branch June 15, 2026 19:32