Skip to content

feat(server): expose the query engine to agents over MCP#21

Merged
JinBa1 merged 1 commit into
mainfrom
feat/mcp-server
Jun 20, 2026
Merged

feat(server): expose the query engine to agents over MCP#21
JinBa1 merged 1 commit into
mainfrom
feat/mcp-server

Conversation

@JinBa1

@JinBa1 JinBa1 commented Jun 20, 2026

Copy link
Copy Markdown
Owner

Layers a Model Context Protocol tool surface on the Spring Boot gateway using Spring AI 2.0.0 (WebMvc Streamable-HTTP, SYNC), completing the "REST + Spring Boot + MCP for agents" surface. The engine module is untouched and Java stays 17; the entire change lives in server/.

Tools (all read-only)

Tool Purpose
list_tables discover queryable tables
describe_table catalog-authoritative typed schema
sample_rows preview rows without writing SQL
explain_query plan + cost without executing
query run a read-only SELECT, typed results

Design

  • Choke-point routing. Every query tool calls QueryService.execute(...,"mcp") and every catalog tool calls CatalogFacade; no tool touches the engine directly. MCP traffic therefore inherits budget clamping, the concurrency permit, and the audit trail for free — and per-key governance, when it lands inside QueryService, is inherited with no tool-signature change.
  • Error model mirrors the REST errorCode taxonomy onto CallToolResult.isError(true): engine errors are agent-legible and verbatim; DATA_ERROR/INTERNAL are scrubbed to a generic message + correlation id (full detail logged server-side only), exactly as the REST 5xx path.
  • DRY extraction. The table-name charset guard and the catalog-column→DTO mapping move into the catalog package; TableController delegates to them, so REST and MCP share one validation and one schema shape (no drift).
  • Wiring. spring-ai-bom:2.0.0 + spring-ai-starter-mcp-server-webmvc (non-reparent BOM; transport pulled transitively). The bare @Component is auto-registered by the annotation scanner — no ToolCallbackProvider bean. SYNC Streamable-HTTP at /mcp (SSE is deprecated/broken on Boot 4 + Tomcat 11).

Verification

./mvnw clean verify: engine 419 + server 90 passing, 0 failures; the 20 sample queries are byte-identical. New tests span three tiers — plain JUnit, Mockito, and a @SpringBootTest that both drives the tools directly (choke-point routing, budget inheritance proven by a tight per-call budget) and round-trips a real MCP client over the live /mcp endpoint on Tomcat 11.

Notes / deferrals

  • The server (REST and MCP) is unauthenticated — intended for local/trusted use; per-key auth returns with Phase-4 governance. MCP tools pass the constant "mcp" principal, which the audit trail already records.
  • A table-upload tool is deferred (needs a write-path service extraction).
  • MCP tool-argument schema generation relies on the compiler -parameters flag, so a clean build is required (CI runs clean).

Layer a Model Context Protocol tool surface on the existing Spring Boot
gateway using Spring AI 2.0.0 (WebMvc Streamable-HTTP, SYNC), so an agent can
discover tables, inspect schemas, preview data and query cost, and run
read-only SQL without writing SQL blind — and without bypassing the server's
budget, concurrency, and audit guarantees. Engine module untouched; Java 17.

- add CuckooMcpTools (@component, five @mcptool methods: list_tables,
  describe_table, sample_rows, explain_query, query). Every query tool routes
  through QueryService.execute(...,"mcp") and every catalog tool through
  CatalogFacade, so MCP traffic inherits budget clamping, the concurrency
  permit, and the audit trail with no MCP-specific governance code
- mirror the REST errorCode taxonomy onto CallToolResult.isError(true): engine
  errors are agent-legible and verbatim, while DATA_ERROR/INTERNAL are scrubbed
  to a generic message + correlation id (full detail logged server-side only)
- extract the table-name charset guard (TableNameValidator) and the
  catalog-column -> DTO mapping (CatalogMapper) into the catalog package and
  delegate TableController to them, so the REST and MCP surfaces share one
  validation and one schema shape
- import spring-ai-bom 2.0.0 + spring-ai-starter-mcp-server-webmvc (no reparent;
  the starter pulls the transport transitively) and configure the SYNC
  Streamable-HTTP server at /mcp (SSE is deprecated/broken on Boot 4 + Tomcat 11)
- the bare @component is auto-registered by the annotation scanner; no
  ToolCallbackProvider bean (the legacy @tool path) is declared

Verify with ./mvnw clean verify: engine 419 and server 90 passing tests, 0
failures; the 20 sample queries are byte-identical. The new tests cover the
five tools at three tiers — plain JUnit, Mockito, and a @SpringBootTest that
both drives the tools directly (choke-point routing, budget inheritance) and
round-trips a real MCP client over the live /mcp endpoint on Tomcat 11.

Known follow-up: tools pass the constant "mcp" principal (per-key MCP auth
returns with Phase-4 governance), and a table-upload tool is deferred (it needs
a write-path extraction). MCP schema generation relies on -parameters, so a
clean build is required for correct tool argument names.
@codecov-commenter

Copy link
Copy Markdown

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@JinBa1

JinBa1 commented Jun 20, 2026

Copy link
Copy Markdown
Owner Author

@claude review this PR

@claude

claude Bot commented Jun 20, 2026

Copy link
Copy Markdown

Claude finished @JinBa1's task in 2m 26s —— View job


PR Review: feat(server): expose the query engine to agents over MCP

Tasks:

  • Gather context and understand request
  • Read CLAUDE.md and repo guidelines
  • Review changed files
  • Post comprehensive review feedback

Summary

This is a well-designed, well-tested MCP layer over the Spring Boot gateway. The architecture decisions are sound, the security posture is careful, and the test coverage is thorough (unit, mock, integration, live wire). A few observations below — none are blockers, but two are worth considering before this goes into production.


Architecture

Choke-point routing (CuckooMcpTools.java:67-161) is correctly implemented: every query tool calls QueryService.execute(..., "mcp") and every catalog tool calls CatalogFacade. The class-level Javadoc makes this a named invariant, which is good. Annotating a deviation as a defect in the comment is an unusually rigorous contract — keep it.

DRY extraction of CatalogMapper and TableNameValidator into the catalog package is clean. TableController.java now delegates instead of reimplementing, so drift between REST and MCP schema shapes is structurally impossible.

Error model (CuckooMcpTools.java:188-213) mirrors the REST taxonomy correctly. The scrubbed() path (lines 210-213) logs the correlation ID server-side and returns only errorId=<uuid> over the wire — this is exactly right for DATA_ERROR/INTERNAL.


Security

SQL injection via sampleRows (CuckooMcpTools.java:113-115):

TableNameValidator.validate(name);  // blocks [^A-Za-z0-9_] and length > 64
int n = clampLimit(limit);          // int, so safe
String sql = "SELECT * FROM " + name + " LIMIT " + n;

The guard is structurally sound. n is a clamped int (not user-controlled), and name after validation can only contain [A-Za-z0-9_] — no SQL metacharacters, no path separators. The comment at line 112 explains the design correctly.

No read-only enforcement at the tool layer for query and explainQuery: Both tools accept arbitrary SQL strings and rely entirely on the engine to reject non-SELECT statements. If an agent passes DROP TABLE foo or INSERT INTO ..., the engine returns UNSUPPORTED_SQL or PARSE_ERROR — correct, but only because the engine itself guards it. A defense-in-depth check at the tool layer would be low-cost:

// in query() and explainQuery(), before calling queryService:
String upper = sql.strip().toUpperCase();
if (!upper.startsWith("SELECT") && !upper.startsWith("WITH") && !upper.startsWith("(")) {
    return error("BAD_REQUEST: Only SELECT statements are permitted.");
}

This isn't required given the engine's behavior, but it would make the safety guarantee explicit and agent-legible without an engine roundtrip. Fix this →


Correctness

json() serialization failures (CuckooMcpTools.java:171-175):

private CallToolResult json(Object payload) {
    return CallToolResult.builder()
            .addTextContent(objectMapper.writeValueAsString(payload))
            .build();
}

If writeValueAsString throws (Jackson 3 makes this unchecked), the exception bubbles out of json() and is caught by the catch (Exception e) in each tool method, which routes it to scrubbed(ErrorCode.INTERNAL, e). This works, but the chain is non-obvious: a serialization failure looks identical to an engine 5xx. Since these DTOs are straightforward, this is unlikely to matter in practice.

explainQuery SQL construction (CuckooMcpTools.java:135): "EXPLAIN " + sql is correct — sql is the full query string passed to the parser, not interpolated into a larger SQL template, so there's no injection surface.

clampLimit and SAMPLE_MIN = 1 (CuckooMcpTools.java:163-168): limit = 0 or negative → LIMIT 1. This is intentional and tested. The choice of 1 as the floor is reasonable.


Tests

Coverage is comprehensive across three tiers:

CuckooMcpToolsTest (345 lines): Covers all 5 tools, all ErrorCode variants, injection rejection, exact SQL construction, and scrubbing of filesystem paths. Notably, dataErrorIsScrubbedOfFilesystemPaths (line 310) asserts that DATA_ERROR with a path in the message does not leak the path — this is the right test to have.

CuckooMcpIntegrationTest (180 lines): Covers choke-point routing with a real QueryService, budget inheritance (maxTuples=1 on a 3-row table trips BUDGET_EXCEEDED), and a full live /mcp wire round-trip with the MCP SDK client. The @DirtiesContext isolation strategy is correct — using resetDBCatalog() across a shared context would be fragile.

CatalogMapperTest / TableNameValidatorTest: Properly unit-tested with boundary cases and injection payloads.

One missing test: sampleRows with a negative limit (e.g., -1) — the unit test at line 188 covers 0, but not negative values. clampLimit would return Math.max(1, Math.min(1000, -1)) = 1, which is correct, but it's not explicitly exercised.


Configuration

application.properties is well-documented. The explicit pins (spring.ai.mcp.server.protocol=STREAMABLE, spring.ai.mcp.server.type=SYNC) and the rationale for not using SSE ("deprecated in 2.0 and had a 0-bytes bug on Boot 4 / Tomcat 11") are the right kind of docs to have next to config that could silently regress.

The BOM import strategy in pom.xml (non-reparent, spring-ai-bom imported alongside spring-boot-dependencies) avoids the parent-POM constraint correctly.


Nitpicks

  • CuckooMcpTools.java:190: be.kind() == BudgetKind.TIME ? "BUDGET_EXCEEDED (timeout)" : "BUDGET_EXCEEDED" — the TIME branch puts the subtype in parentheses while the tuple branch doesn't. Consistent either way, just noting the asymmetry in the taxonomy surface.
  • TableNameValidator.java:15: The pattern is compiled once at class-init as a static field. This is efficient and correct.

Verdict

Approve with one suggestion: The missing read-only guard at the tool layer for query/explainQuery is worth addressing — it's a low-cost, high-clarity change that makes the safety guarantee explicit rather than implicit. Everything else is solid.

@JinBa1

JinBa1 commented Jun 20, 2026

Copy link
Copy Markdown
Owner Author

Thanks for the review. On the one suggestion — the tool-layer read-only guard for query/explainQuery — I'm intentionally not adding it, for four reasons:

  1. No threat to defend. The engine is read-only by construction: QueryPlanner plans only Select and rejects everything else as UNSUPPORTED_SQL ("Only SELECT queries are supported; got …"), and there are no write/DML operators at all. DROP TABLE/INSERT can't have side effects — they just return a classified error.

  2. It would break tested behavior. query has a defensive EXPLAIN passthrough (test queryReturnsPlanWhenCallerPrefixedExplain): query("EXPLAIN SELECT …") returns the plan. A startsWith("SELECT"/"WITH"/"(") guard rejects the EXPLAIN prefix and breaks that path.

  3. It would diverge MCP from REST. QueryController does no such pre-check; it relies on the engine through the same QueryService. Guarding MCP-only splits behavior for the identical choke point, against the REST/MCP-parity design. If a gateway-level assertion were ever wanted, its home is QueryService (covers both surfaces) and it should be parse-based — a governance-phase concern, out of scope here.

  4. It's strictly weaker than what exists. A string-prefix filter mishandles comments, CTEs, and leading tokens, and the engine's parse-based rejection already gives a richer, agent-legible message.

Net: it adds a fragile, surface-inconsistent check that breaks a tested behavior to guard a non-threat. Leaving the engine's parse-based guarantee as the belt. The clampLimit(-1) test gap is noted (the path is already correct and covered for 0); happy to add it as a follow-up if useful.

@JinBa1 JinBa1 merged commit 54d46d5 into main Jun 20, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants