diff --git a/README.md b/README.md
index 0cd763a..62410ef 100644
--- a/README.md
+++ b/README.md
@@ -4,102 +4,28 @@
 ![Coverage](https://codecov.io/gh/JinBa1/java-query-engine/branch/main/graph/badge.svg)
 ![Dependencies](https://img.shields.io/badge/dependencies-up%20to%20date-brightgreen)
 
-An in-memory relational query engine built on the Volcano/iterator model. Parses SQL via JSqlParser, builds an operator tree, and executes queries tuple-by-tuple against CSV data.
+**A self-hosted gateway that gives AI agents safe, read-only, budgeted SQL access to your CSV files — no database required.**
 
-## Architecture
+Everyone has CSVs (exports, dumps, logs), and AI agents increasingly need to query them. Embedding a database in every agent environment means handing each agent raw file access; what you actually want is a *guarded window* onto the data: an endpoint that is read-only by construction, resource-budgeted, and auditable. cuckooDB is that gateway. It is built on a from-scratch query engine and exposed over both a **REST API** and the **Model Context Protocol (MCP)**, so an agent can discover tables, preview data, check a query's cost, and run SQL without writing SQL blind or bypassing the guardrails.
 
-```
-SQL → JSqlParser → QueryPlanner → QueryPlanOptimizer → Operator Tree → Results
-```
-
-**Core components:**
-
-| Component | Role |
-|-----------|------|
-| `QueryPlanner` | Parses SQL and builds the operator pipeline |
-| `QueryPlanOptimizer` | Selection pushdown, trivial operator removal |
-| `DBCatalog` | Schema and table metadata (singleton) |
-| `Value` | Typed tuple values (sealed interface: `IntValue`, `StringValue`) |
-| `ExpressionEvaluator` | Evaluates WHERE/HAVING conditions per tuple |
-| `ExpressionPreprocessor` | Resolves column references to indices |
-| `ConditionSplitter` | Separates join predicates from selection predicates |
-
-**Operator hierarchy** (all extend `Operator`):
-
-`ScanOperator` → `SelectOperator` → `ProjectOperator` → `JoinOperator` / `HashJoinOperator` → `SortOperator` → `AggregateOperator` → `DuplicateEliminationOperator` → `LimitOperator`
-
-## Feature Matrix
-
-| Feature | Status |
-|---------|--------|
-| `SELECT *` / projection | ✅ Supported |
-| `WHERE` predicates | ✅ Supported |
-| Inner joins (nested-loop) | ✅ Supported |
-| Hash join (auto-selected for equi-joins) | ✅ Supported |
-| `ORDER BY` | ✅ Supported |
-| `GROUP BY` + `SUM`, `COUNT`, `AVG`, `MIN`, `MAX` | ✅ Supported |
-| `LIMIT n` | ✅ Supported |
-| `DISTINCT` | ✅ Supported |
-| Nested arithmetic/comparison expressions | ✅ Supported |
-| Query optimisation (selection pushdown) | ✅ Supported |
-| Typed columns (int, string) | ✅ Supported |
-| CSV header support | ✅ Supported |
-| Query budgets (`--max-tuples`, `--timeout-ms`) | ✅ Supported |
-| `EXPLAIN` plan inspection | ✅ Supported |
-| Indexes | ❌ Not supported |
-| Transactions | ❌ Not supported |
-| INSERT / UPDATE / DELETE | ❌ Not supported |
-| Concurrency | ❌ Not supported |
-| Persistence | ❌ Not supported |
-| Full SQL dialect | ❌ Not supported |
-
-## Scope
-
-This engine supports **SQL-over-CSV query execution**: read-only queries against tables stored as CSV files. It does not support transactions, indexes, data modification (INSERT/UPDATE/DELETE), concurrency, persistence, or a full SQL dialect. Values are typed int or string, inferred per column from the data. Tables are discovered from CSV files with header rows; no separate schema file.
-
-Supported SQL features include `SELECT`/`FROM`/`WHERE`, `GROUP BY` with `SUM`, `COUNT`, `AVG`, `MIN`, and `MAX` aggregates, `ORDER BY`, `DISTINCT`, inner joins, and `LIMIT n`.
-
-The focus is on demonstrating query planning, optimisation, and the Volcano iterator execution model.
-
-### Aggregate and LIMIT semantics
-
-| Case | Behavior |
-|---|---|
-| `AVG` of ints | truncated integer division (toward zero) |
-| Aggregate over empty input, no `GROUP BY` | zero rows (header only) — deviates from SQL's NULL row |
-| `COUNT(col)` | equals `COUNT(*)` — the engine has no NULLs |
-| `SUM`/`AVG` on a string column | error |
-| `MIN`/`MAX` on strings | lexicographic |
-| `SUM` past int range | error |
-| `LIMIT 0` | header-only output |
-| `OFFSET`, `LIMIT ALL` | not supported (error) |
-
-## Quick Start
-
-**Prerequisites:** Java 17, Maven (or use the included Maven Wrapper).
-
-```bash
-# Clone
-git clone https://github.com/JinBa1/java-query-engine.git
-cd java-query-engine
+## Features
 
-# Build fat JAR (engine module)
-./mvnw -pl engine -DskipTests clean package
-```
-
-**Run a query:**
-
-```bash
-java -cp engine/target/cuckoodb-engine-1.0.0-jar-with-dependencies.jar \
-  com.github.jinba1.cuckoodb.CuckooDB \
-  database_dir input_file output_file [--max-tuples=N] [--timeout-ms=N]
-```
-
-Both `--max-tuples` and `--timeout-ms` are optional and independent. Omit either to impose no limit on that dimension.
+| Capability | Status |
+|---|:--:|
+| Read-only SQL over CSV — `SELECT` / `WHERE` / `JOIN` / `GROUP BY` / `ORDER BY` / `LIMIT` / `DISTINCT` | ✅ |
+| Aggregates — `COUNT` / `SUM` / `AVG` / `MIN` / `MAX` | ✅ |
+| Hash + nested-loop joins (planner auto-selects) | ✅ |
+| Typed columns (int / string), CSV headers | ✅ |
+| `EXPLAIN` plan inspection | ✅ |
+| Tuple + time budgets, fail-closed | ✅ |
+| **REST API** + OpenAPI / Swagger | ✅ |
+| **MCP server** — five agent tools, Streamable-HTTP | ✅ |
+| Runs as a container (published to GHCR) | ✅ |
+| Writes / transactions / indexes / persistence | ❌ read-only by design |
 
-### Run the server as a container
+## Quick start
 
-The Spring Boot gateway (REST + MCP) ships as a container image, so you can run it next to your data with no Java toolchain. Put your CSV files in a folder and mount it as the catalog's data directory:
+Run the gateway next to your data — no Java toolchain needed. Put your CSVs in a folder and mount it:
 
 ```bash
 docker run --rm -p 8080:8080 \
@@ -107,198 +33,94 @@ docker run --rm -p 8080:8080 \
   ghcr.io/jinba1/cuckoodb:latest
 ```
 
-- **REST:** `POST http://localhost:8080/queries`, `GET /tables`, `GET /tables/{name}` (OpenAPI at `/swagger-ui.html`).
-- **MCP:** Streamable-HTTP endpoint at `http://localhost:8080/mcp` — point an MCP client at it to query your CSVs with `list_tables` / `describe_table` / `sample_rows` / `explain_query` / `query`.
-
-The image is published to GHCR on each merge to `main`. To build it locally instead: `docker build -t cuckoodb .`
-
-### Query budgets
-
-The engine enforces **total-work semantics**: every tuple emitted by any operator in the tree counts against the budget, including intermediate tuples that are later filtered or joined. A cross-product explosion that never produces output rows will still hit the tuple limit. The timeout clock starts lazily at the first tuple emission.
-
-When a budget is exceeded:
-- The partial output file is deleted.
-- `Error: <message>` is written to stderr.
-- The process exits with code 1.
-
-Both flags are optional and independent — you can use one, both, or neither.
-
-### EXPLAIN
-
-Prefix any query with `EXPLAIN` to inspect the query plan without executing it:
-
-```sql
-EXPLAIN SELECT Student.B, SUM(Student.C) FROM Student, Enrolled
-WHERE Student.D > 30 AND Student.A = Enrolled.A
-GROUP BY Student.B;
-```
-
-The output file receives a two-section plan:
+Query over REST:
 
-```
-=== Plan (as written) ===
-Aggregate[group by: Student.B; calls: SUM(Student.c)]
-  Project[Enrolled.A, Student.A, Student.B, Student.C, Student.D]
-    Select[Student.D > 30]
-      Join[Student.A = Enrolled.A]
-        Scan[Student]
-        Scan[Enrolled]
+```bash
+curl -s localhost:8080/tables
+# ["People"]
 
-=== Plan (optimized) ===
-Aggregate[group by: Student.B; calls: SUM(Student.c)]
-  Project[Enrolled.A, Student.A, Student.B, Student.C, Student.D]
-    Join[Student.A = Enrolled.A]
-      Select[Student.D > 30]
-        Scan[Student]
-      Project[Enrolled.A]
-        Scan[Enrolled]
+curl -s localhost:8080/queries -H 'Content-Type: application/json' \
+  -d '{"sql":"SELECT * FROM People LIMIT 5"}'
+# {"columns":[{"name":"id","type":"INT"},...],"rows":[[1,"alice"],...],"rowCount":5,"truncated":true,"hint":"..."}
 ```
 
-No query execution occurs for EXPLAIN queries.
-
-## Join algorithms
-
-The engine supports two join algorithms; the planner selects between them automatically.
+…or connect an AI agent over MCP (below). To use the engine directly from the command line instead, see the **[engine README](engine/README.md)**.
 
-### Nested-loop join
+## For agents: MCP
 
-`JoinOperator` implements a classic nested-loop join: for every outer tuple the inner child is rewound and scanned in full. It handles any join condition (equality, inequality, arbitrary expression, or cross product with no condition). `EXPLAIN` shows it as `Join[<condition>]`.
+The server exposes a Model Context Protocol endpoint at `http://localhost:8080/mcp` (Streamable-HTTP). Point an MCP client (e.g. Claude Desktop) at it and the agent gets five tools:
 
-### Hash join
-
-`HashJoinOperator` extends `JoinOperator` with an in-memory hash join. The inner (build) side is drained once into a `HashMap` keyed by the equality conjuncts; the outer (probe) side then streams through once. After a hash-table lookup, the full original condition is re-evaluated on every candidate, so residual non-equality conjuncts (e.g. `A.x = B.x AND A.y > 3`) work correctly. Output order — outer-major, inner order preserved within each key bucket — is identical to the nested-loop join. `EXPLAIN` shows it as `HashJoin[<condition>]`.
-
-**Auto-selection rule:** the planner chooses hash join when `Constants.useHashJoin` is `true` (the default) **and** the join condition contains at least one column-to-column equality conjunct (e.g. `Student.A = Enrolled.A`). Cross products (no condition) and pure non-equi joins (e.g. `A.x > B.y` only) always use nested-loop join.
-
-**Toggle:** set `Constants.useHashJoin = false` at program start (or in tests) to force nested-loop for all joins.
-
-### Benchmarks
-
-Performance was measured with a JMH 1.37 benchmark suite in the `bench/` package (`engine/src/test/java/com/github/jinba1/cuckoodb/bench/`). The suite is compiled in CI but never run there; run it locally with:
-
-```bash
-./mvnw -pl engine -q test-compile exec:exec -Dexec.executable=java -Dexec.classpathScope=test \
-  "-Dexec.args=-cp %classpath org.openjdk.jmh.Main .*Benchmark"
-```
-
-**Results** (OpenJDK 21.0.5, Intel Core i9-13900HX, 32 logical cores, Linux under WSL2):
-
-| Benchmark | matchesPerKey | rowsPerSide | useHashJoin | Mode | Cnt | Score | Error | Units |
-|-----------|--------------|-------------|-------------|------|-----|-------|-------|-------|
-| EndToEndJoinBenchmark.planAndDrain | N/A | N/A | true | avgt | 3 | 1.028 | ± 0.288 | ms/op |
-| EndToEndJoinBenchmark.planAndDrain | N/A | N/A | false | avgt | 3 | 315.523 | ± 36.021 | ms/op |
-| JoinAlgorithmBenchmark.hashJoin | 1 | 1000 | N/A | avgt | 5 | 0.270 | ± 0.011 | ms/op |
-| JoinAlgorithmBenchmark.hashJoin | 1 | 5000 | N/A | avgt | 5 | 1.382 | ± 0.154 | ms/op |
-| JoinAlgorithmBenchmark.hashJoin | 10 | 1000 | N/A | avgt | 5 | 2.160 | ± 0.109 | ms/op |
-| JoinAlgorithmBenchmark.hashJoin | 10 | 5000 | N/A | avgt | 5 | 10.661 | ± 0.840 | ms/op |
-| JoinAlgorithmBenchmark.nestedLoopJoin | 1 | 1000 | N/A | avgt | 5 | 202.621 | ± 25.313 | ms/op |
-| JoinAlgorithmBenchmark.nestedLoopJoin | 1 | 5000 | N/A | avgt | 5 | 5027.912 | ± 370.564 | ms/op |
-| JoinAlgorithmBenchmark.nestedLoopJoin | 10 | 1000 | N/A | avgt | 5 | 194.786 | ± 4.916 | ms/op |
-| JoinAlgorithmBenchmark.nestedLoopJoin | 10 | 5000 | N/A | avgt | 5 | 4785.620 | ± 212.683 | ms/op |
+| Tool | What it does |
+|---|---|
+| `list_tables` | list the available tables |
+| `describe_table` | a table's column names and types |
+| `sample_rows` | preview rows without writing SQL |
+| `explain_query` | preview a query's plan and cost before running it |
+| `query` | run a read-only `SELECT`, budget-bounded |
 
-`EndToEndJoinBenchmark` joins two 1 000-row CSV tables through the full planner pipeline; nested-loop re-parses the inner CSV once per outer row, so the gap (≈ 307×) reflects both the algorithmic difference and I/O cost. `JoinAlgorithmBenchmark` uses in-memory `CachedOperator` inputs to isolate the join algorithm itself; at 5 000 rows/side the operator-level gap is ≈ 3 600×.
+Every tool routes through the same guarded execution path as the REST API. Agent traffic therefore inherits the read-only guarantee, the tuple/time budget, the concurrency limit, and the audit hooks, and no tool can reach the engine around them.
 
-Benchmarks are compiled in CI but never executed there.
+## REST API
 
-## Demo
+| Endpoint | Description |
+|---|---|
+| `POST /queries` | plan + execute one read-only query → JSON columns/rows, or an `EXPLAIN` plan |
+| `GET /tables` | list table names |
+| `GET /tables/{name}` | a table's typed schema |
+| `/swagger-ui.html` | interactive OpenAPI docs |
 
-**Input table** (`engine/samples/db/data/Student.csv`):
+Queries are **budget-bounded and fail-closed**: the server always attaches a budget, so no query can run unbounded. A query that exceeds the tuple budget returns `429` (retry with a tighter `LIMIT`). The tuple budget counts total work, so intermediate tuples count as well as result rows. A query that exceeds the time budget returns `504`.
 
-```
-A, B, C, D
-1, 200, 50, 33
-2, 200, 200, 44
-3, 100, 105, 44
-4, 100, 50, 11
-5, 100, 500, 22
-6, 300, 400, 11
-```
+### EXPLAIN
 
-**Query** (`engine/samples/input/query4.sql`):
+Any query can be planned without executing it — prefix `EXPLAIN` over REST, or call the `explain_query` tool. The plan is shown as written and after optimisation:
 
-```sql
-SELECT * FROM Student WHERE Student.A < 3;
 ```
+=== Plan (as written) ===
+Project[Student.B, Student.C]
+  Select[Student.D > 30]
+    HashJoin[Student.A = Enrolled.A]
+      Scan[Student]
+      Scan[Enrolled]
 
-**Command:**
-
-```bash
-java -cp engine/target/cuckoodb-engine-1.0.0-jar-with-dependencies.jar \
-  com.github.jinba1.cuckoodb.CuckooDB \
-  engine/samples/db engine/samples/input/query4.sql output.csv
+=== Plan (optimized) ===
+Project[Student.B, Student.C]
+  HashJoin[Student.A = Enrolled.A]
+    Select[Student.D > 30]
+      Scan[Student]
+    Project[Enrolled.A]
+      Scan[Enrolled]
 ```
 
-To limit resource usage, add optional budget flags:
+The optimiser pushes the `Select` below the join (selection pushdown) and projects the inner scan down to just the key it needs; the planner picked a hash join for the equi-condition. See the [engine README](engine/README.md#explain) for the full treatment.
 
-```bash
-java -cp engine/target/cuckoodb-engine-1.0.0-jar-with-dependencies.jar \
-  com.github.jinba1.cuckoodb.CuckooDB \
-  engine/samples/db engine/samples/input/query4.sql output.csv --max-tuples=10000 --timeout-ms=5000
-```
+## How it works
 
-**Output** (`output.csv`):
-
-```
-a,b,c,d
-1,200,50,33
-2,200,200,44
 ```
-
-## Running Examples
-
-The `engine/samples/` directory ships with 20 queries and a small dataset (Student, Course, Enrolled, Staff tables). Expected output lives in `engine/samples/expected_output/`.
-
-Run all 20 through the bundled runner, which diffs each result against the expected output and reports pass/fail. It is launched via `exec:exec` (not `exec:java`) so it runs with the engine module as the working directory — `exec:java` would keep the working directory at the reactor root and fail to find `samples/`:
-
-```bash
-./mvnw -pl engine -q test-compile exec:exec -Dexec.executable=java -Dexec.classpathScope=test \
-  "-Dexec.args=-cp %classpath com.github.jinba1.cuckoodb.SampleQueryRunner"
+SQL → JSqlParser → QueryPlanner → optimizer → operator tree → results
 ```
 
-Or run each query through the CLI and diff manually:
-
-```bash
-# Run all sample queries and diff against expected output
-for i in $(seq 1 20); do
-  java -cp engine/target/cuckoodb-engine-1.0.0-jar-with-dependencies.jar \
-    com.github.jinba1.cuckoodb.CuckooDB \
-    engine/samples/db "engine/samples/input/query${i}.sql" "/tmp/out${i}.csv"
-  diff "engine/samples/expected_output/query${i}.csv" "/tmp/out${i}.csv" && echo "query${i}: OK"
-done
-```
+The engine is a from-scratch Volcano/iterator executor with typed values, hash and nested-loop joins, selection pushdown, and tuple/time budgets. The server wraps it behind a single `QueryService` choke point that attaches the budget, acquires a concurrency permit, and invokes the audit hook. **Both** the REST controllers and the MCP tools go through it, so the guardrails apply uniformly and cannot be bypassed. Engine internals (architecture, join algorithms, benchmarks, CLI) are in the **[engine README](engine/README.md)**.
 
-## Testing
+## Build and test
 
 ```bash
-./mvnw test
+./mvnw clean verify    # builds + tests both modules: engine (419 tests) + server (90 tests)
 ```
 
-The test suite covers individual operators, the query planner, the optimiser, expression evaluation, query budgets, EXPLAIN, hash join, and end-to-end integration scenarios (339 tests).
+The 20 sample queries are a golden-output regression gate (see the engine README to run them). CI builds, tests, and publishes the container image to GHCR on every merge to `main`.
 
-## Project Structure
+## Project structure
 
 ```
-├── pom.xml                                          # Parent POM (aggregator: engine + server; Java 17, dep/plugin management)
-├── engine/                                          # Pure query engine — zero Spring dependencies
-│   ├── pom.xml                                      # cuckoodb-engine (JSqlParser 4.7, commons-csv 1.14.1, JMH 1.37 test-scope)
-│   ├── src/main/java/com/github/jinba1/cuckoodb/    # Core engine (35 files)
-│   │   └── operator/                                # Volcano operators (11 files, incl. HashJoinOperator)
-│   ├── src/test/java/com/github/jinba1/cuckoodb/    # JUnit 5 tests (339 tests across 33 files)
-│   └── samples/
-│       ├── db/data/                                 # CSV data files (header row + data rows)
-│       ├── input/query[1-20].sql                    # Sample queries
-│       └── expected_output/query[1-20].csv          # Expected results
-├── server/                                          # cuckoodb-server — Spring Boot REST + MCP gateway over the engine
-│   ├── pom.xml                                      # Spring Boot 4 (web MVC), springdoc/OpenAPI, Spring AI MCP server
-│   └── src/main/java/com/github/jinba1/cuckoodb/server/   # web/ controllers, query/ service, catalog/ facade, mcp/ agent tools, config
-├── mvnw / mvnw.cmd                                  # Maven Wrapper
-└── LICENSE
+├── engine/   # pure query engine — Java 17, zero Spring (see engine/README.md)
+└── server/   # Spring Boot 4 gateway — REST + MCP over the engine
 ```
 
 ## Background
 
-Originally built as a university project for the Advanced Database Systems course at the University of Edinburgh, subsequently extended with additional query optimisation and expanded test coverage.
+Originally built as a university project for the Advanced Database Systems course at the University of Edinburgh, then extended into a guarded, agent-facing gateway — REST and MCP interfaces, query budgets, and additional optimisation and test coverage.
 
 ## License
 
-This project is released under the MIT License. See [LICENSE](LICENSE) for details.
+Released under the MIT License. See [LICENSE](LICENSE).
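The gateway README above promises budget-bounded, fail-closed queries. The engine README defines that budget as total work: every tuple any operator emits counts, and the timeout clock starts at the first emission. The Java sketch below illustrates those rules. It is not the engine's implementation. `TupleBudget`, `charged`, `BudgetExceededException` and `crossProductWithEmptyFilter` are hypothetical names. The convention that a value of 0 disables a dimension is an assumption chosen to mirror omitting `--max-tuples` / `--timeout-ms`.

```java
import java.util.Iterator;
import java.util.List;
import java.util.stream.IntStream;

/** Hypothetical sketch of total-work budgeting; not the engine's actual classes. */
public class BudgetSketch {

    /** Raised when either budget dimension is exceeded (fail-closed). */
    static class BudgetExceededException extends RuntimeException {
        BudgetExceededException(String message) { super(message); }
    }

    /** One instance is shared by every operator in a single query's tree. */
    static final class TupleBudget {
        private final long maxTuples;     // <= 0: no tuple limit (assumed convention)
        private final long timeoutNanos;  // <= 0: no time limit (assumed convention)
        private long emitted = 0;
        private long startNanos = -1;     // set lazily on the first emission

        TupleBudget(long maxTuples, long timeoutMs) {
            this.maxTuples = maxTuples;
            this.timeoutNanos = timeoutMs * 1_000_000L;
        }

        /** Charge one emitted tuple, from any operator, against the budget. */
        void onEmit() {
            long now = System.nanoTime();
            if (startNanos < 0) {
                startNanos = now;         // the clock starts at the first tuple
            }
            emitted++;
            if (maxTuples > 0 && emitted > maxTuples) {
                throw new BudgetExceededException("tuple budget of " + maxTuples + " exceeded");
            }
            if (timeoutNanos > 0 && now - startNanos > timeoutNanos) {
                throw new BudgetExceededException("time budget exceeded");
            }
        }

        long emitted() { return emitted; }
    }

    /** Wrap an operator's output so every tuple it emits is charged. */
    static <T> Iterator<T> charged(Iterator<T> child, TupleBudget budget) {
        return new Iterator<T>() {
            @Override public boolean hasNext() { return child.hasNext(); }
            @Override public T next() {
                T tuple = child.next();
                budget.onEmit();
                return tuple;
            }
        };
    }

    /** Cross product of two scans, followed by a filter that rejects every pair. */
    static int crossProductWithEmptyFilter(int rows, TupleBudget budget) {
        List<Integer> table = IntStream.range(0, rows).boxed().toList();
        int outputRows = 0;
        Iterator<Integer> outer = charged(table.iterator(), budget);
        while (outer.hasNext()) {
            int l = outer.next();
            Iterator<Integer> inner = charged(table.iterator(), budget); // rescan per outer row
            while (inner.hasNext()) {
                int r = inner.next();
                budget.onEmit();              // the join emits the (l, r) pair
                if (l + r < 0) {              // never true: the filter drops everything
                    outputRows++;
                }
            }
        }
        return outputRows;
    }

    public static void main(String[] args) {
        TupleBudget budget = new TupleBudget(5_000, 0); // tuple limit only
        try {
            int rows = crossProductWithEmptyFilter(100, budget);
            System.out.println("finished with " + rows + " rows");
        } catch (BudgetExceededException e) {
            System.out.println("Error: " + e.getMessage() + " after " + budget.emitted() + " tuples");
        }
    }
}
```

Run as-is, `main` prints `Error: tuple budget of 5000 exceeded after 5001 tuples`. The query would produce zero output rows, but its intermediate scan and join tuples exhaust the budget first. A budget that counted only result rows would let this query run to completion (20 100 emissions here) unchecked.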
diff --git a/engine/README.md b/engine/README.md
new file mode 100644
index 0000000..9e93b01
--- /dev/null
+++ b/engine/README.md
@@ -0,0 +1,166 @@
+# cuckooDB — query engine
+
+The query engine under the [cuckooDB gateway](../README.md): an in-memory relational query engine on the Volcano/iterator model. It parses SQL via JSqlParser, builds an operator tree, optimises it, and executes tuple-by-tuple against CSV files. Pure Java 17, **zero Spring dependencies**.
+
+## Architecture
+
+```
+SQL → JSqlParser → QueryPlanner → QueryPlanOptimizer → Operator Tree → Results
+```
+
+| Component | Role |
+|-----------|------|
+| `QueryPlanner` | Parses SQL and builds the operator pipeline |
+| `QueryPlanOptimizer` | Selection pushdown, trivial operator removal |
+| `DBCatalog` | Schema and table metadata (singleton) |
+| `Value` | Typed tuple values (sealed interface: `IntValue`, `StringValue`) |
+| `ExpressionEvaluator` | Evaluates WHERE/HAVING conditions per tuple |
+| `ExpressionPreprocessor` | Resolves column references to indices |
+| `ConditionSplitter` | Separates join predicates from selection predicates |
+
+**Operators** (all extend `Operator`; `HashJoinOperator` does so via `JoinOperator`):
+
+`ScanOperator`, `SelectOperator`, `ProjectOperator`, `JoinOperator` / `HashJoinOperator`, `SortOperator`, `AggregateOperator`, `DuplicateEliminationOperator`, `LimitOperator`
+
+## Scope
+
+Read-only SQL-over-CSV: `SELECT`/`FROM`/`WHERE`, inner joins, `GROUP BY` with `SUM`/`COUNT`/`AVG`/`MIN`/`MAX`, `ORDER BY`, `DISTINCT`, `LIMIT n`, and nested arithmetic/comparison expressions. Values are typed int or string, inferred per column from the data. Tables are discovered from CSV files with header rows; no separate schema file. No transactions, indexes, data modification, persistence, or full SQL dialect — the focus is query planning, optimisation, and the Volcano execution model.
+
+## Build and run (CLI)
+
+Run from the repository root (uses the Maven Wrapper):
+
+```bash
+./mvnw -pl engine -DskipTests clean package
+
+java -cp engine/target/cuckoodb-engine-1.0.0-jar-with-dependencies.jar \
+  com.github.jinba1.cuckoodb.CuckooDB \
+  <database_dir> <input_file> <output_file> [--max-tuples=N] [--timeout-ms=N]
+```
+
+`<database_dir>` is a directory containing a `data/` subdirectory of `.csv` tables. `--max-tuples` and `--timeout-ms` are optional and independent: use one, both, or neither. Omitting a flag imposes no limit on that dimension.
+
+### Demo
+
+**Input** (`engine/samples/db/data/Student.csv`):
+
+```
+A, B, C, D
+1, 200, 50, 33
+2, 200, 200, 44
+3, 100, 105, 44
+4, 100, 50, 11
+5, 100, 500, 22
+6, 300, 400, 11
+```
+
+**Command** (`engine/samples/input/query4.sql` is `SELECT * FROM Student WHERE Student.A < 3;`):
+
+```bash
+java -cp engine/target/cuckoodb-engine-1.0.0-jar-with-dependencies.jar \
+  com.github.jinba1.cuckoodb.CuckooDB \
+  engine/samples/db engine/samples/input/query4.sql output.csv
+```
+
+**Output** (`output.csv`):
+
+```
+a,b,c,d
+1,200,50,33
+2,200,200,44
+```
+
+## Query budgets
+
+The engine enforces **total-work semantics**: every tuple emitted by any operator counts against the budget, including intermediate tuples that are later filtered or joined. A cross-product explosion that never produces output rows still hits the tuple limit. The timeout clock starts lazily at the first tuple emission. On a breach, the partial output file is deleted, `Error: <message>` is written to stderr, and the process exits with code 1.
+
+## EXPLAIN
+
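+Prefix any query with `EXPLAIN` to inspect the plan without executing it. For `EXPLAIN SELECT Student.B, SUM(Student.C) FROM Student, Enrolled WHERE Student.D > 30 AND Student.A = Enrolled.A GROUP BY Student.B;` the output has two sections, the plan as written and then the plan after optimisation: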
+
+```
+=== Plan (as written) ===
+Aggregate[group by: Student.B; calls: SUM(Student.c)]
+  Project[Enrolled.A, Student.A, Student.B, Student.C, Student.D]
+    Select[Student.D > 30]
+      HashJoin[Student.A = Enrolled.A]
+        Scan[Student]
+        Scan[Enrolled]
+
+=== Plan (optimized) ===
+Aggregate[group by: Student.B; calls: SUM(Student.c)]
+  Project[Enrolled.A, Student.A, Student.B, Student.C, Student.D]
+    HashJoin[Student.A = Enrolled.A]
+      Select[Student.D > 30]
+        Scan[Student]
+      Project[Enrolled.A]
+        Scan[Enrolled]
+```
+
+The optimiser pushes the `Select` below the join (selection pushdown) and inserts a projection on the inner scan, so only the join key `Enrolled.A` reaches the join. The planner picked a hash join for the equi-condition. No execution occurs for `EXPLAIN`.
+
+## Join algorithms
+
+The planner selects between two join algorithms automatically.
+
+**Nested-loop** (`JoinOperator`): for every outer tuple the inner child is rewound and scanned in full. Handles any condition — equality, inequality, arbitrary expression, or cross product. Shown in `EXPLAIN` as `Join[<condition>]`.
+
+**Hash** (`HashJoinOperator extends JoinOperator`): the inner (build) side is drained once into a `HashMap` keyed by the equality conjuncts; the outer (probe) side streams through once. After a lookup, the full original condition is re-evaluated on every candidate, so residual non-equality conjuncts (e.g. `A.x = B.x AND A.y > 3`) work. Output order is identical to nested-loop (outer-major, inner order preserved per key bucket). Shown as `HashJoin[<condition>]`.
+
+**Auto-selection:** hash join is used when `Constants.useHashJoin` is `true` (default) **and** the condition has at least one column-to-column equality conjunct. Cross products and pure non-equi joins always use nested-loop. Set `Constants.useHashJoin = false` to force nested-loop everywhere.
+
+### Benchmarks
+
+A JMH 1.37 suite lives in `engine/src/test/java/com/github/jinba1/cuckoodb/bench/`. It is compiled in CI but never run there; run it locally from the repository root:
+
+```bash
+./mvnw -pl engine -q test-compile exec:exec -Dexec.executable=java -Dexec.classpathScope=test \
+  "-Dexec.args=-cp %classpath org.openjdk.jmh.Main .*Benchmark"
+```
+
+**Results** (OpenJDK 21.0.5, Intel Core i9-13900HX, 32 logical cores, Linux under WSL2):
+
+| Benchmark | matchesPerKey | rowsPerSide | useHashJoin | Mode | Cnt | Score | Error | Units |
+|-----------|--------------|-------------|-------------|------|-----|-------|-------|-------|
+| EndToEndJoinBenchmark.planAndDrain | N/A | N/A | true | avgt | 3 | 1.028 | ± 0.288 | ms/op |
+| EndToEndJoinBenchmark.planAndDrain | N/A | N/A | false | avgt | 3 | 315.523 | ± 36.021 | ms/op |
+| JoinAlgorithmBenchmark.hashJoin | 1 | 1000 | N/A | avgt | 5 | 0.270 | ± 0.011 | ms/op |
+| JoinAlgorithmBenchmark.hashJoin | 1 | 5000 | N/A | avgt | 5 | 1.382 | ± 0.154 | ms/op |
+| JoinAlgorithmBenchmark.hashJoin | 10 | 1000 | N/A | avgt | 5 | 2.160 | ± 0.109 | ms/op |
+| JoinAlgorithmBenchmark.hashJoin | 10 | 5000 | N/A | avgt | 5 | 10.661 | ± 0.840 | ms/op |
+| JoinAlgorithmBenchmark.nestedLoopJoin | 1 | 1000 | N/A | avgt | 5 | 202.621 | ± 25.313 | ms/op |
+| JoinAlgorithmBenchmark.nestedLoopJoin | 1 | 5000 | N/A | avgt | 5 | 5027.912 | ± 370.564 | ms/op |
+| JoinAlgorithmBenchmark.nestedLoopJoin | 10 | 1000 | N/A | avgt | 5 | 194.786 | ± 4.916 | ms/op |
+| JoinAlgorithmBenchmark.nestedLoopJoin | 10 | 5000 | N/A | avgt | 5 | 4785.620 | ± 212.683 | ms/op |
+
+`EndToEndJoinBenchmark` joins two 1 000-row CSV tables through the full pipeline. Nested-loop re-parses the inner CSV once per outer row, so the ≈ 307× gap reflects both the algorithm and I/O. `JoinAlgorithmBenchmark` uses in-memory inputs to isolate the algorithm. At 5 000 rows/side, the operator-level gap is ≈ 3 600× with one match per key and ≈ 450× with ten matches per key.
+
+## Sample queries
+
+`engine/samples/` ships 20 queries and a small dataset (Student, Course, Enrolled, Staff). The bundled runner diffs each result against `engine/samples/expected_output/` — the golden-output regression gate. Launch via `exec:exec` (not `exec:java`) so it runs with the engine module as the working directory:
+
+```bash
+./mvnw -pl engine -q test-compile exec:exec -Dexec.executable=java -Dexec.classpathScope=test \
+  "-Dexec.args=-cp %classpath com.github.jinba1.cuckoodb.SampleQueryRunner"
+```
+
+## Testing
+
+```bash
+./mvnw -pl engine test
+```
+
+419 tests across operators, the planner, the optimiser, expression evaluation, query budgets, EXPLAIN, hash join, and end-to-end integration scenarios.
+
+## Layout
+
+```
+engine/
+├── src/main/java/com/github/jinba1/cuckoodb/   # core engine (45 files)
+│   └── operator/                                # Volcano operators (11 files, incl. HashJoinOperator)
+├── src/test/java/com/github/jinba1/cuckoodb/    # JUnit 5 tests (419 across 41 files)
+└── samples/
+    ├── db/data/                                 # CSV tables (header row + data rows)
+    ├── input/query[1-20].sql
+    └── expected_output/query[1-20].csv
+```
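The engine README describes the hash join as follows: drain the inner side into a `HashMap` once, stream the outer side once, re-evaluate the full condition on every candidate, and keep nested-loop output order. The plain-Java sketch below follows those steps. It is a hypothetical illustration, not `HashJoinOperator` itself. Rows are `int[]`, and the caller passes key extractors for the equality conjuncts (`outerKey`, `innerKey`) plus the full condition. The engine instead derives these from the parsed SQL.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;
import java.util.function.Function;

/** Hypothetical build/probe hash join sketch; not the engine's HashJoinOperator. */
public class HashJoinSketch {

    static List<int[]> hashJoin(List<int[]> outer, List<int[]> inner,
                                Function<int[], List<Integer>> outerKey,
                                Function<int[], List<Integer>> innerKey,
                                BiPredicate<int[], int[]> fullCondition) {
        // Build: drain the inner side once. Each bucket is an ArrayList,
        // so inner rows sharing a key keep their original relative order.
        Map<List<Integer>, List<int[]>> table = new HashMap<>();
        for (int[] row : inner) {
            table.computeIfAbsent(innerKey.apply(row), k -> new ArrayList<>()).add(row);
        }
        // Probe: stream the outer side once, in order (outer-major output).
        List<int[]> out = new ArrayList<>();
        for (int[] o : outer) {
            for (int[] i : table.getOrDefault(outerKey.apply(o), List.of())) {
                // Re-check the whole original condition so residual
                // non-equality conjuncts (e.g. A.y > 3) are still enforced.
                if (fullCondition.test(o, i)) {
                    out.add(concat(o, i));
                }
            }
        }
        return out;
    }

    /** Nested-loop reference: the inner side is rescanned for every outer row. */
    static List<int[]> nestedLoop(List<int[]> outer, List<int[]> inner,
                                  BiPredicate<int[], int[]> condition) {
        List<int[]> out = new ArrayList<>();
        for (int[] o : outer) {
            for (int[] i : inner) {
                if (condition.test(o, i)) {
                    out.add(concat(o, i));
                }
            }
        }
        return out;
    }

    static int[] concat(int[] a, int[] b) {
        int[] joined = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, joined, a.length, b.length);
        return joined;
    }

    public static void main(String[] args) {
        // A(x, y) joined with B(x, z) on A.x = B.x AND A.y > 3
        List<int[]> a = List.of(new int[]{1, 5}, new int[]{2, 1}, new int[]{1, 9});
        List<int[]> b = List.of(new int[]{1, 100}, new int[]{2, 200}, new int[]{1, 101});
        BiPredicate<int[], int[]> cond = (o, i) -> o[0] == i[0] && o[1] > 3;

        List<int[]> hash = hashJoin(a, b, o -> List.of(o[0]), i -> List.of(i[0]), cond);
        hash.forEach(row -> System.out.println(Arrays.toString(row)));
        System.out.println("same order as nested-loop: "
                + Arrays.deepEquals(hash.toArray(), nestedLoop(a, b, cond).toArray()));
    }
}
```

The outer row `[2, 1]` finds a matching bucket but fails the residual `A.y > 3` check. `main` therefore prints four joined rows followed by `same order as nested-loop: true`. Two things make the orders agree. Buckets preserve inner order within a key, and the probe walks the outer side in order.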