a-tokyo · Jun 10, 2026
diff --git a/.agents/plans/2026-06-10-ivf-wave.md b/.agents/plans/2026-06-10-ivf-wave.md
@@ -0,0 +1,27 @@
+# IVF / coarse-quantizer wave + repo finish-up
+
+## Context
+
+Final roadmap item for quantvec: IVF (inverted-file) coarse quantizer for sublinear search on 10M+ corpora. User decisions: **full parity** (TurboQuantIndex + IdMapIndex + Collection), **full remove() support**, release as **v0.0.3**. Architecture: IVF as an opt-in option `ivf: { nlist, nprobe? }` on TurboQuantIndex (like `calibrate`/`fastscan`) — parity through the existing wrapper layering is then nearly free. Format VERSION bumps 1→2 (v2-only readers, sanctioned by repo ADR D-010: pre-1.0 bump-and-rewrite, no legacy readers).
+
+## Steps (in dependency order)
+
+1. **`src/core/kmeans.ts`** (new) — seeded k-means++ init + Lloyd, `kmeans(data, m, {k, dim, rng, spherical, maxIterations=25})` → `{centroids, assignments, iterations}`. Spherical mode (cosine/dot): renormalize centroids each round; a zero-norm mean keeps its previous direction. Convergence: zero assignment changes. Empty-cluster repair: re-seed from farthest point (deterministic first-max). `KMeansError` codes `INVALID_K|INVALID_DIM|INVALID_LENGTH`; k ∈ [2, m]. Cite Lloyd 1982 / Arthur & Vassilvitskii 2007. Tests: determinism, separated blobs, repair, spherical unit norms, k=m, validation.
+
+2. **`src/core/search.ts`** — refactor query prep (validation, norms, rotation, calibration dual, LUT) into private `prepareScan`; add exported `searchSlots(db, query, k, slots: Int32Array, opts)` scanning only given slots, honoring full-length mask, same errors + new `SearchError` code `INVALID_SLOT`. Tests: ≡ searchFlat over all slots (exact), subset-only results, mask interaction, empty slots, INVALID_SLOT.
+
+3. **`src/index/coarse.ts`** (new) — `CoarseQuantizer`: `train(vecs, nlist, nprobe, metric, dim, seed)` (sample min(m, 64·nlist) via partial Fisher-Yates with domain-separated rng seed; spherical for cosine/dot), `fromState(centroids, listForSlot, ...)`, `assign`, `addSlot`, `swapRemove(i, last)` (two-step dance: A) swap-pop slot i from its list, B) renumber last→i reading posForSlot AFTER step A; handles i===last and same-list-tail), `clear()` (keeps centroids), `probe(query, nprobe)` → concatenated Int32Array of top-nprobe lists' slots via TopK. State: `#postings: number[][]`, `#listForSlot`, `#posForSlot`. `defaultNprobe(nlist) = max(1, ceil(nlist/8))`. Tests: invariant fuzz (postings[listForSlot[s]][posForSlot[s]] === s after every op), train determinism, probe(nlist) = all slots, fromState round-trip.
+
+4. **`src/index/turboquant-index.ts`** — `TurboQuantIndexOptions.ivf?: IvfOptions {nlist (int, [2, 2^22]), nprobe? (int, [1,nlist])}`; `IndexSearchOptions.nprobe?`; `IndexError` codes `INVALID_NLIST|INVALID_NPROBE`; getter `ivfActive`. `@internal trainIvfFromBatch(vecs)` mirroring fitCalibrationFromBatch (train iff first batch ≥ nlist, else freeze flat forever); hooks: `add()` after calibration call, `#appendOne` → `addSlot(slot, rawVec)`, `swapRemove` → coarse.swapRemove(i, last), `clear()` → coarse.clear(). Search: IVF branch before WASM block (`probe` + `searchSlots`); nprobe ignored when flat (documented); WASM/FastScan bypassed under IVF (documented as future wave). toPayload/fromPayload carry ivf state; fromPayload freezes. Tests: validation, train/freeze semantics, nprobe=nlist ≡ flat (exact scores), recall sanity on 64-cluster mixture, remove parity vs flat twin, round-trip, mask interaction.
+
+5. **`src/io/serialize.ts`** — VERSION = 2 (read accepts only 2). New section after calibration, before ids: flag u8 ∈ {0,1}; if 1: nlist u32 + nprobe u32 + centroids nlist·dim f32 + listForSlot n·u32 (postings rebuilt on load). `DeserializeError` code `BAD_IVF`; bounds-check before allocation; validate nlist ∈ [2, 2^22], nprobe ∈ [1, nlist], centroids finite, listForSlot entries < nlist. `IndexPayload.ivf?`. Tests: round-trips (±ivf, ±calibration, both kinds), crafted v1 → BAD_VERSION, all corrupt-IVF branches, n=0 with ivf.
+
+6. **Plumbing** — IdMapIndex: call `trainIvfFromBatch` in addWithIds beside calibration; `IdMapSearchOptions.nprobe?` forwarded; `ivfActive` getter. Collection: `CollectionConfig.ivf?`, constructor passes through; `SearchParams.nprobe?` forwarded. `src/index.ts`: export `IvfOptions` type. Tests in id-map-index.test.ts + collection.test.ts (passthrough, remove/delete under ivf, filter+ivf, round-trip).
+
+7. **Bench + docs + release** — `benchmarks/ivf.ts` (synthetic 64-cluster mixture, dim 768, n 20k, nlist 128; flat baseline + nprobe ∈ {1,2,4,8,16,128}; recall@1/10/100, QPS, METRIC lines, results JSON) + `bench:ivf` script. Docs: README (scope note, quickstart, roadmap table ✅), docs/roadmap.md (Planned→Shipped), serialization.md (v2 layout), api-reference.md, guide.md, architecture.md (module map), benchmarks.md. Version bump 0.0.3 (publish trigger on main merge). Branch `feat/ivf`, PR to main; do NOT merge without user confirmation (merge auto-publishes to npm).
+
+## Verification
+- nprobe=nlist ≡ searchFlat exactly (IVF analog of the WASM≡scalar oracle).
+- Invariant fuzz over add/remove/clear; determinism (same seed+data ⇒ byte-identical toBytes()).
+- Every BAD_IVF branch hit by a crafted-buffer test; no allocation before bounds check.
+- `bun run typecheck && bun run lint && bun run format:check && bun run test:coverage` (90% global coverage threshold), then `bun run build`; run `bench:ivf` and commit the results JSON.
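Note on step 3 of the plan: the `swapRemove(i, last)` two-step dance is the part most likely to go wrong, so a minimal sketch may help. It assumes dense slots `0..n-1` and uses a hypothetical `PostingLists` class with plain public arrays in place of the planned `CoarseQuantizer` private state (`#postings`, `#listForSlot`, `#posForSlot`). It illustrates the bookkeeping only and is not the repo's implementation.

```typescript
// Hypothetical stand-in for the planned CoarseQuantizer bookkeeping.
class PostingLists {
  readonly postings: number[][]; // postings[list] = slots assigned to that list
  readonly listForSlot: number[] = []; // slot -> list id
  readonly posForSlot: number[] = []; // slot -> position inside its list

  constructor(nlist: number) {
    this.postings = Array.from({ length: nlist }, () => []);
  }

  /** Append the next slot (slots are dense: 0..n-1) to `list`. */
  addSlot(slot: number, list: number): void {
    const p = this.postings[list]!;
    this.listForSlot[slot] = list;
    this.posForSlot[slot] = p.length;
    p.push(slot);
  }

  /** Remove slot `i`; the slot numbered `last` (n-1) is renumbered to `i`. */
  swapRemove(i: number, last: number): void {
    // Step A: swap-pop slot i out of its own posting list.
    const li = this.postings[this.listForSlot[i]!]!;
    const pos = this.posForSlot[i]!;
    const tail = li.pop()!;
    if (tail !== i) {
      li[pos] = tail;
      this.posForSlot[tail] = pos;
    }
    // Step B: renumber last -> i, reading posForSlot AFTER step A
    // (step A may have just moved `last` when both share a list).
    if (i !== last) {
      const list = this.listForSlot[last]!;
      const p = this.posForSlot[last]!;
      this.postings[list]![p] = i;
      this.listForSlot[i] = list;
      this.posForSlot[i] = p;
    }
    // Drop the bookkeeping entry of the now-unused highest slot.
    this.listForSlot.length = last;
    this.posForSlot.length = last;
  }

  /** Invariant from the plan: postings[listForSlot[s]][posForSlot[s]] === s. */
  invariantHolds(): boolean {
    return this.listForSlot.every(
      (list, s) => this.postings[list]![this.posForSlot[s]!] === s,
    );
  }
}
```

Reading `posForSlot[last]` only after step A matters in the same-list-tail case: if `i` and `last` share a list and `last` is its tail, step A has already moved `last` into the position `i` occupied.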
diff --git a/README.md b/README.md
@@ -37,8 +37,9 @@ scalar codebook is fully determined by `(dim, bits)` with **no data and ~zero in
 | Dependencies       | **Zero** runtime dependencies                                      |
 
 > **Scope:** quantvec is a _flat quantized index_ — O(n) scan over compact codes (à la FAISS
-> `IndexPQFastScan`). Great recall and throughput up to ~1–10M vectors. An IVF coarse-quantizer
-> for larger corpora is on the roadmap.
+> `IndexPQFastScan`) — with an opt-in **IVF coarse quantizer** (`ivf: { nlist }`) that probes
+> only the nearest cells for sublinear search on large corpora (**~11× QPS at equal recall@10**
+> on a clustered synthetic 20k × 768-d corpus; the gain is expected to grow with n).
 
 A 1M × 1536-d corpus (e.g. OpenAI `text-embedding-ada-002`) is **6.1 GB as float32**. At 4 bits
 quantvec packs it into **~780 MB** (7.92×); at 2 bits, **~390 MB** (15.67×) — with **94%+
@@ -78,6 +79,16 @@ exact rescore of the candidate pool):
 const index = new TurboQuantIndex({ dim: 1536, bits: 4, fastscan: true });
 ```
 
+For large corpora, enable the **IVF coarse quantizer**: k-means cells are trained from the first
+add (needs ≥ `nlist` vectors, else the index stays flat; ~32·nlist recommended) and queries probe
+only the nearest `nprobe` cells (sublinear scan; ~11× QPS at equal recall on clustered data):
+
+```ts
+const index = new TurboQuantIndex({ dim: 1536, ivf: { nlist: 1024 } });
+index.add(corpus); // first batch trains + freezes the cells
+index.search(query, 10, { nprobe: 32 }); // per-query recall/speed knob
+```
+
 ### Stable ids: `IdMapIndex`
 
 ```ts
@@ -275,18 +286,18 @@ Full results and JSON in [`benchmarks/`](./benchmarks/).
 
 ## Roadmap
 
-| Status | Item                                                                                                     |
-| ------ | -------------------------------------------------------------------------------------------------------- |
-| ✅     | Core math: rotation, Beta/Lloyd-Max codebooks, encode pipeline, flat nibble-LUT search                   |
-| ✅     | `TurboQuantIndex`, `IdMapIndex`, versioned serialization, Node fs helpers                                |
-| ✅     | True 2/3/4-bit **bit-packed serialization** (7.9–15.7× compression)                                      |
-| ✅     | **FWHT rotation** for power-of-two dims (O(d·log d), ~25× faster encode)                                 |
-| ✅     | **TQ+ per-coordinate calibration** (opt-in; data-dependent)                                              |
-| ✅     | **Exact WASM scoring kernel** (AssemblyScript, bit-identical to scalar, ~1.3× query)                     |
-| ✅     | **v128 FastScan kernel** (blocked-nibble swizzle + exact rescore, **~5.7× query**)                       |
-| ✅     | **Ergonomic `createCollection`** with typed payloads and filter DSL                                      |
-| ✅     | Real-dataset benchmarks: SIFT-small + GloVe-200 + dbpedia-OpenAI-100k (results in `benchmarks/results/`) |
-| 📋     | IVF / coarse-quantizer for 10M+ corpora                                                                  |
+| Status | Item                                                                                                      |
+| ------ | --------------------------------------------------------------------------------------------------------- |
+| ✅     | Core math: rotation, Beta/Lloyd-Max codebooks, encode pipeline, flat nibble-LUT search                    |
+| ✅     | `TurboQuantIndex`, `IdMapIndex`, versioned serialization, Node fs helpers                                 |
+| ✅     | True 2/3/4-bit **bit-packed serialization** (7.9–15.7× compression)                                       |
+| ✅     | **FWHT rotation** for power-of-two dims (O(d·log d), ~25× faster encode)                                  |
+| ✅     | **TQ+ per-coordinate calibration** (opt-in; data-dependent)                                               |
+| ✅     | **Exact WASM scoring kernel** (AssemblyScript, bit-identical to scalar, ~1.3× query)                      |
+| ✅     | **v128 FastScan kernel** (blocked-nibble swizzle + exact rescore, **~5.7× query**)                        |
+| ✅     | **Ergonomic `createCollection`** with typed payloads and filter DSL                                       |
+| ✅     | Real-dataset benchmarks: SIFT-small + GloVe-200 + dbpedia-OpenAI-100k (results in `benchmarks/results/`)  |
+| ✅     | **IVF / coarse-quantizer** for 10M+ corpora (k-means cells, full remove parity, ~11× QPS at equal recall) |
 
 ---
 

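Note on the README's "queries probe only the nearest `nprobe` cells": the idea can be sketched as follows. This is a simplified illustration under assumptions, not the library's code: `probeSlots` is a hypothetical helper, centroids are assumed unit-normalized so a larger dot product means a nearer cell, and a full sort stands in for the TopK selection the plan mentions. `defaultNprobe` follows the formula stated in the plan.

```typescript
/** Default from the plan: max(1, ceil(nlist / 8)). */
function defaultNprobe(nlist: number): number {
  return Math.max(1, Math.ceil(nlist / 8));
}

/** Hypothetical helper: collect the slots of the `nprobe` cells nearest to `query`. */
function probeSlots(
  centroids: Float32Array[],
  postings: number[][],
  query: Float32Array,
  nprobe: number,
): Int32Array {
  // Score every centroid against the query (higher dot = nearer for unit vectors).
  const scored = centroids.map((c, list) => {
    let dot = 0;
    for (let i = 0; i < c.length; i++) dot += c[i]! * query[i]!;
    return { list, dot };
  });
  // Full sort for brevity; ties break on the lower list id for determinism.
  scored.sort((a, b) => b.dot - a.dot || a.list - b.list);
  // Concatenate the posting lists of the winning cells.
  const out: number[] = [];
  for (const { list } of scored.slice(0, nprobe)) {
    for (const slot of postings[list]!) out.push(slot);
  }
  return Int32Array.from(out);
}
```

With `nprobe` equal to the number of cells, every slot is returned, which is why the plan treats `nprobe = nlist` as the exact flat-scan oracle.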
diff --git a/benchmarks/ivf.ts b/benchmarks/ivf.ts
@@ -0,0 +1,216 @@
+// quantvec — IVF coarse-quantizer benchmark: speedup vs recall against the flat scan.
+//
+// Measures the IVF value proposition on clustered data (the regime IVF exists for):
+// a seeded Gaussian-mixture corpus, queries from a separately seeded mixture (own centers),
+// an exact float32 cosine ground truth, a flat TurboQuantIndex baseline, and the same index
+// with `ivf` enabled swept across nprobe values. At nprobe = nlist the IVF results are exactly
+// the flat scan's (the searchSlots oracle), so the sweep shows the recall/QPS trade-off cleanly.
+//
+// Self-contained and deterministic. Emits a human table, `METRIC key=value` lines,
+// and a JSON results file. Run: `npm run bench:ivf` (env: DIM, N, NQ, CLUSTERS, NLIST).
+
+import { mkdirSync, writeFileSync } from 'node:fs';
+import { join } from 'node:path';
+import { TurboQuantIndex } from '../src/index/turboquant-index';
+
+// ── Deterministic PRNG + Gaussian (mulberry32 + Box–Muller) ────────────────────
+function mulberry32(seed: number): () => number {
+  let a = seed >>> 0;
+  return () => {
+    a |= 0;
+    a = (a + 0x6d2b79f5) | 0;
+    let t = Math.imul(a ^ (a >>> 15), 1 | a);
+    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
+    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
+  };
+}
+
+function gaussian(rng: () => number): number {
+  let u = 0;
+  let v = 0;
+  while (u === 0) u = rng();
+  while (v === 0) v = rng();
+  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
+}
+
+/** Gaussian mixture: `clusters` centers (sigma 5), unit-sigma points around them. */
+function makeClustered(
+  count: number,
+  dim: number,
+  clusters: number,
+  rng: () => number,
+): { vectors: Float32Array[]; centers: Float32Array[] } {
+  const centers = Array.from({ length: clusters }, () => {
+    const c = new Float32Array(dim);
+    for (let i = 0; i < dim; i++) c[i] = gaussian(rng) * 5;
+    return c;
+  });
+  const vectors: Float32Array[] = [];
+  for (let j = 0; j < count; j++) {
+    const c = centers[j % clusters]!;
+    const v = new Float32Array(dim);
+    for (let i = 0; i < dim; i++) v[i] = c[i]! + gaussian(rng);
+    vectors.push(v);
+  }
+  return { vectors, centers };
+}
+
+// ── Exact cosine top-k ground truth ────────────────────────────────────────────
+function normalized(v: Float32Array): Float32Array {
+  let s = 0;
+  for (let i = 0; i < v.length; i++) s += v[i]! * v[i]!;
+  const inv = 1 / Math.sqrt(s);
+  const out = new Float32Array(v.length);
+  for (let i = 0; i < v.length; i++) out[i] = v[i]! * inv;
+  return out;
+}
+
+function exactTopK(db: Float32Array[], query: Float32Array, k: number): number[] {
+  const q = normalized(query);
+  const scored = db.map((v, idx) => {
+    const u = normalized(v);
+    let dot = 0;
+    for (let i = 0; i < u.length; i++) dot += u[i]! * q[i]!;
+    return { idx, dot };
+  });
+  scored.sort((a, b) => b.dot - a.dot);
+  return scored.slice(0, k).map((s) => s.idx);
+}
+
+function recallAt(approx: Int32Array, exact: number[], k: number): number {
+  const truth = new Set(exact.slice(0, k));
+  let hit = 0;
+  for (let i = 0; i < Math.min(k, approx.length); i++) if (truth.has(approx[i]!)) hit++;
+  return hit / k;
+}
+
+// ── Benchmark one configuration ────────────────────────────────────────────────
+interface Row {
+  label: string;
+  nprobe?: number;
+  recall1: number;
+  recall10: number;
+  recall100: number;
+  qps: number;
+  speedupVsFlat: number;
+}
+
+function benchIndex(
+  label: string,
+  index: TurboQuantIndex,
+  queries: Float32Array[],
+  exact: number[][],
+  nprobe?: number,
+): Omit<Row, 'speedupVsFlat'> {
+  const K = 100;
+  const opts = nprobe === undefined ? {} : { nprobe };
+  const approxAll: Int32Array[] = [];
+  const tSearch = performance.now();
+  for (const q of queries) approxAll.push(index.search(q, K, opts).indices);
+  const searchSecs = (performance.now() - tSearch) / 1000;
+
+  let r1 = 0;
+  let r10 = 0;
+  let r100 = 0;
+  for (let i = 0; i < queries.length; i++) {
+    r1 += recallAt(approxAll[i]!, exact[i]!, 1);
+    r10 += recallAt(approxAll[i]!, exact[i]!, 10);
+    r100 += recallAt(approxAll[i]!, exact[i]!, 100);
+  }
+  const nq = queries.length;
+  const row: Omit<Row, 'speedupVsFlat'> = {
+    label,
+    recall1: r1 / nq,
+    recall10: r10 / nq,
+    recall100: r100 / nq,
+    qps: nq / searchSecs,
+  };
+  if (nprobe !== undefined) row.nprobe = nprobe;
+  return row;
+}
+
+// ── Run ────────────────────────────────────────────────────────────────────────
+function main(): void {
+  const dim = Number(process.env.DIM ?? 768);
+  const n = Number(process.env.N ?? 20000);
+  const nq = Number(process.env.NQ ?? 200);
+  const clusters = Number(process.env.CLUSTERS ?? 64);
+  const nlist = Number(process.env.NLIST ?? 128);
+  const rng = mulberry32(42);
+
+  process.stdout.write(
+    `quantvec IVF benchmark — dim=${dim} n=${n} queries=${nq} clusters=${clusters} nlist=${nlist} (cosine, 4-bit)\n`,
+  );
+
+  const { vectors: db } = makeClustered(n, dim, clusters, rng);
+  const { vectors: queries } = makeClustered(nq, dim, clusters, mulberry32(7));
+  process.stdout.write('computing exact float32 ground truth…\n');
+  const exact = queries.map((q) => exactTopK(db, q, 100));
+
+  // Flat baseline (scalar path — the IVF scan is scalar too, so the comparison is fair).
+  const flat = new TurboQuantIndex({ dim, bits: 4, metric: 'cosine', seed: 1, wasm: false });
+  flat.add(db);
+  const flatRow: Row = { ...benchIndex('flat', flat, queries, exact), speedupVsFlat: 1 };
+
+  // IVF index: trained from the same single batch.
+  const tTrain = performance.now();
+  const ivf = new TurboQuantIndex({ dim, bits: 4, metric: 'cosine', seed: 1, ivf: { nlist } });
+  ivf.add(db);
+  const trainSecs = (performance.now() - tTrain) / 1000;
+  if (!ivf.ivfActive) throw new Error('IVF did not train — first batch smaller than nlist?');
+
+  const sweep = [1, 2, 4, 8, 16, nlist].filter((p, i, arr) => arr.indexOf(p) === i && p <= nlist);
+  const rows: Row[] = [flatRow];
+  for (const nprobe of sweep) {
+    const r = benchIndex(`ivf@${nprobe}`, ivf, queries, exact, nprobe);
+    rows.push({ ...r, speedupVsFlat: r.qps / flatRow.qps });
+  }
+
+  // Human table.
+  process.stdout.write(`\nbuild+train: ${trainSecs.toFixed(2)}s for ${n} vectors\n`);
+  process.stdout.write('\nconfig    | recall@1 | recall@10 | recall@100 |    QPS | speedup\n');
+  process.stdout.write('----------|----------|-----------|------------|--------|--------\n');
+  for (const r of rows) {
+    process.stdout.write(
+      `${r.label.padEnd(9)} |  ${r.recall1.toFixed(3)}   |   ${r.recall10.toFixed(3)}   |   ${r.recall100.toFixed(
+        3,
+      )}    | ${Math.round(r.qps).toString().padStart(6)} | ${r.speedupVsFlat.toFixed(2)}x\n`,
+    );
+  }
+
+  // METRIC lines (autoresearch protocol).
+  process.stdout.write('\n');
+  process.stdout.write(`METRIC flat_qps=${Math.round(flatRow.qps)}\n`);
+  for (const r of rows) {
+    if (r.nprobe === undefined) continue;
+    process.stdout.write(`METRIC ivf_recall_at10_nprobe${r.nprobe}=${r.recall10.toFixed(4)}\n`);
+    process.stdout.write(`METRIC ivf_qps_nprobe${r.nprobe}=${Math.round(r.qps)}\n`);
+  }
+
+  // JSON results.
+  const outDir = join(process.cwd(), 'benchmarks', 'results');
+  mkdirSync(outDir, { recursive: true });
+  const outPath = join(outDir, `ivf-d${dim}.json`);
+  writeFileSync(
+    outPath,
+    JSON.stringify(
+      {
+        dim,
+        n,
+        nq,
+        clusters,
+        nlist,
+        metric: 'cosine',
+        bits: 4,
+        trainSecs,
+        generatedAt: new Date().toISOString(),
+        rows,
+      },
+      null,
+      2,
+    ),
+  );
+  process.stdout.write(`\nwrote ${outPath}\n`);
+}
+
+main();
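Note on the recall metric used by the benchmark: `recallAt` measures set overlap between the first `k` approximate results and the first `k` exact results, so order within the top `k` does not matter. The small worked example below copies the function from the diff above; the input ids are made up for illustration.

```typescript
// Same logic as recallAt in benchmarks/ivf.ts.
function recallAt(approx: Int32Array, exact: number[], k: number): number {
  const truth = new Set(exact.slice(0, k));
  let hit = 0;
  for (let i = 0; i < Math.min(k, approx.length); i++) if (truth.has(approx[i]!)) hit++;
  return hit / k;
}

// Illustrative ids only: the approximate search swapped the top two results.
const approx = Int32Array.from([7, 3, 9]);
const exact = [3, 7, 1];

console.log(recallAt(approx, exact, 1)); // 0: the top-1 id 7 is not the true top-1 id 3
console.log(recallAt(approx, exact, 3)); // 2/3: ids 7 and 3 are both in the true top-3
```

This is why recall@1 can sit well below recall@10 in the results: a swap between near-tied neighbors costs a full hit at k = 1 but nothing at larger k.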
diff --git a/benchmarks/results/ivf-d768.json b/benchmarks/results/ivf-d768.json
@@ -0,0 +1,75 @@
+{
+  "dim": 768,
+  "n": 20000,
+  "nq": 200,
+  "clusters": 64,
+  "nlist": 128,
+  "metric": "cosine",
+  "bits": 4,
+  "trainSecs": 17.951041874999994,
+  "generatedAt": "2026-06-10T12:35:16.896Z",
+  "rows": [
+    {
+      "label": "flat",
+      "recall1": 0.355,
+      "recall10": 0.6029999999999996,
+      "recall100": 0.7497500000000001,
+      "qps": 52.78636802603001,
+      "speedupVsFlat": 1
+    },
+    {
+      "label": "ivf@1",
+      "recall1": 0.24,
+      "recall10": 0.3864999999999999,
+      "recall100": 0.4408500000000001,
+      "qps": 1205.0945395762583,
+      "nprobe": 1,
+      "speedupVsFlat": 22.82965440967642
+    },
+    {
+      "label": "ivf@2",
+      "recall1": 0.305,
+      "recall10": 0.5320000000000001,
+      "recall100": 0.6422999999999999,
+      "qps": 1090.9819903257126,
+      "nprobe": 2,
+      "speedupVsFlat": 20.667873754597544
+    },
+    {
+      "label": "ivf@4",
+      "recall1": 0.355,
+      "recall10": 0.6019999999999996,
+      "recall100": 0.7372500000000002,
+      "qps": 851.6927813487874,
+      "nprobe": 4,
+      "speedupVsFlat": 16.134710782314116
+    },
+    {
+      "label": "ivf@8",
+      "recall1": 0.36,
+      "recall10": 0.6029999999999996,
+      "recall100": 0.7495499999999999,
+      "qps": 600.3408441145751,
+      "nprobe": 8,
+      "speedupVsFlat": 11.373028048047084
+    },
+    {
+      "label": "ivf@16",
+      "recall1": 0.355,
+      "recall10": 0.6029999999999996,
+      "recall100": 0.7498500000000001,
+      "qps": 377.1228478449929,
+      "nprobe": 16,
+      "speedupVsFlat": 7.144322709587182
+    },
+    {
+      "label": "ivf@128",
+      "recall1": 0.355,
+      "recall10": 0.6029999999999996,
+      "recall100": 0.7497500000000001,
+      "qps": 59.81218376773218,
+      "nprobe": 128,
+      "speedupVsFlat": 1.1330990557682923
+    }
+  ]
+}
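Note on how the "~11× QPS at equal recall" figure can be read off these results: pick the fastest IVF row whose recall@10 is at least the flat baseline's. The sketch below does that over values copied from the JSON above, rounded to 4 and 2 decimals; `fastestAtFlatRecall` is a hypothetical helper, not part of the benchmark script.

```typescript
interface Row {
  label: string;
  recall10: number;
  speedupVsFlat: number;
}

// Copied from benchmarks/results/ivf-d768.json (rounded for readability).
const rows: Row[] = [
  { label: 'flat', recall10: 0.603, speedupVsFlat: 1 },
  { label: 'ivf@1', recall10: 0.3865, speedupVsFlat: 22.83 },
  { label: 'ivf@2', recall10: 0.532, speedupVsFlat: 20.67 },
  { label: 'ivf@4', recall10: 0.602, speedupVsFlat: 16.13 },
  { label: 'ivf@8', recall10: 0.603, speedupVsFlat: 11.37 },
  { label: 'ivf@16', recall10: 0.603, speedupVsFlat: 7.14 },
  { label: 'ivf@128', recall10: 0.603, speedupVsFlat: 1.13 },
];

/** Fastest IVF row whose recall@10 is at least the flat baseline's (within eps). */
function fastestAtFlatRecall(all: Row[], eps = 1e-9): Row | undefined {
  const flat = all.find((r) => r.label === 'flat');
  if (!flat) return undefined;
  return all
    .filter((r) => r.label !== 'flat' && r.recall10 >= flat.recall10 - eps)
    .reduce<Row | undefined>(
      (best, r) => (best === undefined || r.speedupVsFlat > best.speedupVsFlat ? r : best),
      undefined,
    );
}

const best = fastestAtFlatRecall(rows);
console.log(best?.label, best?.speedupVsFlat); // ivf@8 11.37
```

At `nprobe = 4` recall@10 is 0.602, just under the flat 0.603, so it is excluded despite its 16.13× speedup; `nprobe = 8` is the first setting that matches the baseline.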