27 changes: 27 additions & 0 deletions .agents/plans/2026-06-10-ivf-wave.md
@@ -0,0 +1,27 @@
# IVF / coarse-quantizer wave + repo finish-up

## Context

Final roadmap item for quantvec: IVF (inverted-file) coarse quantizer for sublinear search on 10M+ corpora. User decisions: **full parity** (TurboQuantIndex + IdMapIndex + Collection), **full remove() support**, release as **v0.0.3**. Architecture: IVF as an opt-in option `ivf: { nlist, nprobe? }` on TurboQuantIndex (like `calibrate`/`fastscan`) — parity through the existing wrapper layering is then nearly free. Format VERSION bumps 1→2 (v2-only readers, sanctioned by repo ADR D-010: pre-1.0 bump-and-rewrite, no legacy readers).
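To make the inverted-file idea in this context concrete, here is a toy sketch over raw float vectors. It is not quantvec's implementation: the names `buildPostings` and `probe`, the hand-picked centroids, and the squared-L2 distance are assumptions for illustration only, whereas the plan trains centroids with k-means and scores quantized codes.

```typescript
// Toy IVF over raw float vectors. All names here are hypothetical; quantvec's
// real code paths (CoarseQuantizer, searchSlots) are not reproduced.
type Vec = number[];

function sqDist(a: Vec, b: Vec): number {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += (a[i]! - b[i]!) ** 2;
  return s;
}

/** Index of the centroid nearest to v (first minimum wins on ties). */
function nearest(centroids: Vec[], v: Vec): number {
  let best = 0;
  for (let c = 1; c < centroids.length; c++) {
    if (sqDist(centroids[c]!, v) < sqDist(centroids[best]!, v)) best = c;
  }
  return best;
}

/** Posting lists: list c holds every slot whose nearest centroid is c. */
function buildPostings(centroids: Vec[], db: Vec[]): number[][] {
  const postings: number[][] = centroids.map(() => []);
  db.forEach((v, slot) => postings[nearest(centroids, v)]!.push(slot));
  return postings;
}

/** Candidate slots for a query: the concatenated lists of the nprobe nearest centroids. */
function probe(centroids: Vec[], postings: number[][], query: Vec, nprobe: number): number[] {
  return centroids
    .map((c, list) => ({ list, d: sqDist(c, query) }))
    .sort((a, b) => a.d - b.d || a.list - b.list)
    .slice(0, nprobe)
    .flatMap(({ list }) => postings[list]!);
}

const centroids: Vec[] = [[0, 0], [10, 10]];
const db: Vec[] = [[0, 1], [9, 10], [1, 0], [10, 9]];
const postings = buildPostings(centroids, db);
console.log(probe(centroids, postings, [9, 9], 1)); // [ 1, 3 ]
console.log(probe(centroids, postings, [9, 9], 2)); // [ 1, 3, 0, 2 ]
```

With `nprobe` equal to the number of lists, every slot becomes a candidate, which is why the plan can use `nprobe = nlist` as an exact oracle against the flat scan.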

## Steps (in dependency order)

1. **`src/core/kmeans.ts`** (new) — seeded k-means++ init + Lloyd, `kmeans(data, m, {k, dim, rng, spherical, maxIterations=25})` → `{centroids, assignments, iterations}`. Spherical mode (cosine/dot): renormalize centroids each round; a cluster whose mean is the zero vector keeps its previous direction. Convergence: zero assignment changes. Empty-cluster repair: re-seed from farthest point (deterministic first-max). `KMeansError` codes `INVALID_K|INVALID_DIM|INVALID_LENGTH`; k ∈ [2, m]. Cite Lloyd 1982 / Arthur & Vassilvitskii 2007. Tests: determinism, separated blobs, repair, spherical unit norms, k=m, validation.

2. **`src/core/search.ts`** — refactor query prep (validation, norms, rotation, calibration dual, LUT) into private `prepareScan`; add exported `searchSlots(db, query, k, slots: Int32Array, opts)` scanning only given slots, honoring full-length mask, same errors + new `SearchError` code `INVALID_SLOT`. Tests: ≡ searchFlat over all slots (exact), subset-only results, mask interaction, empty slots, INVALID_SLOT.

3. **`src/index/coarse.ts`** (new) — `CoarseQuantizer`: `train(vecs, nlist, nprobe, metric, dim, seed)` (sample min(m, 64·nlist) via partial Fisher-Yates with domain-separated rng seed; spherical for cosine/dot), `fromState(centroids, listForSlot, ...)`, `assign`, `addSlot`, `swapRemove(i, last)` (two-step dance: A) swap-pop slot i from its list, B) renumber last→i reading posForSlot AFTER step A; handles i===last and same-list-tail), `clear()` (keeps centroids), `probe(query, nprobe)` → concatenated Int32Array of top-nprobe lists' slots via TopK. State: `#postings: number[][]`, `#listForSlot`, `#posForSlot`. `defaultNprobe(nlist) = max(1, ceil(nlist/8))`. Tests: invariant fuzz (postings[listForSlot[s]][posForSlot[s]] === s after every op), train determinism, probe(nlist) = all slots, fromState round-trip.

4. **`src/index/turboquant-index.ts`** — `TurboQuantIndexOptions.ivf?: IvfOptions {nlist (int, [2, 2^22]), nprobe? (int, [1,nlist])}`; `IndexSearchOptions.nprobe?`; `IndexError` codes `INVALID_NLIST|INVALID_NPROBE`; getter `ivfActive`. `@internal trainIvfFromBatch(vecs)` mirroring fitCalibrationFromBatch (train iff first batch ≥ nlist, else freeze flat forever); hooks: `add()` after calibration call, `#appendOne` → `addSlot(slot, rawVec)`, `swapRemove` → coarse.swapRemove(i, last), `clear()` → coarse.clear(). Search: IVF branch before WASM block (`probe` + `searchSlots`); nprobe ignored when flat (documented); WASM/FastScan bypassed under IVF (documented as future wave). toPayload/fromPayload carry ivf state; fromPayload freezes. Tests: validation, train/freeze semantics, nprobe=nlist ≡ flat (exact scores), recall sanity on 64-cluster mixture, remove parity vs flat twin, round-trip, mask interaction.

5. **`src/io/serialize.ts`** — VERSION = 2 (read accepts only 2). New section after calibration, before ids: flag u8 ∈ {0,1}; if 1: nlist u32 + nprobe u32 + centroids nlist·dim f32 + listForSlot n·u32 (postings rebuilt on load). `DeserializeError` code `BAD_IVF`; bounds-check before allocation; validate nlist ∈ [2, 2^22], nprobe ∈ [1, nlist], centroids finite, listForSlot entries < nlist. `IndexPayload.ivf?`. Tests: round-trips (±ivf, ±calibration, both kinds), crafted v1 → BAD_VERSION, all corrupt-IVF branches, n=0 with ivf.

6. **Plumbing** — IdMapIndex: call `trainIvfFromBatch` in addWithIds beside calibration; `IdMapSearchOptions.nprobe?` forwarded; `ivfActive` getter. Collection: `CollectionConfig.ivf?`, constructor passes through; `SearchParams.nprobe?` forwarded. `src/index.ts`: export `IvfOptions` type. Tests in id-map-index.test.ts + collection.test.ts (passthrough, remove/delete under ivf, filter+ivf, round-trip).

7. **Bench + docs + release** — `benchmarks/ivf.ts` (synthetic 64-cluster mixture, dim 768, n 20k, nlist 128; flat baseline + nprobe ∈ {1,2,4,8,16,128}; recall@1/10/100, QPS, METRIC lines, results JSON) + `bench:ivf` script. Docs: README (scope note, quickstart, roadmap table ✅), docs/roadmap.md (Planned→Shipped), serialization.md (v2 layout), api-reference.md, guide.md, architecture.md (module map), benchmarks.md. Version bump 0.0.3 (publish trigger on main merge). Branch `feat/ivf`, PR to main; do NOT merge without user confirmation (merge auto-publishes to npm).
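The `swapRemove(i, last)` bookkeeping from step 3 is the subtlest part of the list above, so a minimal sketch may help. The class name `PostingLists` and its public fields are hypothetical; only the three-array state (`postings`, `listForSlot`, `posForSlot`), the two-step order, and the invariant are taken from the plan.

```typescript
// Hypothetical sketch of the inverted-list bookkeeping described in step 3.
// It is not the PR's CoarseQuantizer (no centroids, no training, no probing).
class PostingLists {
  readonly postings: number[][];
  readonly listForSlot: number[] = [];
  readonly posForSlot: number[] = [];

  constructor(nlist: number) {
    this.postings = Array.from({ length: nlist }, () => []);
  }

  /** Append the next dense slot to posting list `list`. */
  addSlot(slot: number, list: number): void {
    if (slot !== this.listForSlot.length) throw new Error('slots must be appended densely');
    const posting = this.postings[list]!;
    this.listForSlot.push(list);
    this.posForSlot.push(posting.length);
    posting.push(slot);
  }

  /** Mirror a flat swap-remove: slot `i` disappears and slot `last` is renumbered to `i`. */
  swapRemove(i: number, last: number): void {
    if (last !== this.listForSlot.length - 1) throw new Error('last must be the final slot');
    // Step A: swap-pop slot i out of its own posting list.
    const posting = this.postings[this.listForSlot[i]!]!;
    const pos = this.posForSlot[i]!;
    const tail = posting.pop()!;
    if (tail !== i) {
      posting[pos] = tail;
      this.posForSlot[tail] = pos;
    }
    // Step B: renumber last -> i. posForSlot[last] is read only now, because
    // step A may have just moved `last` (when it was the tail of the same list).
    if (i !== last) {
      const lastList = this.listForSlot[last]!;
      const lastPos = this.posForSlot[last]!;
      this.postings[lastList]![lastPos] = i;
      this.listForSlot[i] = lastList;
      this.posForSlot[i] = lastPos;
    }
    this.listForSlot.pop();
    this.posForSlot.pop();
  }

  /** The invariant from the plan: postings[listForSlot[s]][posForSlot[s]] === s for every slot. */
  checkInvariant(): boolean {
    const total = this.postings.reduce((sum, p) => sum + p.length, 0);
    if (total !== this.listForSlot.length) return false;
    return this.listForSlot.every((list, s) => this.postings[list]![this.posForSlot[s]!] === s);
  }
}
```

Reading `posForSlot[last]` before step A would leave a stale position in the same-list-tail case, which is the situation the plan's "AFTER step A" note guards against.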

## Verification
- nprobe=nlist ≡ searchFlat exactly (IVF analog of the WASM≡scalar oracle).
- Invariant fuzz over add/remove/clear; determinism (same seed+data ⇒ byte-identical toBytes()).
- Every BAD_IVF branch hit by a crafted-buffer test; no allocation before bounds check.
- `bun run typecheck && lint && format:check && test:coverage` (90% global) && build; run bench:ivf, commit results JSON.
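The "no allocation before bounds check" item can be illustrated with a reader for the IVF section laid out in step 5. This is a sketch under stated assumptions: the function name `readIvfSection`, the `BadIvfError` class, and little-endian byte order are not confirmed by the plan; the field order, the value ranges, and the `BAD_IVF` code are.

```typescript
// Hypothetical reader for the v2 IVF section (flag u8, nlist u32, nprobe u32,
// centroids nlist*dim f32, listForSlot n*u32). Little-endian is an assumption.
interface IvfState {
  nlist: number;
  nprobe: number;
  centroids: Float32Array;
  listForSlot: Uint32Array;
}

class BadIvfError extends Error {
  readonly code = 'BAD_IVF';
}

function readIvfSection(
  view: DataView,
  offset: number,
  dim: number,
  n: number,
): { ivf: IvfState | undefined; next: number } {
  if (offset + 1 > view.byteLength) throw new BadIvfError('truncated flag');
  const flag = view.getUint8(offset);
  if (flag === 0) return { ivf: undefined, next: offset + 1 };
  if (flag !== 1) throw new BadIvfError('flag must be 0 or 1');

  let p = offset + 1;
  if (p + 8 > view.byteLength) throw new BadIvfError('truncated header');
  const nlist = view.getUint32(p, true);
  const nprobe = view.getUint32(p + 4, true);
  p += 8;
  if (nlist < 2 || nlist > 2 ** 22) throw new BadIvfError('nlist out of range');
  if (nprobe < 1 || nprobe > nlist) throw new BadIvfError('nprobe out of range');

  // Bounds check BEFORE allocating, so a crafted nlist cannot force a huge allocation.
  const bodyBytes = (nlist * dim + n) * 4;
  if (p + bodyBytes > view.byteLength) throw new BadIvfError('truncated body');

  const centroids = new Float32Array(nlist * dim);
  for (let i = 0; i < centroids.length; i++, p += 4) {
    const c = view.getFloat32(p, true);
    if (!Number.isFinite(c)) throw new BadIvfError('non-finite centroid');
    centroids[i] = c;
  }
  const listForSlot = new Uint32Array(n);
  for (let s = 0; s < n; s++, p += 4) {
    const list = view.getUint32(p, true);
    if (list >= nlist) throw new BadIvfError('listForSlot entry out of range');
    listForSlot[s] = list;
  }
  return { ivf: { nlist, nprobe, centroids, listForSlot }, next: p };
}
```

Posting lists are not stored in this layout; as the plan notes, they can be rebuilt on load from `listForSlot` alone.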
39 changes: 25 additions & 14 deletions README.md
@@ -37,8 +37,9 @@ scalar codebook is fully determined by `(dim, bits)` with **no data and ~zero in
| Dependencies | **Zero** runtime dependencies |

> **Scope:** quantvec is a _flat quantized index_ — O(n) scan over compact codes (à la FAISS
-> `IndexPQFastScan`). Great recall and throughput up to ~1–10M vectors. An IVF coarse-quantizer
-> for larger corpora is on the roadmap.
+> `IndexPQFastScan`) — with an opt-in **IVF coarse quantizer** (`ivf: { nlist }`) that probes
+> only the nearest cells for sublinear search on large corpora (**11× QPS at equal recall**
+> measured at 20k vectors; the gain grows with n).

A 1M × 1536-d corpus (e.g. OpenAI `text-embedding-ada-002`) is **6.1 GB as float32**. At 4 bits
quantvec packs it into **~780 MB** (7.92×); at 2 bits, **~390 MB** (15.67×) — with **94%+
@@ -78,6 +79,16 @@ exact rescore of the candidate pool):
const index = new TurboQuantIndex({ dim: 1536, bits: 4, fastscan: true });
```

+For large corpora, enable the **IVF coarse quantizer** — k-means cells are trained from the
+first add (needs ≥ `nlist` vectors; ~32·nlist recommended) and queries probe only the nearest
+`nprobe` cells (sublinear scan; ~11× QPS at equal recall on clustered data):
+
+```ts
+const index = new TurboQuantIndex({ dim: 1536, ivf: { nlist: 1024 } });
+index.add(corpus); // first batch trains + freezes the cells
+index.search(query, 10, { nprobe: 32 }); // per-query recall/speed knob
+```
+
### Stable ids: `IdMapIndex`

```ts
@@ -275,18 +286,18 @@ Full results and JSON in [`benchmarks/`](./benchmarks/).

## Roadmap

-| Status | Item |
-| ------ | -------------------------------------------------------------------------------------------------------- |
-| ✅ | Core math: rotation, Beta/Lloyd-Max codebooks, encode pipeline, flat nibble-LUT search |
-| ✅ | `TurboQuantIndex`, `IdMapIndex`, versioned serialization, Node fs helpers |
-| ✅ | True 2/3/4-bit **bit-packed serialization** (7.9–15.7× compression) |
-| ✅ | **FWHT rotation** for power-of-two dims (O(d·log d), ~25× faster encode) |
-| ✅ | **TQ+ per-coordinate calibration** (opt-in; data-dependent) |
-| ✅ | **Exact WASM scoring kernel** (AssemblyScript, bit-identical to scalar, ~1.3× query) |
-| ✅ | **v128 FastScan kernel** (blocked-nibble swizzle + exact rescore, **~5.7× query**) |
-| ✅ | **Ergonomic `createCollection`** with typed payloads and filter DSL |
-| ✅ | Real-dataset benchmarks: SIFT-small + GloVe-200 + dbpedia-OpenAI-100k (results in `benchmarks/results/`) |
-| 📋 | IVF / coarse-quantizer for 10M+ corpora |
+| Status | Item |
+| ------ | --------------------------------------------------------------------------------------------------------- |
+| ✅ | Core math: rotation, Beta/Lloyd-Max codebooks, encode pipeline, flat nibble-LUT search |
+| ✅ | `TurboQuantIndex`, `IdMapIndex`, versioned serialization, Node fs helpers |
+| ✅ | True 2/3/4-bit **bit-packed serialization** (7.9–15.7× compression) |
+| ✅ | **FWHT rotation** for power-of-two dims (O(d·log d), ~25× faster encode) |
+| ✅ | **TQ+ per-coordinate calibration** (opt-in; data-dependent) |
+| ✅ | **Exact WASM scoring kernel** (AssemblyScript, bit-identical to scalar, ~1.3× query) |
+| ✅ | **v128 FastScan kernel** (blocked-nibble swizzle + exact rescore, **~5.7× query**) |
+| ✅ | **Ergonomic `createCollection`** with typed payloads and filter DSL |
+| ✅ | Real-dataset benchmarks: SIFT-small + GloVe-200 + dbpedia-OpenAI-100k (results in `benchmarks/results/`) |
+| | **IVF / coarse-quantizer** for 10M+ corpora (k-means cells, full remove parity, ~11× QPS at equal recall) |

---

216 changes: 216 additions & 0 deletions benchmarks/ivf.ts
@@ -0,0 +1,216 @@
// quantvec — IVF coarse-quantizer benchmark: speedup vs recall against the flat scan.
//
// Measures the IVF value proposition on clustered data (the regime IVF exists for):
// a seeded Gaussian-mixture corpus, an exact float32 cosine ground truth, a flat
// TurboQuantIndex baseline, and the same index with `ivf` enabled swept across
// nprobe values. At nprobe = nlist the IVF results are exactly the flat scan's
// (the searchSlots oracle), so the sweep shows the recall/QPS trade-off cleanly.
//
// Self-contained and deterministic. Emits a human table, `METRIC key=value` lines,
// and a JSON results file. Run: `npm run bench:ivf` (env: DIM, N, NQ, CLUSTERS, NLIST).

import { mkdirSync, writeFileSync } from 'node:fs';
import { join } from 'node:path';
import { TurboQuantIndex } from '../src/index/turboquant-index';

// ── Deterministic PRNG + Gaussian (mulberry32 + Box–Muller) ────────────────────
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a |= 0;
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function gaussian(rng: () => number): number {
  let u = 0;
  let v = 0;
  while (u === 0) u = rng();
  while (v === 0) v = rng();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

/** Gaussian mixture: `clusters` centers (sigma 5), unit-sigma points around them. */
function makeClustered(
  count: number,
  dim: number,
  clusters: number,
  rng: () => number,
): { vectors: Float32Array[]; centers: Float32Array[] } {
  const centers = Array.from({ length: clusters }, () => {
    const c = new Float32Array(dim);
    for (let i = 0; i < dim; i++) c[i] = gaussian(rng) * 5;
    return c;
  });
  const vectors: Float32Array[] = [];
  for (let j = 0; j < count; j++) {
    const c = centers[j % clusters]!;
    const v = new Float32Array(dim);
    for (let i = 0; i < dim; i++) v[i] = c[i]! + gaussian(rng);
    vectors.push(v);
  }
  return { vectors, centers };
}

// ── Exact cosine top-k ground truth ────────────────────────────────────────────
function normalized(v: Float32Array): Float32Array {
  let s = 0;
  for (let i = 0; i < v.length; i++) s += v[i]! * v[i]!;
  const inv = 1 / Math.sqrt(s);
  const out = new Float32Array(v.length);
  for (let i = 0; i < v.length; i++) out[i] = v[i]! * inv;
  return out;
}

function exactTopK(db: Float32Array[], query: Float32Array, k: number): number[] {
  const q = normalized(query);
  const scored = db.map((v, idx) => {
    const u = normalized(v);
    let dot = 0;
    for (let i = 0; i < u.length; i++) dot += u[i]! * q[i]!;
    return { idx, dot };
  });
  scored.sort((a, b) => b.dot - a.dot);
  return scored.slice(0, k).map((s) => s.idx);
}

function recallAt(approx: Int32Array, exact: number[], k: number): number {
  const truth = new Set(exact.slice(0, k));
  let hit = 0;
  for (let i = 0; i < Math.min(k, approx.length); i++) if (truth.has(approx[i]!)) hit++;
  return hit / k;
}

// ── Benchmark one configuration ────────────────────────────────────────────────
interface Row {
  label: string;
  nprobe?: number;
  recall1: number;
  recall10: number;
  recall100: number;
  qps: number;
  speedupVsFlat: number;
}

function benchIndex(
  label: string,
  index: TurboQuantIndex,
  queries: Float32Array[],
  exact: number[][],
  nprobe?: number,
): Omit<Row, 'speedupVsFlat'> {
  const K = 100;
  const opts = nprobe === undefined ? {} : { nprobe };
  const approxAll: Int32Array[] = [];
  const tSearch = performance.now();
  for (const q of queries) approxAll.push(index.search(q, K, opts).indices);
  const searchSecs = (performance.now() - tSearch) / 1000;

  let r1 = 0;
  let r10 = 0;
  let r100 = 0;
  for (let i = 0; i < queries.length; i++) {
    r1 += recallAt(approxAll[i]!, exact[i]!, 1);
    r10 += recallAt(approxAll[i]!, exact[i]!, 10);
    r100 += recallAt(approxAll[i]!, exact[i]!, 100);
  }
  const nq = queries.length;
  const row: Omit<Row, 'speedupVsFlat'> = {
    label,
    recall1: r1 / nq,
    recall10: r10 / nq,
    recall100: r100 / nq,
    qps: nq / searchSecs,
  };
  if (nprobe !== undefined) row.nprobe = nprobe;
  return row;
}

// ── Run ────────────────────────────────────────────────────────────────────────
function main(): void {
  const dim = Number(process.env.DIM ?? 768);
  const n = Number(process.env.N ?? 20000);
  const nq = Number(process.env.NQ ?? 200);
  const clusters = Number(process.env.CLUSTERS ?? 64);
  const nlist = Number(process.env.NLIST ?? 128);
  const rng = mulberry32(42);

  process.stdout.write(
    `quantvec IVF benchmark — dim=${dim} n=${n} queries=${nq} clusters=${clusters} nlist=${nlist} (cosine, 4-bit)\n`,
  );

  const { vectors: db } = makeClustered(n, dim, clusters, rng);
  const { vectors: queries } = makeClustered(nq, dim, clusters, mulberry32(7));
  process.stdout.write('computing exact float32 ground truth…\n');
  const exact = queries.map((q) => exactTopK(db, q, 100));

  // Flat baseline (scalar path — the IVF scan is scalar too, so the comparison is fair).
  const flat = new TurboQuantIndex({ dim, bits: 4, metric: 'cosine', seed: 1, wasm: false });
  flat.add(db);
  const flatRow: Row = { ...benchIndex('flat', flat, queries, exact), speedupVsFlat: 1 };

  // IVF index: trained from the same single batch.
  const tTrain = performance.now();
  const ivf = new TurboQuantIndex({ dim, bits: 4, metric: 'cosine', seed: 1, ivf: { nlist } });
  ivf.add(db);
  const trainSecs = (performance.now() - tTrain) / 1000;
  if (!ivf.ivfActive) throw new Error('IVF did not train — first batch smaller than nlist?');

  const sweep = [1, 2, 4, 8, 16, nlist].filter((p, i, arr) => arr.indexOf(p) === i && p <= nlist);
  const rows: Row[] = [flatRow];
  for (const nprobe of sweep) {
    const r = benchIndex(`ivf@${nprobe}`, ivf, queries, exact, nprobe);
    rows.push({ ...r, speedupVsFlat: r.qps / flatRow.qps });
  }

  // Human table.
  process.stdout.write(`\nbuild+train: ${trainSecs.toFixed(2)}s for ${n} vectors\n`);
  process.stdout.write('\nconfig | recall@1 | recall@10 | recall@100 | QPS | speedup\n');
  process.stdout.write('----------|----------|-----------|------------|--------|--------\n');
  for (const r of rows) {
    process.stdout.write(
      `${r.label.padEnd(9)} | ${r.recall1.toFixed(3)} | ${r.recall10.toFixed(3)} | ${r.recall100.toFixed(
        3,
      )} | ${Math.round(r.qps).toString().padStart(6)} | ${r.speedupVsFlat.toFixed(2)}x\n`,
    );
  }

  // METRIC lines (autoresearch protocol).
  process.stdout.write('\n');
  process.stdout.write(`METRIC flat_qps=${Math.round(flatRow.qps)}\n`);
  for (const r of rows) {
    if (r.nprobe === undefined) continue;
    process.stdout.write(`METRIC ivf_recall_at10_nprobe${r.nprobe}=${r.recall10.toFixed(4)}\n`);
    process.stdout.write(`METRIC ivf_qps_nprobe${r.nprobe}=${Math.round(r.qps)}\n`);
  }

  // JSON results.
  const outDir = join(process.cwd(), 'benchmarks', 'results');
  mkdirSync(outDir, { recursive: true });
  const outPath = join(outDir, `ivf-d${dim}.json`);
  writeFileSync(
    outPath,
    JSON.stringify(
      {
        dim,
        n,
        nq,
        clusters,
        nlist,
        metric: 'cosine',
        bits: 4,
        trainSecs,
        generatedAt: new Date().toISOString(),
        rows,
      },
      null,
      2,
    ),
  );
  process.stdout.write(`\nwrote ${outPath}\n`);
}

main();
75 changes: 75 additions & 0 deletions benchmarks/results/ivf-d768.json
@@ -0,0 +1,75 @@
{
  "dim": 768,
  "n": 20000,
  "nq": 200,
  "clusters": 64,
  "nlist": 128,
  "metric": "cosine",
  "bits": 4,
  "trainSecs": 17.951041874999994,
  "generatedAt": "2026-06-10T12:35:16.896Z",
  "rows": [
    {
      "label": "flat",
      "recall1": 0.355,
      "recall10": 0.6029999999999996,
      "recall100": 0.7497500000000001,
      "qps": 52.78636802603001,
      "speedupVsFlat": 1
    },
    {
      "label": "ivf@1",
      "recall1": 0.24,
      "recall10": 0.3864999999999999,
      "recall100": 0.4408500000000001,
      "qps": 1205.0945395762583,
      "nprobe": 1,
      "speedupVsFlat": 22.82965440967642
    },
    {
      "label": "ivf@2",
      "recall1": 0.305,
      "recall10": 0.5320000000000001,
      "recall100": 0.6422999999999999,
      "qps": 1090.9819903257126,
      "nprobe": 2,
      "speedupVsFlat": 20.667873754597544
    },
    {
      "label": "ivf@4",
      "recall1": 0.355,
      "recall10": 0.6019999999999996,
      "recall100": 0.7372500000000002,
      "qps": 851.6927813487874,
      "nprobe": 4,
      "speedupVsFlat": 16.134710782314116
    },
    {
      "label": "ivf@8",
      "recall1": 0.36,
      "recall10": 0.6029999999999996,
      "recall100": 0.7495499999999999,
      "qps": 600.3408441145751,
      "nprobe": 8,
      "speedupVsFlat": 11.373028048047084
    },
    {
      "label": "ivf@16",
      "recall1": 0.355,
      "recall10": 0.6029999999999996,
      "recall100": 0.7498500000000001,
      "qps": 377.1228478449929,
      "nprobe": 16,
      "speedupVsFlat": 7.144322709587182
    },
    {
      "label": "ivf@128",
      "recall1": 0.355,
      "recall10": 0.6029999999999996,
      "recall100": 0.7497500000000001,
      "qps": 59.81218376773218,
      "nprobe": 128,
      "speedupVsFlat": 1.1330990557682923
    }
  ]
}
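The README's "~11× QPS at equal recall" figure can be traced back to these rows. The sketch below is one possible reading, not part of the PR: the helper name `cheapestAtFlatRecall` and the choice of recall@10 as the comparison metric are assumptions, and the numbers are copied from the results file above.

```typescript
// Hypothetical helper: find the smallest nprobe whose recall@10 is at least the
// flat baseline's (minus an optional tolerance). Data copied from ivf-d768.json.
interface ResultRow {
  label: string;
  nprobe?: number;
  recall10: number;
  speedupVsFlat: number;
}

const rows: ResultRow[] = [
  { label: 'flat', recall10: 0.6029999999999996, speedupVsFlat: 1 },
  { label: 'ivf@1', nprobe: 1, recall10: 0.3864999999999999, speedupVsFlat: 22.82965440967642 },
  { label: 'ivf@2', nprobe: 2, recall10: 0.5320000000000001, speedupVsFlat: 20.667873754597544 },
  { label: 'ivf@4', nprobe: 4, recall10: 0.6019999999999996, speedupVsFlat: 16.134710782314116 },
  { label: 'ivf@8', nprobe: 8, recall10: 0.6029999999999996, speedupVsFlat: 11.373028048047084 },
  { label: 'ivf@16', nprobe: 16, recall10: 0.6029999999999996, speedupVsFlat: 7.144322709587182 },
  { label: 'ivf@128', nprobe: 128, recall10: 0.6029999999999996, speedupVsFlat: 1.1330990557682923 },
];

function cheapestAtFlatRecall(all: ResultRow[], tolerance = 0): ResultRow | undefined {
  const flat = all.find((r) => r.nprobe === undefined);
  if (!flat) return undefined;
  return all
    .filter((r) => r.nprobe !== undefined && r.recall10 >= flat.recall10 - tolerance)
    .sort((a, b) => a.nprobe! - b.nprobe!)[0];
}

const equal = cheapestAtFlatRecall(rows);
console.log(equal?.label, equal?.speedupVsFlat.toFixed(2)); // ivf@8 11.37
const near = cheapestAtFlatRecall(rows, 0.005);
console.log(near?.label, near?.speedupVsFlat.toFixed(2)); // ivf@4 16.13
```

On this run, `nprobe = 8` is the first setting that matches the flat recall@10 exactly, while allowing a 0.005 recall@10 tolerance would already admit `nprobe = 4`.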