perf: eliminate ref-fetch parallel overhead + fuse variant-window fetches (#221) by d-laub · Pull Request #223 · mcvickerlab/GenVarLoader

d-laub · 2026-06-14T01:19:22Z

Summary

Eliminates two sources of overhead in the variant-windows reference read path so Reference.fetch scales with bytes copied instead of dominating the decode (issue #221):

Numba thread cap — numba.get_num_threads() reports host logical CPUs, not the cgroup allocation (e.g. 208 reported vs. 52 allocated), so parallel=True regions paid a flat ~37 ms fork-join for trivial work. New _threads.py caps the worker count to the cgroup-aware core count once at import (overridable via GVL_NUM_THREADS).
Serial/parallel kernel dispatch — the two reference-copy kernels (_fetch_impl in Reference.fetch, and get_reference) now route to a serial njit below a per-thread byte threshold and a parallel njit above it, via should_parallelize(total_bytes). Both kernels share an inline="always" row body, so serial and parallel are byte-identical by construction.
Fetch fusion (3 → 1) — the variant-windows flank builders in _flat_flanks.py now do a single [start−L, end+L) read and slice f5/f3 internally. The both-window decode is routed through the fused compute_windows (1 fetch instead of 2). Public signatures are unchanged, so the existing oracle tests act as byte-identity guards.
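The thread cap and the dispatch predicate can be sketched together, since the commit log lands them in the same commit. What follows is a minimal, standard-library-only illustration and not the contents of `_threads.py`. `GVL_NUM_THREADS` and `should_parallelize` are names from this PR; `cgroup_cpu_limit`, `available_cores`, `resolve_num_threads`, the explicit `n_threads` parameter, the cgroup v2 path `/sys/fs/cgroup/cpu.max`, and the `MIN_BYTES_PER_THREAD` value are all assumptions made for illustration.

```python
import math
import os
from pathlib import Path

# Hypothetical threshold: bytes of copy work each thread should receive before
# a parallel kernel is worth its fork-join cost. The PR does not state a value.
MIN_BYTES_PER_THREAD = 64 * 1024


def cgroup_cpu_limit(cpu_max_path="/sys/fs/cgroup/cpu.max"):
    """Return the cgroup v2 CPU quota in whole cores (rounded up), or None."""
    try:
        quota, period = Path(cpu_max_path).read_text().split()
    except (OSError, ValueError):
        return None  # file missing, unreadable, or not "<quota> <period>"
    if quota == "max":
        return None  # no quota configured
    try:
        return max(1, math.ceil(int(quota) / int(period)))
    except (ValueError, ZeroDivisionError):
        return None


def available_cores():
    """Cores this process may use: the smaller of affinity mask and cgroup quota."""
    if hasattr(os, "sched_getaffinity"):  # Linux only
        n = len(os.sched_getaffinity(0))
    else:
        n = os.cpu_count() or 1
    limit = cgroup_cpu_limit()
    return min(n, limit) if limit else n


def resolve_num_threads(env=os.environ):
    """Honor a positive integer GVL_NUM_THREADS; otherwise detect the core count."""
    raw = env.get("GVL_NUM_THREADS")
    if raw is not None:
        try:
            n = int(raw)
            if n > 0:
                return n
        except ValueError:
            pass  # malformed override such as "auto": fall back to detection
    return available_cores()


def should_parallelize(total_bytes, n_threads,
                       min_bytes_per_thread=MIN_BYTES_PER_THREAD):
    """True when each thread would get at least the threshold amount of work."""
    return n_threads > 1 and total_bytes // n_threads >= min_bytes_per_thread
```

In the real module the resolved count would presumably be handed to `numba.set_num_threads` once at import; that call is left out here so the sketch runs without numba.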

Test Plan

pixi run -e dev pytest tests/dataset/ tests/unit/dataset/ tests/unit/test_threads.py → 294 passed, 4 skipped, 2 xfailed
Kernel-agreement tests (_fetch_impl_ser/_par, _get_reference_ser/_par) confirm serial ≡ parallel, incl. OOB-left/right regions
Existing _oracle_* / split-equivalence tests confirm the fused fetch is byte-identical to the old separate-fetch path
New test_variant_windows_single_fetch_per_decode pins the both-window decode to exactly 1 Reference.fetch call (down from 3)
ruff check python/ clean; pyrefly 0 errors
Production confirmation against gvf-germ-som per the acceptance criteria of #221 ("Perf: Reference.fetch dominated by numba parallel=True fork-join overhead in variant-windows path (~37ms/call for tiny windows); also 3 redundant fetches/decode")
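The byte-identity and fetch-count checks in this plan can be illustrated with a toy model. This is a sketch under stated assumptions, not the code in `_flat_flanks.py`: `CountingReference`, `flanks_separate`, and `flanks_fused` are hypothetical names, the reference is a single contig held as `bytes`, out-of-bounds positions are padded with `N`, and the old path is assumed to have read the 5' flank, the body, and the 3' flank as three separate fetches.

```python
class CountingReference:
    """Toy stand-in for a reference: one contig as bytes, with a call counter."""

    def __init__(self, seq: bytes, pad: bytes = b"N"):
        self.seq = seq
        self.pad_byte = pad[0]
        self.calls = 0

    def fetch(self, start: int, end: int) -> bytes:
        """Return [start, end), padding positions outside the contig."""
        self.calls += 1
        n = len(self.seq)
        return bytes(
            self.seq[i] if 0 <= i < n else self.pad_byte
            for i in range(start, end)
        )


def flanks_separate(ref, start, end, L):
    """Assumed old path: three independent reads."""
    f5 = ref.fetch(start - L, start)
    body = ref.fetch(start, end)
    f3 = ref.fetch(end, end + L)
    return f5, body, f3


def flanks_fused(ref, start, end, L):
    """Fused path: one [start - L, end + L) read, then slice f5/body/f3."""
    window = ref.fetch(start - L, end + L)
    n = end - start
    return window[:L], window[L:L + n], window[L + n:]


old = CountingReference(b"ACGTACGTAC")
new = CountingReference(b"ACGTACGTAC")
# The region starts near the left edge, so the 5' flank runs out of bounds.
assert flanks_separate(old, 1, 4, 3) == flanks_fused(new, 1, 4, 3)
assert (old.calls, new.calls) == (3, 1)
```

Because both functions return the same tuple shape, a test written against the separate-fetch version keeps guarding the fused one, which is the role the existing oracle tests play here.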

Closes #221.

🤖 Generated with Claude Code


d-laub and others added 7 commits June 14, 2026 01:20

perf(threads): cap numba workers to cgroup cores + add dispatch predicate (#221)

b707c0e

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

perf(reference): dispatch fetch kernel serial/parallel by per-thread bytes (#221)

2eee035

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

perf(reference): dispatch get_reference kernel serial/parallel (#221)

d6c7a66

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

perf(flanks): fuse 3 ref-window fetches into 1 via flank slicing (#221)

8acc944

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

perf(variant-windows): single fused fetch for both-window decode (#221)

71dca9c

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

perf(threads): tolerate malformed GVL_NUM_THREADS at import (#221)

7bce236

A non-integer override (e.g. "auto") previously raised ValueError during `import genvarloader`. Fall back to cgroup detection instead.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chore(pre-commit): auto fixes from pre-commit.com hooks

1bba476

for more information, see https://pre-commit.ci

d-laub force-pushed the worktree-ref-fetch-parallel-overhead branch from dc98d4e to 1bba476 Compare June 14, 2026 01:23

d-laub merged commit 8ec0db6 into main Jun 14, 2026
7 checks passed

d-laub deleted the worktree-ref-fetch-parallel-overhead branch June 14, 2026 01:42
