perf: eliminate ref-fetch parallel overhead + fuse variant-window fetches (#221) #223

Merged
d-laub merged 7 commits into main from worktree-ref-fetch-parallel-overhead on Jun 14, 2026

@d-laub d-laub commented Jun 14, 2026

Collaborator

Summary

Eliminates two sources of overhead in the variant-windows reference read path (parallel fork-join cost and redundant fetches), through the three changes below, so that Reference.fetch scales with bytes copied instead of dominating the decode (issue #221):

  • Numba thread cap: numba.get_num_threads() reports host logical CPUs, not the cgroup allocation (e.g. 208 reported vs. 52 allocated), so parallel=True regions paid a flat ~37 ms fork-join for trivial work. New _threads.py caps the worker count to the cgroup-aware core count once at import (overridable via GVL_NUM_THREADS).
  • Serial/parallel kernel dispatch: the two reference-copy kernels (_fetch_impl in Reference.fetch, and get_reference) now route to a serial njit below a per-thread byte threshold and a parallel njit above it, via should_parallelize(total_bytes). Both kernels share an inline="always" row body, so serial and parallel are byte-identical by construction.
  • Fetch fusion (3 → 1): the variant-windows flank builders in _flat_flanks.py now do a single [start−L, end+L) read and slice f5/f3 internally. The both-window decode is routed through the fused compute_windows (1 fetch instead of 2). Public signatures are unchanged, so the existing oracle tests act as byte-identity guards.
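For readers without the diff open, the three ideas above can be sketched in plain Python. This is a rough illustration under stated assumptions, not the code in this PR: `parse_override`, `cgroup_cpu_limit`, `allowed_threads`, `fused_flanks`, the 1 MiB `BYTES_PER_THREAD` value, and the extra `n_threads` argument on `should_parallelize` are all hypothetical, and the sketch only handles cgroup v2 (`/sys/fs/cgroup/cpu.max`). Only `GVL_NUM_THREADS` and the name `should_parallelize` come from the description. The resulting count would presumably be handed to `numba.set_num_threads`, which is left out so the sketch stays stdlib-only.

```python
import os
from pathlib import Path

BYTES_PER_THREAD = 1 << 20  # hypothetical threshold, not the PR's value


def parse_override(raw):
    """Return a positive int from an env override, or None if unusable."""
    try:
        n = int(raw)
    except (TypeError, ValueError):
        return None  # unset or non-integer (e.g. "auto"): fall back to detection
    return n if n > 0 else None


def cgroup_cpu_limit(cpu_max_path="/sys/fs/cgroup/cpu.max"):
    """Read a cgroup v2 CPU quota as whole cores; None if absent or unlimited."""
    try:
        quota, period = Path(cpu_max_path).read_text().split()
        if quota == "max":
            return None
        return max(1, int(quota) // int(period))  # rounds down, at least 1
    except (OSError, ValueError, ZeroDivisionError):
        return None


def allowed_threads(env=os.environ):
    """Pick a worker count: override first, else the tightest detected limit."""
    override = parse_override(env.get("GVL_NUM_THREADS"))
    if override is not None:
        return override
    candidates = [os.cpu_count() or 1]
    if hasattr(os, "sched_getaffinity"):  # not available on every platform
        candidates.append(len(os.sched_getaffinity(0)))
    limit = cgroup_cpu_limit()
    if limit is not None:
        candidates.append(limit)
    return max(1, min(candidates))


def should_parallelize(total_bytes, n_threads, bytes_per_thread=BYTES_PER_THREAD):
    """Go parallel only when each thread would get a worthwhile share of bytes."""
    return n_threads > 1 and total_bytes >= n_threads * bytes_per_thread


def fused_flanks(read, start, end, flank):
    """One [start-flank, end+flank) read, then slice both flanks from it.

    Assumes the window lies fully inside the sequence (no boundary clamping).
    """
    window = read(start - flank, end + flank)
    f5 = window[:flank]
    f3 = window[len(window) - flank:]
    return f5, f3
```

The point of the fused version is that the 5' and 3' flanks come out of one read rather than separate fetches, and because the slices are taken from the same bytes a separate fetch would have returned, the output should be unchanged.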

Test Plan

Closes #221.

🤖 Generated with Claude Code

d-laub and others added 7 commits June 14, 2026 01:20
…cate (#221)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…bytes (#221)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A non-integer override (e.g. "auto") previously raised ValueError during
`import genvarloader`. Fall back to cgroup detection instead.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@d-laub d-laub force-pushed the worktree-ref-fetch-parallel-overhead branch from dc98d4e to 1bba476 June 14, 2026 01:23
@d-laub d-laub merged commit 8ec0db6 into main Jun 14, 2026
7 checks passed
@d-laub d-laub deleted the worktree-ref-fetch-parallel-overhead branch June 14, 2026 01:42