[codex] Persist GraphImpl4 reload metadata #88

Draft
tpn wants to merge 9 commits into main from codex/graphimpl4-cpu-backend

tpn commented Apr 25, 2026

Motivation

GraphImpl4 was already create-capable and had JIT/codegen smoke coverage, but file-backed reload was still incomplete for tables whose runtime lookup behavior depends on compacted key metadata. The risky gap was downsized 64-bit GraphImpl4 tables: without persisting and reconstructing the composed outer-downsize plus inner GraphImpl4 compact-key state, a reloaded table could not reliably reproduce the create-time lookup domain, especially for loaded-table JIT/codegen scenarios.

This PR implements GraphImpl4 reload for the covered CPU/LLVM paths while keeping unsupported paths explicit. Assigned8 JIT remains intentionally rejected, and loaded RawDog JIT remains unimplemented.

What Changed

  • Extended the CHM01 GRAPH_INFO_ON_DISK runtime metadata with persisted graph implementation, downsize metadata, and GraphImpl4 compact-key fields.
  • Split the persisted runtime flags so graph implementation version presence is distinct from non-trivial GraphImpl4 compact-key metadata.
  • Added create-time capture and load-time validation/restoration for GraphImpl4 reload metadata, including fail-closed behavior for missing or invalid GraphImpl4 metadata.
  • Reconstructed composed downsized-64 metadata from the exact persisted bitmap instead of relying on transient create-time fields.
  • Added shared downsize bitmap metadata helpers and bit-utility coverage for contiguous, non-contiguous, and composed bitmap cases.
  • Routed loaded GraphImpl4 LLVM JIT through the restored metadata for sparse32 and downsized64 cases.
  • Kept GraphImpl4 assigned8 JIT rejection intentional and explicit.
  • Added file-backed reload tests for sparse32 and downsized64 GraphImpl4 tables, including independent in-memory create baselines so reload-vs-reload agreement cannot hide a systematic reload bug.
  • Added non-GraphImpl downsized64 reload coverage to preserve existing behavior.
  • Added a downsized64 GraphImpl4 CLI/codegen smoke test and kept nested CMake argument forwarding working for generated project checks.
  • Updated the GraphImpl4 handoff notes with the new state and validation command.
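
To illustrate why reconstructing from the exact persisted bitmap is enough, here is a minimal sketch of composed bitmap extraction. It is not the code in this PR: the names `extract_bits`, `deposit_bits`, `compose_bitmaps`, and `is_contiguous_bitmap` are hypothetical, and it assumes the outer downsize and the inner compact-key step can each be modeled as a parallel bit extract over a 64-bit mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Gather the bits of `value` selected by `mask` into the low bits of the
   result, preserving their relative order (a software parallel extract). */
static uint64_t extract_bits(uint64_t value, uint64_t mask)
{
    uint64_t result = 0;
    unsigned out = 0;
    while (mask != 0) {
        uint64_t lowest = mask & (~mask + 1);  /* isolate lowest set bit */
        if (value & lowest) {
            result |= (uint64_t)1 << out;
        }
        out++;
        mask &= mask - 1;                      /* clear lowest set bit */
    }
    return result;
}

/* Inverse operation: scatter the low bits of `value` into the positions
   selected by `mask`. */
static uint64_t deposit_bits(uint64_t value, uint64_t mask)
{
    uint64_t result = 0;
    unsigned in = 0;
    while (mask != 0) {
        uint64_t lowest = mask & (~mask + 1);
        if ((value >> in) & 1) {
            result |= lowest;
        }
        in++;
        mask &= mask - 1;
    }
    return result;
}

/* The inner bitmap lives in the coordinate space produced by the outer
   extraction, so mapping it back through the outer bitmap yields a single
   bitmap over the original 64-bit key. */
static uint64_t compose_bitmaps(uint64_t outer, uint64_t inner)
{
    return deposit_bits(inner, outer);
}

/* True when the set bits of `mask` form one unbroken run. */
static bool is_contiguous_bitmap(uint64_t mask)
{
    if (mask == 0) {
        return false;
    }
    uint64_t shifted = mask / (mask & (~mask + 1)); /* drop trailing zeros */
    return (shifted & (shifted + 1)) == 0;
}
```

Under these assumptions, `extract_bits(extract_bits(key, outer), inner)` equals `extract_bits(key, compose_bitmaps(outer, inner))` for any key, so a loader that persists only the composed bitmap can reproduce the create-time lookup domain. Contiguity of the inner bitmap does not imply contiguity of the composed one: `compose_bitmaps(0xF0F0, 0x3C)` gives `0x30C0`, which is not a single run, so a shift-and-mask fast path that is valid for the inner bitmap alone would be wrong for the composed key. On x86-64 with BMI2, the `_pext_u64` and `_pdep_u64` intrinsics provide the same extract and deposit operations in hardware.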

Developer Impact

After this change, GraphImpl4 is no longer create-only for the covered persisted-table scenarios. File-backed GraphImpl4 sparse32 and downsized64 tables can be reloaded and exercised through the existing table interface, and loaded GraphImpl4 LLVM JIT/codegen smoke coverage now protects the compact-key metadata path.

The reload path still deliberately fails closed for unsupported or ambiguous metadata rather than guessing. RawDog loaded JIT is still out of scope for this PR.
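
The fail-closed rule can be sketched as a small validator. This is an illustration only: the `reload_metadata` struct, the flag names, and the specific checks are hypothetical and are not the CHM01 on-disk layout. It assumes two separate flags, one for "graph implementation version recorded" and one for "non-trivial compact-key metadata present", mirroring the flag split described above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits; the real on-disk flags are not shown in this PR text. */
enum {
    META_HAS_GRAPH_IMPL_VERSION = 1,
    META_HAS_COMPACT_KEY        = 2
};

/* Hypothetical in-memory view of the persisted reload metadata. */
typedef struct {
    uint32_t flags;
    uint32_t graph_impl;       /* expected to be 4 for GraphImpl4 */
    uint64_t downsize_bitmap;  /* outer bitmap; 0 means "not downsized" */
    uint64_t compact_bitmap;   /* inner compact-key bitmap */
    uint32_t compact_bits;     /* recorded popcount of compact_bitmap */
} reload_metadata;

typedef enum {
    LOAD_OK,
    LOAD_REJECT_MISSING,
    LOAD_REJECT_INVALID
} load_status;

static unsigned popcount64(uint64_t v)
{
    unsigned n = 0;
    while (v != 0) {
        v &= v - 1;  /* clear lowest set bit */
        n++;
    }
    return n;
}

/* Accept only metadata that is present and self-consistent; never guess. */
static load_status validate_graphimpl4_metadata(const reload_metadata *m)
{
    const uint32_t known = META_HAS_GRAPH_IMPL_VERSION | META_HAS_COMPACT_KEY;

    if (m->flags & ~known) {
        return LOAD_REJECT_INVALID;          /* unknown flag bits */
    }
    if (!(m->flags & META_HAS_GRAPH_IMPL_VERSION)) {
        return LOAD_REJECT_MISSING;          /* no recorded implementation */
    }
    if (m->graph_impl != 4) {
        return LOAD_REJECT_INVALID;
    }
    if (!(m->flags & META_HAS_COMPACT_KEY)) {
        /* Trivial case: compact-key fields must be empty. */
        return (m->compact_bitmap == 0 && m->compact_bits == 0)
                   ? LOAD_OK : LOAD_REJECT_INVALID;
    }
    if (m->compact_bitmap == 0 ||
        popcount64(m->compact_bitmap) != m->compact_bits) {
        return LOAD_REJECT_INVALID;
    }
    if (m->downsize_bitmap != 0) {
        /* The inner bitmap must fit in the bits the outer downsize keeps. */
        unsigned outer_bits = popcount64(m->downsize_bitmap);
        if (outer_bits < 64 && (m->compact_bitmap >> outer_bits) != 0) {
            return LOAD_REJECT_INVALID;
        }
    }
    return LOAD_OK;
}
```

In this sketch a caller would refuse to open the table on any status other than `LOAD_OK`, and restore lookup state only from the validated fields. Keeping the two flags separate lets a table with a recorded implementation version but trivial compact-key metadata load cleanly, while a table claiming compact-key metadata without a version is rejected as missing.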

Validation

Ran focused local validation:

  • git diff --check
  • cmake --build build-graphimpl4 --target perfecthash_unit_tests PerfectHashCreateExe
  • ./build-graphimpl4/bin/perfecthash_unit_tests --gtest_filter='PerfectHashOnlineJitTests.CreateTable64AndIndex64:PerfectHashOnlineJitTests.Index64OnNonGraphImpl64BitTableUsesMetadata:GraphImpl4BitUtils.ContiguousBitmapDetection:GraphImpl4BitUtils.ComposedDownsizeMetadataUsesComposedBitmap:GraphImpl4BitUtils.ComposedDownsizeMetadataRejectsInnerContiguityMismatch:GraphImpl4BitUtils.ComposedExtractionMatchesTwoStepExtraction:GraphImpl4BitUtils.Downsized64OuterBitmapProducesIdentityInnerBitmap:PerfectHashOnlineTests.GraphImpl4Assigned8RequiresOptIn:PerfectHashOnlineTests.GraphImpl4SupportsDownsized64BitInputs:PerfectHashOnlineTests.GraphImpl4RejectsNonGoodHashes:PerfectHashOnlineTests.GraphImpl4Assigned8JitRejected:PerfectHashOnlineTests.GraphImpl4RawDogJitMatchesIndexAssigned32:PerfectHashOnlineTests.GraphImpl4RawDogJitMatchesIndexSparse32:PerfectHashOnlineTests.GraphImpl4RawDogIndex32x4MatchesIndexAssigned16:PerfectHashOnlineTests.GraphImpl4LlvmJitMatchesIndexAssigned32:PerfectHashOnlineTests.GraphImpl4LlvmJitMatchesIndexSparse32:PerfectHashOnlineTests.GraphImpl4LlvmJitMulshrolate3RXMatchesIndexAssigned32:PerfectHashOnlineTests.GraphImpl4LlvmIndex32x4MatchesIndexAssigned16:PerfectHashOnlineTests.GraphImpl4LlvmIndex64x4MatchesDownsizedIndex:PerfectHashOnlineTests.GraphImpl4FileBackedReloadPreservesSparse32Compaction:PerfectHashOnlineTests.GraphImpl4FileBackedReloadPreservesDownsized64Compaction:PerfectHashOnlineTests.NonGraphImplFileBackedReloadPreservesDownsized64Metadata:PerfectHashOnlineTests.GraphImpl4FileBackedReloadSparse32JitIndex32:PerfectHashOnlineTests.GraphImpl4FileBackedReloadDownsized64JitIndex64'
  • ctest --test-dir build-graphimpl4 --output-on-failure -R 'graphimpl4'
  • roborev wait on the final commit: no issues found
  • scripts/roborev-matrix-review.sh HEAD^..HEAD: all agents agreed no Medium/High/Critical issues; PR is clean