Add online CUDA lookup and GPU/cuCollections benchmark scaffolding by tpn · Pull Request #83 · tpn/perfecthash

tpn · 2026-03-30T22:33:03Z

Closes #78.

Summary

add online-JIT CUDA export APIs for emitted CUDA source, table payloads, and table metadata
add standalone GPU benchmark/examples for NVRTC-based lookup plus cuCollections static_map and static_multiset baselines
add TPC-H query-probe extraction utilities for real build/probe stream benchmarking
harden the new benchmarks and emitter paths based on repeated review feedback around verification, portability, runtime loading, and edge cases
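
As a rough illustration of the kind of verification the benchmarks rely on, here is a minimal sketch, not the code in this PR. It assumes a hypothetical `lookup` callable that maps a key to a table index, and checks the two properties a perfect-hash lookup should satisfy over its build keys: every index is in bounds, and no two keys share an index. The names `verify_perfect_lookup`, `lookup`, and `table_size` are illustrative assumptions.

```python
from typing import Callable, Iterable


def verify_perfect_lookup(keys: Iterable[int],
                          lookup: Callable[[int], int],
                          table_size: int) -> bool:
    """Return True if `lookup` maps every key to a distinct in-range index.

    `lookup` is a hypothetical stand-in for whatever produces an index
    (for example, indices copied back from a GPU kernel).
    """
    seen = set()
    for key in keys:
        index = lookup(key)
        if not 0 <= index < table_size:
            return False  # index falls outside the table
        if index in seen:
            return False  # two keys collided on the same slot
        seen.add(index)
    return True


if __name__ == "__main__":
    build_keys = [1, 2, 3, 4, 5]
    # Toy lookup for demonstration only; not PerfectHash's hash function.
    print(verify_perfect_lookup(build_keys, lambda k: (k * 3) % 8, 8))
```

A check like this is only meaningful for keys that were part of the build set; probing keys outside that set may legitimately map to any slot.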

Validation

cmake --build build-online-llvm-jit --config Release --target PerfectHashOnlineCore -j 8
python -m py_compile examples/tpch-query-probes/extract_tpch_query_probes.py examples/tpch-query-probes/partition_hot_subset.py
cmake --build /tmp/ph-cuco-map-build -j 8
cmake --build /tmp/ph-cuco-multiset-build -j 8
cmake -S examples/cpp-console-online-cuda-nvrtc -B /tmp/ph-online-cuda-build -DPERFECTHASH_GIT_REPOSITORY=file:///home/trentn/src/perfecthash -DPERFECTHASH_GIT_TAG=78-online-cuda-gpu-bench -DPERFECTHASH_BUILD_PROFILE=online-llvm-jit >/dev/null && cmake --build /tmp/ph-online-cuda-build -j 8

Review

repeated roborev branch reviews were run during the rebase/update cycle
the final daemon-backed Codex review path was blocked by provider auth issues (401 Unauthorized on gpt-5.4), so the last clean review pass ran via claude-code; I then incorporated the remaining low-severity cleanup nits directly

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: adc9fe4f5a

Copilot

Pull request overview

This PR adds an “online CUDA export” surface to PerfectHash’s OnlineJit tables (CUDA source + table payload + metadata) and introduces standalone GPU benchmark/example drivers (NVRTC-based PerfectHash lookup plus cuCollections static_map/static_multiset baselines), along with TPC-H probe-stream extraction utilities for more realistic build/probe benchmarking.

Changes:

Add OnlineJit APIs to export generated CUDA lookup source, table payload bytes, and exported table metadata.
Add standalone GPU benchmark/example projects for NVRTC compilation + cuCollections baselines.
Add Python utilities + docs to extract and post-process TPC-H query-driven build/probe streams.
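
To make the post-processing step concrete, here is a hedged sketch of what hot/cold partitioning of a probe stream could look like. It is not the logic of `partition_hot_subset.py`; the function name `partition_hot_cold` and the "top N most frequent keys" policy are assumptions chosen for illustration.

```python
from collections import Counter
from typing import List, Sequence, Set, Tuple


def partition_hot_cold(probes: Sequence[int],
                       hot_key_count: int) -> Tuple[Set[int], List[int], List[int]]:
    """Split a probe stream into hot and cold sub-streams.

    The `hot_key_count` most frequently probed keys are treated as hot
    (an assumed policy). Relative probe order is preserved in both outputs.
    """
    counts = Counter(probes)
    # Rank by descending frequency, breaking ties by key for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    hot_keys = {key for key, _ in ranked[:max(hot_key_count, 0)]}
    hot = [key for key in probes if key in hot_keys]
    cold = [key for key in probes if key not in hot_keys]
    return hot_keys, hot, cold


if __name__ == "__main__":
    stream = [7, 7, 7, 3, 3, 9, 1, 7, 3]
    hot_keys, hot, cold = partition_hot_cold(stream, 2)
    print(sorted(hot_keys), hot, cold)
```

Splitting this way lets a benchmark time the frequently probed keys separately from the long tail, which is one plausible reason to post-process a query-driven stream.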

Reviewed changes

Copilot reviewed 18 out of 19 changed files in this pull request and generated 4 comments.

File	Description
src/PerfectHash/PerfectHashOnlineJit.c	Adds CUDA source/table export + table info API; adjusts compile fallback logic; adds Index32x2 wrapper.
include/PerfectHash/PerfectHashOnlineJit.h	Exposes new public structs/flags and CUDA export APIs.
src/PerfectHash/PerfectHash.def	Exports new OnlineJit CUDA/table-info symbols.
src/PerfectHash/Chm01FileWorkCudaSourceFile.c	Hardens generated CUDA source output (optional “library-only” emission; seed sourcing changes).
examples/cpp-console-online-jit/src/main.cpp	Adds CLI plumbing to dump generated CUDA source to stdout/file.
examples/cpp-console-online-jit/cmake/FindPerfectHashOnlineJit.cmake	Broadens header discovery for different include layouts.
examples/tpch-query-probes/*	Adds TPC-H probe extraction + hot/cold partitioning tools and README.
examples/cpp-console-online-cuda-nvrtc/*	New NVRTC-based PerfectHash GPU lookup benchmark/example with CPU baseline + verification.
examples/cpp-console-cuco-static-map-bench/*	New cuCollections static_map baseline benchmark + build scaffolding/docs.
examples/cpp-console-cuco-static-multiset-bench/*	New cuCollections static_multiset baseline benchmark + build scaffolding/docs.

chatgpt-codex-connector Bot reviewed Mar 30, 2026

Comment thread examples/cpp-console-online-cuda-nvrtc/src/main.cpp

Comment thread src/PerfectHash/PerfectHashOnlineJit.c Outdated

tpn added 29 commits April 9, 2026 13:02

Add online CUDA lookup and GPU benchmark scaffolding

4feb671

Fix benchmark review findings

28d8058

Fix CUDA emitter and verifier bugs

a8c6bce

Fix portability and emitter edge cases

690f2c7

Fix emitter string and empty-build guards

48bd4a9

Tighten benchmark verification and accounting

53fa8d0

Fix exports and emitted CUDA naming

ae21cf3

Fix query extraction and probe stream limits

3e80d18

Fix runtime compatibility edge cases

51fed78

Fix host probe arithmetic parity

bb9f56d

Fix benchmark runtime edge cases

1541cb3

Make CPU verification opt-in by default

3e8983d

Fix public metadata mapping and ratio guards

9caf4d9

Fix example portability includes

dd8398b

Harden extractor inputs and emitter contract

f36e8e6

Polish review follow-ups

06ad297

Apply final review nits

a1dff39

Address PR review thread feedback

06826de

Tighten experimental lookup mode handling

acaf720

Fix CI and verifier assumptions

de24af1

Make direct probe validation bounds-safe

05a6fa4

Relax benchmark verification assumptions

2226fd5

Enforce unique build domains in extractors

c56b0e2

Handle tail items and subset metadata explicitly

ab47462

Fix experimental kernel tail handling and Q21 extraction

59e8884

Harden tail handling and split timing

5ace6ee

Restrict verification to safe modes

86b6145

Guard split gather and CPU baseline assumptions

8878af3

Fix emitted namespace and Q21 grouping

b640ba9

tpn added 4 commits April 9, 2026 13:02

Use table hash metadata for host analysis

dd78eb4

Fix blocksort shared memory accounting

9145240

Tighten CUDA source export sizing and copies

b4640a7

Fix Windows export path and raise CUDA source reserve

ea72ebd

tpn force-pushed the 78-online-cuda-gpu-bench branch from bf92a88 to ea72ebd April 9, 2026 21:02

Fix Windows copy macros and CUDA file-work source access

0240f39

Copilot AI review requested due to automatic review settings April 18, 2026 01:45

Copilot started reviewing on behalf of tpn April 18, 2026 01:45

Copilot AI reviewed Apr 18, 2026

Comment thread src/PerfectHash/PerfectHashOnlineJit.c

Comment thread examples/tpch-query-probes/partition_hot_subset.py Outdated

Comment thread src/PerfectHash/PerfectHashOnlineJit.c Outdated

Comment thread src/PerfectHash/PerfectHashOnlineJit.c

tpn added 4 commits April 17, 2026 18:53

Address Copilot review comments

94422ff

Fix current Windows warnings and assigned buffer fallback

2f19cf6

Prefer local tree for example fallback builds

dcdefe3

examples: prefer online core in jit finder

4658f37

tpn merged commit 9e713a2 into main Apr 18, 2026
14 checks passed

tpn merged 38 commits into main from 78-online-cuda-gpu-bench
