Add a HIP/ROCm build path for AMD GPUs by jeffdaily · Pull Request #35 · NVIDIA/cuBQL

jeffdaily · 2026-06-17T00:33:21Z

cuBQL already carries first-party HIP scaffolding (the __HIPCC__ includes of <hip/hip_runtime.h> and the cub->hipcub alias from #34), but nothing invoked hipcc and a few HIP gaps left that path uncompilable. This finishes and wires up the HIP path so the library, the instantiate_builders translation unit, and all GPU samples build, link, and run on AMD GPUs -- while keeping the CUDA build byte-identical.

Suggested review order:

1. CMake: the root CMakeLists.txt gains a CUBQL_USE_HIP option that enables the HIP language, defaults CMAKE_HIP_ARCHITECTURES without hardcoding an arch, and sets a distinct CUBQL_HAVE_HIP switch. The per-type instantiation targets and GPU samples build when either CUBQL_HAVE_CUDA or CUBQL_HAVE_HIP is set, with the .cu sources marked LANGUAGE HIP.
2. Runtime shims:
   - math/common.h maps the cudaXxx runtime symbols cuBQL uses onto their hipXxx equivalents inside the existing __HIPCC__ block (every call funnels through the CUBQL_CUDA_CALL macro).
   - math/constants.h provides the CUDART_INF_F/CUDART_NAN device float constants, which have no HIP analogue.
   - builder/cuda.h widens the cudaMallocAsync version guard to fire on HIP.
   - bvh.h includes the GPU builder and declares the bvh_floatN typedefs under HIP.
   - shrinkingRadiusQuery.h keys a host/device fallback off __HIP_DEVICE_COMPILE__ (HIP's per-pass macro) as well as __CUDA_ARCH__.
   - findClosest.h aligns three forward declarations with their device-only definitions (clang rejects the declaration/definition attribute mismatch that nvcc tolerated).
   - vec.h widens the dim3 conversion operators to HIP.
3. README: documents the CUBQL_USE_HIP option in the Building section.
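The name-mapping shim plus a single call macro is what lets one set of cudaXxx call sites compile under hipcc. Below is a minimal, host-runnable sketch of that pattern, not cuBQL's actual code. The hipXxx functions are stand-in stubs (a real build would include <hip/hip_runtime.h>), SKETCH_CUDA_CALL is a hypothetical macro modeled on the CUBQL_CUDA_CALL(call) -> cuda##call form the PR describes, and the builtin-based CUDART_INF_F/CUDART_NAN definitions are an assumption about how constants with no HIP analogue could be supplied.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// ---- Stand-ins for the HIP runtime (assumption) ----------------------------
// A real HIP build would include <hip/hip_runtime.h>. These host-only stubs
// exist purely so the sketch compiles and runs with an ordinary C++ compiler.
enum hipError_t { hipSuccess = 0, hipErrorOutOfMemory = 2 };
static int g_hipMallocCalls = 0;  // lets the example observe which function ran

inline hipError_t hipMalloc(void** ptr, std::size_t bytes) {
  ++g_hipMallocCalls;
  *ptr = std::malloc(bytes);
  return *ptr ? hipSuccess : hipErrorOutOfMemory;
}
inline hipError_t hipFree(void* ptr) {
  std::free(ptr);
  return hipSuccess;
}
inline const char* hipGetErrorString(hipError_t err) {
  return err == hipSuccess ? "hipSuccess" : "hip error";
}

// ---- Shim: CUDA runtime names -> HIP runtime names -------------------------
#define cudaError_t        hipError_t
#define cudaSuccess        hipSuccess
#define cudaMalloc         hipMalloc
#define cudaFree           hipFree
#define cudaGetErrorString hipGetErrorString

// ---- Hypothetical call macro in the CUBQL_CUDA_CALL(call) -> cuda##call style
// Printing and aborting on error is an assumption, not cuBQL's behavior.
#define SKETCH_CUDA_CALL(call)                                         \
  do {                                                                 \
    cudaError_t rc_ = cuda##call; /* Malloc(...) -> cudaMalloc(...) */ \
    if (rc_ != cudaSuccess) {                                          \
      std::fprintf(stderr, "GPU runtime call failed: %s\n",            \
                   cudaGetErrorString(rc_));                           \
      std::abort();                                                    \
    }                                                                  \
  } while (0)

// ---- Constants with no HIP analogue (definitions are an assumption) --------
// GCC/Clang builtins; only defined if no CUDA header has defined them already.
#ifndef CUDART_INF_F
#  define CUDART_INF_F (__builtin_huge_valf())  // +infinity as a float
#endif
#ifndef CUDART_NAN
#  define CUDART_NAN (__builtin_nan(""))        // quiet NaN as a double
#endif
```

Because `cuda##call` pastes `cuda` onto the first token of the argument, `SKETCH_CUDA_CALL(Malloc(&p, 64))` becomes `cudaMalloc(&p, 64)`, and the rescan then rewrites that to `hipMalloc(&p, 64)`, so call sites never change between the CUDA and HIP builds.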

Every source change is guarded by __HIPCC__ / __HIP_DEVICE_COMPILE__, and every CMake change by CUBQL_USE_HIP (default off), so a build without CUBQL_USE_HIP is an unchanged CUDA configuration.
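The two kinds of guard involved (a per-pass device check and a runtime-feature check) can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: the SKETCH_* macros and the helper functions are hypothetical, and the 11020 threshold reflects CUDA 11.2, the release that introduced cudaMallocAsync; cuBQL's real guard may test a different condition. Built with an ordinary host compiler, none of the GPU macros is defined, so the helpers report the host pass and the plain cudaMalloc path.

```cpp
#include <cassert>
#include <cstring>

// Per-pass guard: nvcc marks its device pass with __CUDA_ARCH__, while HIP's
// device pass is identified by __HIP_DEVICE_COMPILE__, so checking
// __CUDA_ARCH__ alone would silently take the host branch under hipcc.
#if defined(__CUDA_ARCH__) || defined(__HIP_DEVICE_COMPILE__)
#  define SKETCH_DEVICE_PASS 1
#else
#  define SKETCH_DEVICE_PASS 0
#endif

// Feature guard: a CUDART_VERSION check alone never fires on HIP (the PR notes
// it is 0 there), so __HIPCC__ opts in to the stream-ordered allocator explicitly.
// Assumes the CUDA/HIP runtime header has been included before this point.
#if defined(__HIPCC__) || (defined(CUDART_VERSION) && CUDART_VERSION >= 11020)
#  define SKETCH_HAVE_MALLOC_ASYNC 1
#else
#  define SKETCH_HAVE_MALLOC_ASYNC 0
#endif

// Hypothetical helpers that expose which branch each guard selected.
inline const char* compilationPass() {
  return SKETCH_DEVICE_PASS ? "device" : "host";
}
inline const char* allocatorName() {
  return SKETCH_HAVE_MALLOC_ASYNC ? "cudaMallocAsync" : "cudaMalloc";
}
```

In a real HIP build the helpers would also need `__host__ __device__` attributes to be callable from kernels; they are left plain here so the sketch compiles anywhere.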

Validation

Built and linked the library, the instantiate_builders TU, and all eight GPU samples with hipcc (ROCm 7.2.1) on AMD GPUs across Linux and Windows -- CDNA2 (gfx90a), RDNA3 (gfx1100), and RDNA4 (gfx1201). Samples 01-07 run on GPU and return correct results (the closest-point output is bit-identical to the CPU reference). Building without CUBQL_USE_HIP keeps CMAKE_CUDA_ARCHITECTURES at native and never enables HIP.
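The PR does not say how the bit-identical check was performed. One minimal way to express "bit-identical", as opposed to merely equal under `operator==`, is a bytewise comparison, assuming the GPU and CPU closest-point distances have already been copied into host `std::vector<float>` buffers; `bitIdentical` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// True only when both arrays have the same length and every float matches bit
// for bit. Unlike operator==, this distinguishes -0.0f from +0.0f and treats a
// NaN as equal to an identically encoded NaN.
inline bool bitIdentical(const std::vector<float>& gpu,
                         const std::vector<float>& cpu) {
  if (gpu.size() != cpu.size()) return false;
  if (gpu.empty()) return true;  // avoid calling memcmp on possibly-null data()
  return std::memcmp(gpu.data(), cpu.data(), gpu.size() * sizeof(float)) == 0;
}
```

The distinction matters for values such as `-0.0f`, which compares equal to `0.0f` with `==` but has a different bit pattern, so a bitwise match is a strictly stronger claim than a numeric one.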

cuBQL already carried first-party HIP source scaffolding (the __HIPCC__ includes of <hip/hip_runtime.h> and the cub->hipcub alias), but nothing ever invoked hipcc and several HIP gaps left that path uncompilable. This finishes and wires up the HIP path so the library, the instantiate_builders translation unit, and all GPU samples build and link for AMD GPUs, while keeping the CUDA build byte-identical.

Review order: the CMake change first, then the source shims. The root CMakeLists.txt adds a CUBQL_USE_HIP option that enables the HIP language, defaults CMAKE_HIP_ARCHITECTURES without hardcoding an arch, and sets a distinct CUBQL_HAVE_HIP switch alongside CUBQL_HAVE_CUDA; the GPU instantiation targets and samples build when either is set, and cuBQL/CMakeLists.txt and the sample CMake mark the .cu sources LANGUAGE HIP. The source shims:

- math/common.h maps the cudaXxx runtime symbols cuBQL uses onto their hipXxx equivalents inside the existing __HIPCC__ block (every call funnels through the CUBQL_CUDA_CALL(call) -> cuda##call macro).
- math/constants.h provides the CUDART_INF_F/CUDART_NAN device float constants, which have no HIP analogue.
- builder/cuda.h widens the cudaMallocAsync version guard to fire on HIP (CUDART_VERSION is 0 there).
- bvh.h includes the GPU builder and declares the bvh_floatN typedefs under HIP too.
- shrinkingRadiusQuery.h fixes a host/device fallback that was gated on __CUDA_ARCH__ so that it also keys off __HIP_DEVICE_COMPILE__ (HIP's per-pass macro).
- findClosest.h aligns three forward declarations with their device-only definitions on the HIP path (clang rejects the declaration/definition host/device-attribute mismatch that nvcc tolerated).
- vec.h widens the dim3 conversion operators to HIP.

The README's Building section documents the new CUBQL_USE_HIP option. Every source change is guarded by __HIPCC__ / __HIP_DEVICE_COMPILE__, and every CMake change by CUBQL_USE_HIP (default OFF) / LANGUAGE HIP, so a build without CUBQL_USE_HIP produces an unchanged CUDA configuration.
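One way to keep a forward declaration and its device-only definition in agreement under both nvcc and clang is to spell both with the same attribute macro. The sketch below is hypothetical (SKETCH_DEVICE_FN and sketchSqrDist are illustrative names, not cuBQL's): under nvcc or hipcc the macro expands to `__device__`, and in a plain host compiler it expands to nothing, which is what lets the sketch run here.

```cpp
#include <cassert>

// Hypothetical attribute macro: __device__ under nvcc/hipcc, empty otherwise.
#if defined(__CUDACC__) || defined(__HIPCC__)
#  define SKETCH_DEVICE_FN __device__
#else
#  define SKETCH_DEVICE_FN
#endif

// Forward declaration. It must carry exactly the attributes of the definition
// below; per the PR, declaring it __host__ __device__ while defining it
// __device__ is a mismatch nvcc tolerated but clang rejects.
SKETCH_DEVICE_FN inline float sketchSqrDist(float ax, float ay,
                                            float bx, float by);

// ... code that refers to sketchSqrDist could appear here ...

// Definition, spelled with the same macro so the attributes always match.
SKETCH_DEVICE_FN inline float sketchSqrDist(float ax, float ay,
                                            float bx, float by) {
  const float dx = ax - bx;
  const float dy = ay - by;
  return dx * dx + dy * dy;  // squared Euclidean distance in 2D
}
```

Routing both spellings through one macro means a later change to the definition's attributes cannot drift out of sync with the declaration.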
This work was authored with the assistance of Claude, an AI assistant.

Test Plan:

    cmake -S . -B build-hip -DCUBQL_USE_HIP=ON \
        -DCMAKE_HIP_ARCHITECTURES=gfx90a -DCMAKE_BUILD_TYPE=Release
    cmake --build build-hip -j --target \
        cuBQL_cuda_float3 \
        cuBQL_sample01_points_closestPoint_cuda \
        cuBQL_sample01_points_closestPoint_wideBVH_cuda \
        sample02_distanceToTriangleMesh sample03_insideOutside \
        sample04_boxOverlapsOrInsideSurfaceMesh sample05_lineOfSight \
        sample06_anyTriangleWithinRadius sample07_aggregateNBody
    HIP_VISIBLE_DEVICES=0 ./build-hip/cuBQL_sample01_points_closestPoint_cuda

Built and linked the library, the instantiate_builders TU, and all eight GPU samples with hipcc (ROCm 7.2.1) for gfx90a, gfx1100, and gfx1201; the device code objects are amdgcn. The sample01 closest-point sample and samples 02-07 run on GPU and return correct results. Configuring without CUBQL_USE_HIP keeps CMAKE_CUDA_ARCHITECTURES at native and never enables HIP; the CPU host targets build cleanly in both modes.

…oat-port)

The submodules/cuBQL gitlink is moved from NVIDIA/cuBQL to the HIP-enabled fork jeffdaily/cuBQL, branch moat-port, pinned at commit b0ea6a1 (the squashed cuBQL HIP port, which adds the CUBQL_USE_HIP build path for AMD GPUs and is identical to what was exercised during the barney gfx90a and gfx1201 validations via BARNEY_USE_EXTERNAL_CUBQL). This makes the barney fork self-contained: `git clone --recursive --branch moat-port https://github.com/jeffdaily/barney` resolves the cuBQL submodule to the HIP build without any extra flags, enabling AMD builds from a clean checkout before the upstream cuBQL PR (NVIDIA/cuBQL#35) lands.

TEMPORARY PIN: jeffdaily/cuBQL@moat-port will be re-pinned to upstream NVIDIA/cuBQL once NVIDIA/cuBQL#35 merges (see data/deferred.json for tracking).

Test Plan: configure barney with BARNEY_USE_EXTERNAL_CUBQL=OFF (uses the submodule):

    cmake -S . -B build-sub -DUSE_HIP=ON -DCMAKE_HIP_ARCHITECTURES=gfx90a

Confirm that cuBQL at b0ea6a1 satisfies CUBQL_USE_HIP=ON.

This work was authored with an AI assistant (Claude by Anthropic).

iwald-nvidia · 2026-06-17T18:38:24Z

Looks good at first glance; I'll need to run CI first, and will also run a few manual tests just in case, but it looks pretty good so far.

iwald-nvidia merged commit d1bfc3c into NVIDIA:main Jun 17, 2026
6 checks passed

iwald-nvidia merged 1 commit into NVIDIA:main from jeffdaily:moat-port