Add a HIP/ROCm build path for AMD GPUs #35
Merged
cuBQL already carried first-party HIP source scaffolding (the __HIPCC__
includes of <hip/hip_runtime.h> and the cub->hipcub alias), but nothing
ever invoked hipcc and several HIP gaps left that path uncompilable. This
finishes and wires up the HIP path so the library, the
instantiate_builders translation unit, and all GPU samples build and link
for AMD GPUs, while keeping the CUDA build byte-identical.
Review order: the CMake change first, then the source shims. The root
CMakeLists.txt adds a CUBQL_USE_HIP option that enables the HIP language,
defaults CMAKE_HIP_ARCHITECTURES without hardcoding an arch, and sets a
distinct CUBQL_HAVE_HIP switch alongside CUBQL_HAVE_CUDA; the GPU
instantiation targets and samples build when either is set, and
cuBQL/CMakeLists.txt and the sample CMake mark the .cu sources LANGUAGE HIP.
math/common.h maps the
cudaXxx runtime symbols cuBQL uses onto their hipXxx equivalents inside the
existing __HIPCC__ block (every call funnels through the
CUBQL_CUDA_CALL(call) -> cuda##call macro). math/constants.h provides the
CUDART_INF_F/CUDART_NAN device float constants, which have no HIP analogue.
builder/cuda.h widens the cudaMallocAsync version guard to fire on HIP
(CUDART_VERSION is 0 there). bvh.h includes the GPU builder and declares the
bvh_floatN typedefs under HIP too. shrinkingRadiusQuery.h fixes a host/device
fallback that was gated on __CUDA_ARCH__ to also key off __HIP_DEVICE_COMPILE__
(HIP's per-pass macro), and findClosest.h aligns three forward declarations
to their device-only definitions on the HIP path (clang rejects the
decl/def host/device-attribute mismatch that nvcc tolerated). vec.h widens
the dim3 conversion operators to HIP. The README's Building section
documents the new CUBQL_USE_HIP option.
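The symbol mapping in math/common.h can be pictured with a host-only sketch. The mock_hip namespace and its two functions are hypothetical stand-ins for the HIP runtime so that the snippet compiles without ROCm, and the use of plain object-like macros for the aliases is an assumption. Only the CUBQL_CUDA_CALL(call) -> cuda##call shape comes from the description above; the real macro may do more, such as error checking.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-ins for the HIP runtime, so this compiles on any host.
namespace mock_hip {
  inline int hipMalloc(void **ptr, std::size_t bytes) {
    *ptr = std::malloc(bytes);
    return *ptr ? 0 : 1;
  }
  inline int hipFree(void *ptr) { std::free(ptr); return 0; }
}

// Under HIP, each cudaXxx name the library uses is aliased to hipXxx.
// (Assumption: object-like macros; the PR may use another mechanism.)
#define cudaMalloc mock_hip::hipMalloc
#define cudaFree   mock_hip::hipFree

// Shape taken from the description: the macro pastes "cuda" onto the call.
#define CUBQL_CUDA_CALL(call) cuda##call

// Allocates and frees a small buffer through the macro; returns 0 on success.
inline int roundTrip(std::size_t bytes) {
  void *p = nullptr;
  int rc = CUBQL_CUDA_CALL(Malloc(&p, bytes)); // -> cudaMalloc -> hipMalloc
  if (rc != 0 || p == nullptr) return 1;
  return CUBQL_CUDA_CALL(Free(p));             // -> cudaFree -> hipFree
}
```

Because the pasted cudaXxx token is rescanned for macros after the ## step, call sites written against CUBQL_CUDA_CALL need no edits; only the alias table changes per backend.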
Every source change is guarded by __HIPCC__ / __HIP_DEVICE_COMPILE__, and
every CMake change by CUBQL_USE_HIP (default OFF) / LANGUAGE HIP, so a build
without CUBQL_USE_HIP produces an unchanged CUDA configuration.
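The per-pass guard can be sketched as follows. __CUDA_ARCH__ is defined only during nvcc's device pass and __HIP_DEVICE_COMPILE__ only during hipcc's, so a host/device fallback has to test both. SKETCH_DEVICE_PASS and compiledFor are hypothetical names used for illustration, not identifiers from cuBQL.

```cpp
#include <cassert>
#include <cstring>

// True only while compiling device code: __CUDA_ARCH__ is set by nvcc's
// device pass, __HIP_DEVICE_COMPILE__ by hipcc's device pass.
#if defined(__CUDA_ARCH__) || defined(__HIP_DEVICE_COMPILE__)
#  define SKETCH_DEVICE_PASS 1
#else
#  define SKETCH_DEVICE_PASS 0
#endif

// Hypothetical helper: reports which branch the compiler selected.
inline const char *compiledFor() {
#if SKETCH_DEVICE_PASS
  return "device";
#else
  return "host";
#endif
}
```

With an ordinary host compiler neither macro is defined, so the host branch is taken; a guard that tests __CUDA_ARCH__ alone would also take the host branch during hipcc's device pass, which is the situation the shrinkingRadiusQuery.h change addresses.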
This work was authored with the assistance of Claude, an AI assistant.
Test Plan:
cmake -S . -B build-hip -DCUBQL_USE_HIP=ON \
-DCMAKE_HIP_ARCHITECTURES=gfx90a -DCMAKE_BUILD_TYPE=Release
cmake --build build-hip -j --target \
cuBQL_cuda_float3 \
cuBQL_sample01_points_closestPoint_cuda \
cuBQL_sample01_points_closestPoint_wideBVH_cuda \
sample02_distanceToTriangleMesh sample03_insideOutside \
sample04_boxOverlapsOrInsideSurfaceMesh sample05_lineOfSight \
sample06_anyTriangleWithinRadius sample07_aggregateNBody
HIP_VISIBLE_DEVICES=0 ./build-hip/cuBQL_sample01_points_closestPoint_cuda
Built and linked the library, the instantiate_builders TU, and all eight GPU
samples with hipcc (ROCm 7.2.1) for gfx90a, gfx1100, and gfx1201; the device
code objects are amdgcn. sample01 closest-point and samples 02-07 run on GPU
and return correct results. Configuring without CUBQL_USE_HIP keeps
CMAKE_CUDA_ARCHITECTURES at native and never enables HIP; the CPU host
targets build clean in both modes.
jeffdaily added a commit to jeffdaily/barney that referenced this pull request on Jun 17, 2026
…oat-port)
The submodules/cuBQL gitlink is moved from NVIDIA/cuBQL to the HIP-enabled fork jeffdaily/cuBQL, branch moat-port, pinned at commit b0ea6a1 (the squashed cuBQL HIP port -- adds CUBQL_USE_HIP build path for AMD GPUs, identical to what was exercised during the barney gfx90a and gfx1201 validations via BARNEY_USE_EXTERNAL_CUBQL).
This makes the barney fork self-contained: `git clone --recursive --branch moat-port https://github.com/jeffdaily/barney` resolves the cuBQL submodule to the HIP build without any extra flags, enabling AMD builds from a clean checkout before the upstream cuBQL PR (NVIDIA/cuBQL#35) lands.
TEMPORARY PIN: jeffdaily/cuBQL@moat-port will be re-pinned to the upstream NVIDIA/cuBQL once cuBQL PR NVIDIA#35 merges (see data/deferred.json for tracking).
Test Plan:
Configure barney with BARNEY_USE_EXTERNAL_CUBQL=OFF (uses submodule):
cmake -S . -B build-sub -DUSE_HIP=ON -DCMAKE_HIP_ARCHITECTURES=gfx90a
Confirm cuBQL at b0ea6a1 satisfies CUBQL_USE_HIP=ON.
This work was authored with an AI assistant (Claude by Anthropic).
Collaborator
looks good on first glance; i'll need to run CI first, and will also run a few manual tests just in case, but looks pretty good so far.
cuBQL already carries first-party HIP scaffolding (the __HIPCC__ includes of <hip/hip_runtime.h> and the cub->hipcub alias from #34), but nothing invoked hipcc and a few HIP gaps left that path uncompilable. This finishes and wires up the HIP path so the library, the instantiate_builders translation unit, and all GPU samples build, link, and run on AMD GPUs -- while keeping the CUDA build byte-identical.
Suggested review order:
1. CMakeLists.txt gains a CUBQL_USE_HIP option that enables the HIP language, defaults CMAKE_HIP_ARCHITECTURES without hardcoding an arch, and sets a distinct CUBQL_HAVE_HIP switch; the per-type instantiation targets and GPU samples build when either CUBQL_HAVE_CUDA or CUBQL_HAVE_HIP is set, with the .cu sources marked LANGUAGE HIP.
2. math/common.h maps the cudaXxx runtime symbols cuBQL uses onto their hipXxx equivalents inside the existing __HIPCC__ block (every call funnels through the CUBQL_CUDA_CALL macro).
3. math/constants.h provides the CUDART_INF_F / CUDART_NAN device float constants (no HIP analogue).
4. builder/cuda.h widens the cudaMallocAsync version guard to fire on HIP.
5. bvh.h includes the GPU builder and declares the bvh_floatN typedefs under HIP.
6. shrinkingRadiusQuery.h keys a host/device fallback off __HIP_DEVICE_COMPILE__ (HIP's per-pass macro) as well as __CUDA_ARCH__.
7. findClosest.h aligns three forward declarations to their device-only definitions (clang rejects the decl/def attribute mismatch nvcc tolerated).
8. vec.h widens the dim3 conversion operators.
9. The README documents the CUBQL_USE_HIP option in the Building section.
Every source change is guarded by __HIPCC__ / __HIP_DEVICE_COMPILE__, and every CMake change by CUBQL_USE_HIP (default off), so a build without CUBQL_USE_HIP is an unchanged CUDA configuration.
Validation
Built and linked the library, the instantiate_builders TU, and all eight GPU samples with hipcc (ROCm 7.2.1) on AMD GPUs across Linux and Windows -- CDNA2 (gfx90a), RDNA3 (gfx1100), and RDNA4 (gfx1201). Samples 01-07 run on GPU and return correct results (the closest-point output is bit-identical to the CPU reference). Building without CUBQL_USE_HIP keeps CMAKE_CUDA_ARCHITECTURES at native and never enables HIP.
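The cudaMallocAsync guard change in builder/cuda.h rests on a preprocessor rule: an identifier that is not defined evaluates to 0 inside #if, so a condition that tests CUDART_VERSION alone is false under HIP. A minimal sketch of a widened guard follows; the 11020 threshold (CUDA 11.2, the release in which cudaMallocAsync first appeared) and the SKETCH_ and haveMallocAsync names are assumptions, not taken from the cuBQL source.

```cpp
#include <cassert>

// Assumed threshold: cudaMallocAsync first shipped in CUDA 11.2 (11020).
// An undefined CUDART_VERSION evaluates to 0 in #if, so the HIP case
// needs its own term in the condition.
#if defined(__HIPCC__) || (CUDART_VERSION >= 11020)
#  define SKETCH_HAVE_MALLOC_ASYNC 1
#else
#  define SKETCH_HAVE_MALLOC_ASYNC 0
#endif

// Hypothetical query so callers can branch without touching the macros.
constexpr bool haveMallocAsync() { return SKETCH_HAVE_MALLOC_ASYNC != 0; }
```

Compiled by a plain host compiler, where neither macro is defined, the query returns false; under hipcc the first term of the condition makes it true regardless of CUDART_VERSION.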