Migrate samples from cuda-samples by danielfrg · Pull Request #9697 · NVIDIA/cccl

danielfrg · 2026-07-02T20:24:09Z

Description

closes

Migrating the existing CCCL (C++ and Python) samples from cuda-samples to this repository.

These will be mirrored to the cuda-samples repo on each release, so we need to keep them in the same format they are in today.

Also added a CI run for these samples.

In general, nothing in /samples needs to be reviewed right now. Those files were simply migrated here and formatted. We can evaluate adding, removing, or cleaning up samples in the future.

Checklist

[x] New or existing tests cover these changes.
[x] The documentation is up to date with these changes.

copy-pr-bot · 2026-07-02T20:24:12Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-07-02T20:40:16Z

📝 Walkthrough

Summary by CodeRabbit

New Features
- Added several new CUDA and Python samples covering device search, segmented scan, transform, tensor-core GEMM, async copy, mdspan, random distributions, line-of-sight, radix sort, segmentation trees, and barrier-based workflows.
- Added shared sample utilities plus new sample build and test infrastructure, including broader CI coverage and cross-platform build support.

Documentation
- Added README files for the new samples and Python utilities.

Bug Fixes
- Improved file ignore and pre-commit handling for generated files and image assets.

Walkthrough

Adds a new top-level "samples" directory with a standalone CMake-based CCCL samples project, shared Common helper headers, CPM/install/toolchain CMake infrastructure, multiple CUDA/C++ and Python sample programs, CI matrix/script wiring for building and testing samples, and test-runner scripts (run_tests.py, test_samples.py). Minor repo config updates included.

Changes

CI and Infrastructure

Layer / File(s)	Summary
Repo config `.codespell-ignore.txt`, `.gitignore`, `.pre-commit-config.yaml`	Adds spellcheck ignore, ignores test-results/pytest cache directories, excludes `.ppm` files from pre-commit hooks.
CI matrix/project wiring `ci/matrix.yaml`, `ci/project_files_and_dependencies.yaml`, `ci/test_cuda_cccl_examples_python.sh`	Adds `samples` project/job definitions across pull_request/lite/nightly matrices, dirty-file detection regexes, and conditional Python samples test execution.
Build/test scripts `ci/build_samples.sh`, `ci/test_samples.sh`, `ci/windows/build_samples.ps1`	New Linux/Windows scripts to configure, build, install, and run the samples CMake project in CI.
Samples build infra `samples/CMakeLists.txt`, `samples/cmake/CPM.cmake`, `samples/cmake/InstallSamples.cmake`, `samples/cmake/Modules/`, `samples/cmake/toolchains/`	Root project configuration, CPM dependency fetcher, shared install logic, find-modules (EGL/FreeImage/NVSCI), and aarch64 Linux/QNX cross-compile toolchains.

Common Helper Headers

Layer / File(s)	Summary
Common headers `samples/Common/exception.h`, `helper_cuda.h`, `helper_functions.h`, `helper_image.h`, `helper_math.h`, `helper_string.h`, `helper_timer.h`	New shared header-only utilities: exception wrapper, CUDA error checking/device selection, PPM/PGM I/O and comparisons, vector math, CLI/string parsing, and stopwatch timers used across samples.

CUDA/C++ Samples

Layer / File(s)	Summary
CUB algorithm samples `samples/cubDeviceFind/`, `samples/cubDeviceSegmentedScan/`, `samples/cubDeviceTransform/*`	New samples demonstrating `cub::DeviceFind`, `DeviceSegmentedScan`, and `DeviceTransform` with CMake build files and READMEs.
Advanced samples `samples/dmmaTensorCoreGemm/`, `samples/globalToShmemAsyncCopy/`, `samples/libcuxxMdspan/`, `samples/libcuxxRandom/`, `samples/lineOfSight/`, `samples/radixSortThrust/`, `samples/segmentationTreeThrust/`, `samples/simpleAWBarrier/`	New samples covering tensor-core GEMM, async shared-memory copy, DLPack/mdspan interop, libcu++ random distributions, Thrust line-of-sight/radix sort, segmentation trees, and arrive-wait barrier reduction, each with CMakeLists and README.

Python Samples and Test Runners

Layer / File(s)	Summary
Python Utilities `samples/python/Utilities/*`, `samples/python/requirements.txt`	Shared package for CUDA requirement checks and result verification.
cuda.compute samples `samples/python/cuda.compute/binarySearch/`, `.../cudaComputeLambdas/`, `.../parallelReduction/`, `.../prefixSum/`	New Python samples for binary search, lambda-based operators, parallel reduction, and prefix sum using `cuda.compute`/`cuda.core`.
Python test driver `samples/python/test_samples.py`	Pytest driver discovering and running Python samples as subprocesses with GPU/import guards.
C++ test runner `samples/run_tests.py`, `samples/test_args.json`	Discovers and runs built sample executables in parallel, classifies pass/waive/fail, and reports summary/exit code.

Sequence Diagram(s)

sequenceDiagram
  participant CI
  participant build_samples.sh
  participant CMake
  participant test_samples.sh
  participant run_tests.py
  CI->>build_samples.sh: invoke build
  build_samples.sh->>CMake: configure/build/install
  CI->>test_samples.sh: invoke test
  test_samples.sh->>run_tests.py: run installed executables
  run_tests.py-->>CI: pass/fail summary

Related Issues: None referenced in the provided data.

Related PRs: None referenced in the provided data.

Suggested labels: ci, samples, cmake, documentation

Suggested reviewers: None determinable from provided data.

important: samples/cmake/InstallSamples.cmake and samples/cmake/CPM.cmake are large, vendored-style CMake modules — verify they match upstream unmodified or document local changes, since divergence is hard to spot in review.

important: multiple sample CMakeLists fetch CCCL via CPM from GitHub pinned to v3.3.3 by default; confirm this tag stays in sync with the repo's actual CCCL version to avoid version drift between the in-repo library and the one built by samples.

suggestion: samples/test_args.json is currently an empty object — confirm this is intentional placeholder content and not a leftover from stubbing out test_samples.sh's config path.


coderabbitai

Actionable comments posted: 16

🧹 Nitpick comments (20)

samples/Common/helper_image.h (2)
224-224: 🩺 Stability & Availability | 🔵 Trivial | 💤 Low value

suggestion: several malloc(sizeof(T) * n)-style allocations here (e.g. Line 224, Line 256, Line 279, Line 352, Line 369, Line 442) compute sizes without overflow checks, flagged by static analysis (CWE-190). Practical risk is low for typical sample image dimensions and this mirrors the original cuda-samples implementation, so treating as optional hardening rather than a blocker.

Also applies to: 256-256, 279-279, 352-352, 369-369, 442-442

Source: Linters/SAST tools
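As an illustration of the optional hardening mentioned above, here is a minimal sketch of an overflow-checked allocation. The helper name `checked_alloc` is hypothetical and does not exist in helper_image.h; it only shows the shape of the check, not how the sample should necessarily be changed.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper (not part of helper_image.h): allocate `count` elements
// of type T, returning nullptr when count * sizeof(T) would overflow size_t.
template <typename T>
T* checked_alloc(std::size_t count)
{
  if (count > SIZE_MAX / sizeof(T))
  {
    return nullptr; // the multiplication would wrap around
  }
  // malloc itself may still return nullptr; callers must check the result.
  return static_cast<T*>(std::malloc(count * sizeof(T)));
}
```

A caller would treat a null result as an allocation failure and release successful allocations with `std::free`, exactly as with a plain `malloc`.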

415-425: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

suggestion: sdkReadFile uses while (!feof(fh)) with an unchecked fscanf, relying on the trailing pop_back() to drop one bogus element. This is a known anti-pattern — if fscanf fails for a non-EOF reason (malformed token), a stale/garbage token value is retained and the compensating pop_back() doesn't guarantee correctness. Checking the fscanf return value directly would be more robust.
♻️ Proposed fix
-  while (!feof(fh))
-  {
-    fscanf(fh, "%f", &token);
-    data_read.push_back(token);
-  }
-
-  // the last element is read twice
-  data_read.pop_back();
-  fclose(fh);
+  while (fscanf(fh, "%f", &token) == 1)
+  {
+    data_read.push_back(token);
+  }
+
+  fclose(fh);
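
The pattern in the proposed fix can be tried in isolation. The sketch below assumes a hypothetical stand-alone function `read_floats` (it is not the signature of sdkReadFile) and shows that checking the `fscanf` return value stops the loop both at end of input and at the first malformed token.

```cpp
#include <cstdio>
#include <vector>

// Hypothetical stand-in for the read loop in sdkReadFile: read
// whitespace-separated floats until fscanf stops converting.
inline std::vector<float> read_floats(std::FILE* fh)
{
  std::vector<float> data;
  float token = 0.0f;
  // fscanf returns the number of successful conversions: 1 for a parsed
  // float, 0 for a malformed token, EOF at end of input. The loop ends in
  // both failure cases, so no compensating pop_back() is needed.
  while (std::fscanf(fh, "%f", &token) == 1)
  {
    data.push_back(token);
  }
  return data;
}
```

Whether parsing should stop silently at a malformed token or report an error is a design choice left to the sample; this sketch simply stops.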
samples/Common/helper_string.h (1)
114-124: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick win

suggestion: getFileExtension can read out of bounds for an empty filename. filename[string_length--] accesses index 0 (the null terminator) then decrements to -1 before the string_length == 0 guard is checked, so the next iteration reads filename[-1]. Not currently exercised by the provided call sites, but worth a bounds check for robustness.
♻️ Proposed fix
 inline int getFileExtension(char* filename, char** extension)
 {
   int string_length = static_cast<int>(strlen(filename));
+
+  if (string_length == 0)
+  {
+    *extension = NULL;
+    return 0;
+  }
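
For comparison, a bounds-safe way to find an extension is to search from the end with `strrchr`. This is a sketch under stated assumptions: `file_extension` is a hypothetical function with a different signature from getFileExtension, and its handling of a trailing dot (treated as no extension) is a choice made for this example.

```cpp
#include <cstring>

// Hypothetical alternative to getFileExtension: returns a pointer to the
// text after the last '.', or nullptr when there is no extension.
inline const char* file_extension(const char* filename)
{
  if (filename == nullptr || filename[0] == '\0')
  {
    return nullptr; // empty input: nothing to scan, no index ever goes negative
  }
  const char* dot = std::strrchr(filename, '.');
  if (dot == nullptr || dot[1] == '\0')
  {
    return nullptr; // no dot at all, or the dot is the last character
  }
  return dot + 1;
}
```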
samples/cubDeviceFind/CMakeLists.txt (1)

16-27: 🗄️ Data Integrity & Integration | 🔵 Trivial | ⚡ Quick win

suggestion: CMAKE_CUDA_ARCHITECTURES doesn't include SM 7.0 (70), but the README's "Supported SM Architectures" section lists SM 7.0 as supported. Either drop the SM 7.0 badge from the README or add 70 to this list.

samples/cubDeviceSegmentedScan/cubDeviceSegmentedScan.cu (1)

152-152: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

suggestion: Pass cuda::maximum<>{} directly as the binary op; the lambda wrapper adds no behavior.
samples/libcuxxRandom/README.md (1)
13-13: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

suggestion: fix markdownlint findings — trailing spaces inside link text on line 13 ([SM 7.0 ] → [SM 7.0], repeated for each SM badge) and an empty link [Dependencies]() on line 40 (should anchor to the Dependencies section).
Proposed fix
-[SM 7.0 ](https://developer.nvidia.com/cuda-gpus) [SM 7.5 ](https://developer.nvidia.com/cuda-gpus) [SM 8.0 ](https://developer.nvidia.com/cuda-gpus) [SM 8.6 ](https://developer.nvidia.com/cuda-gpus) [SM 8.9 ](https://developer.nvidia.com/cuda-gpus) [SM 9.0 ](https://developer.nvidia.com/cuda-gpus) [SM 10.0 ](https://developer.nvidia.com/cuda-gpus) [SM 11.0 ](https://developer.nvidia.com/cuda-gpus) [SM 12.0 ](https://developer.nvidia.com/cuda-gpus)
+[SM 7.0](https://developer.nvidia.com/cuda-gpus) [SM 7.5](https://developer.nvidia.com/cuda-gpus) [SM 8.0](https://developer.nvidia.com/cuda-gpus) [SM 8.6](https://developer.nvidia.com/cuda-gpus) [SM 8.9](https://developer.nvidia.com/cuda-gpus) [SM 9.0](https://developer.nvidia.com/cuda-gpus) [SM 10.0](https://developer.nvidia.com/cuda-gpus) [SM 11.0](https://developer.nvidia.com/cuda-gpus) [SM 12.0](https://developer.nvidia.com/cuda-gpus)
-Make sure the dependencies mentioned in [Dependencies]() section above are installed.
+Make sure the dependencies mentioned in the [Dependencies](`#dependencies-needed-to-buildrun`) section above are installed.
Also applies to: 40-40

Source: Linters/SAST tools
samples/libcuxxRandom/libcuxxRandom.cu (1)

123-138: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

suggestion: --blocks uses a different CLI convention (--blocks <value>, space-separated) than --device=<value> used internally by findCudaDevice a few lines below. Consider using getCmdLineArgumentInt from helper_string.h for --blocks=N to stay consistent with the rest of the sample's CLI style.
samples/python/Utilities/__init__.py (1)
37-47: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

suggestion: print_gpu_info is a public helper used by every sample (binarySearch, parallelReduction, prefixSum) but isn't re-exported here. Anyone importing via from Utilities import ... instead of the direct module-path pattern won't get it.
diff
 from .cuda_samples_utils import (
     check_cuda_requirements,
+    print_gpu_info,
     verify_array_result,
 )

 __version__ = "1.0.0"

 __all__ = [
     "check_cuda_requirements",
+    "print_gpu_info",
     "verify_array_result",
 ]
samples/python/cuda.compute/parallelReduction/README.md (1)

21-26: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

suggestion: Add language identifiers to both fenced code blocks.

Markdownlint is already flagging these as MD040, and tagging the fences will also improve syntax highlighting in rendered docs.

Also applies to: 74-76

Source: Linters/SAST tools

samples/run_tests.py (1)

331-342: 🚀 Performance & Scalability | 🔵 Trivial

suggestion: CI runs this with --parallel 4, but scheduling is GPU-agnostic. GPU-heavy samples (e.g. dmmaTensorCoreGemm, globalToShmemAsyncCopy) launched concurrently on a single device can OOM/contend and fail nondeterministically. Consider bounding parallelism by device memory, pinning runs to distinct devices, or marking heavy samples for serial execution in the config.

samples/test_args.json (1)

1-1: 📐 Maintainability & Code Quality | 🔵 Trivial

suggestion: Empty config means every discovered sample runs with no args, min_gpus=1, and a 300s timeout. Migrated cuda-samples that need specific CLI args, multiple GPUs, longer timeouts, or should be skipped/waived will silently fall back to defaults and may fail in CI. Want me to scaffold per-sample entries (args/runs/min_gpus/skip) for the samples in this PR?

ci/windows/build_samples.ps1 (1)

26-45: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

suggestion: this script imports build_common.psm1 but never applies $script:GLOBAL_CMAKE_OPTIONS (which carries CCCL_ENABLE_WERROR/CCCL_ENABLE_PRAGMA_SYSTEM_HEADER in CI, per build_common.psm1) to the samples configure command — only the raw $CMAKE_OPTIONS parameter is appended. Since the comment at line 15-18 says presets are intentionally bypassed, this is likely by design, but it means Windows CI builds samples without warnings-as-errors enforcement that the rest of CCCL relies on for catching regressions.

ci/test_cuda_cccl_examples_python.sh (1)

31-37: 🩺 Stability & Availability | 🔵 Trivial | 💤 Low value

suggestion: silently skipping the samples pytest run when test_samples.py is missing means a broken migration (e.g. file accidentally not shipped/renamed) fails silently instead of failing CI. Consider an explicit echo/warning when the file is absent, or making its presence mandatory if samples are always expected in-tree from this point on.

samples/globalToShmemAsyncCopy/CMakeLists.txt (1)

34-38: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

important: QNX exclusion here contradicts README.md line 17, which lists QNX as a supported OS. See the README comment for the cross-file fix.
samples/globalToShmemAsyncCopy/README.md (3)
13-13: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

suggestion: strip the trailing spaces inside [SM X.X ] link text (flagged by markdownlint MD039), and reconcile the listed SM range with CMakeLists.txt, which only builds for SM 75+ (75 80 86 87 89 90 100 110 120). SM 7.0/7.2 in the README won't actually be compiled.
doc fix
-[SM 7.0 ](https://developer.nvidia.com/cuda-gpus)  [SM 7.2 ](https://developer.nvidia.com/cuda-gpus)  [SM 7.5 ](https://developer.nvidia.com/cuda-gpus)  [SM 8.0 ](https://developer.nvidia.com/cuda-gpus)  [SM 8.6 ](https://developer.nvidia.com/cuda-gpus)  [SM 8.7 ](https://developer.nvidia.com/cuda-gpus)  [SM 8.9 ](https://developer.nvidia.com/cuda-gpus)  [SM 9.0 ](https://developer.nvidia.com/cuda-gpus)
+[SM 7.5](https://developer.nvidia.com/cuda-gpus)  [SM 8.0](https://developer.nvidia.com/cuda-gpus)  [SM 8.6](https://developer.nvidia.com/cuda-gpus)  [SM 8.7](https://developer.nvidia.com/cuda-gpus)  [SM 8.9](https://developer.nvidia.com/cuda-gpus)  [SM 9.0](https://developer.nvidia.com/cuda-gpus)
Source: Linters/SAST tools

17-17: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

important: README lists QNX as a supported OS, but CMakeLists.txt (lines 35-38) explicitly refuses to build the sample on QNX. Update one side to match.

38-38: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

suggestion: empty link [Dependencies]() (MD042). Point it at the "Dependencies needed to build/run" section above.
doc fix
-Make sure the dependencies mentioned in [Dependencies]() section above are installed.
+Make sure the dependencies mentioned in the [Dependencies](`#dependencies-needed-to-buildrun`) section above are installed.
Source: Linters/SAST tools
samples/globalToShmemAsyncCopy/globalToShmemAsyncCopy.cu (1)

50-52: 🩺 Stability & Availability | 🔵 Trivial | 💤 Low value

suggestion: conditional #include <cuda/barrier> under __CUDA_ARCH__ >= 700 only takes effect during device-code compilation passes; since CMAKE_CUDA_ARCHITECTURES in this sample's CMakeLists.txt starts at 75, this resolves correctly today, but it's a fragile pattern if the minimum architecture is ever lowered. No action needed given current build config.
samples/radixSortThrust/radixSortThrust.cu (2)
227-233: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

suggestion: This conditional is a no-op — both branches return bTestResult's value (the early return false only triggers when bTestResult is already false). Safe to simplify.
Proposed fix
-  if (!bTestResult && !quiet)
-  {
-    return false;
-  }
-
   return bTestResult;
 }
29-29: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

suggestion: <thrust/detail/type_traits.h> is a Thrust internal implementation header (not part of the public API) and no symbols from it appear used in this file. Consider dropping it, or replacing with the specific public header if something from it is actually needed.

Source: Coding guidelines

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between a1fb0b8 and f0af087.

📒 Files selected for processing (82)

.codespell-ignore.txt
.gitignore
.pre-commit-config.yaml
ci/build_samples.sh
ci/matrix.yaml
ci/project_files_and_dependencies.yaml
ci/test_cuda_cccl_examples_python.sh
ci/test_samples.sh
ci/windows/build_samples.ps1
samples/CMakeLists.txt
samples/Common/exception.h
samples/Common/helper_cuda.h
samples/Common/helper_functions.h
samples/Common/helper_image.h
samples/Common/helper_math.h
samples/Common/helper_string.h
samples/Common/helper_timer.h
samples/cmake/CPM.cmake
samples/cmake/InstallSamples.cmake
samples/cmake/Modules/FindEGL.cmake
samples/cmake/Modules/FindFreeImage.cmake
samples/cmake/Modules/FindNVSCI.cmake
samples/cmake/toolchains/toolchain-aarch64-linux.cmake
samples/cmake/toolchains/toolchain-aarch64-qnx.cmake
samples/cubDeviceFind/CMakeLists.txt
samples/cubDeviceFind/README.md
samples/cubDeviceFind/cubDeviceFind.cu
samples/cubDeviceSegmentedScan/CMakeLists.txt
samples/cubDeviceSegmentedScan/README.md
samples/cubDeviceSegmentedScan/cubDeviceSegmentedScan.cu
samples/cubDeviceTransform/CMakeLists.txt
samples/cubDeviceTransform/README.md
samples/cubDeviceTransform/cubDeviceTransform.cu
samples/dmmaTensorCoreGemm/CMakeLists.txt
samples/dmmaTensorCoreGemm/README.md
samples/dmmaTensorCoreGemm/dmmaTensorCoreGemm.cu
samples/globalToShmemAsyncCopy/CMakeLists.txt
samples/globalToShmemAsyncCopy/README.md
samples/globalToShmemAsyncCopy/globalToShmemAsyncCopy.cu
samples/libcuxxMdspan/CMakeLists.txt
samples/libcuxxMdspan/README.md
samples/libcuxxMdspan/libcuxxMdspan.cu
samples/libcuxxRandom/CMakeLists.txt
samples/libcuxxRandom/README.md
samples/libcuxxRandom/libcuxxRandom.cu
samples/lineOfSight/CMakeLists.txt
samples/lineOfSight/README.md
samples/lineOfSight/lineOfSight.cu
samples/python/Utilities/README.md
samples/python/Utilities/__init__.py
samples/python/Utilities/cuda_samples_utils.py
samples/python/cuda.compute/binarySearch/README.md
samples/python/cuda.compute/binarySearch/binarySearch.py
samples/python/cuda.compute/binarySearch/requirements.txt
samples/python/cuda.compute/cudaComputeLambdas/README.md
samples/python/cuda.compute/cudaComputeLambdas/cudaComputeLambdas.py
samples/python/cuda.compute/cudaComputeLambdas/requirements.txt
samples/python/cuda.compute/parallelReduction/README.md
samples/python/cuda.compute/parallelReduction/parallelReduction.py
samples/python/cuda.compute/parallelReduction/requirements.txt
samples/python/cuda.compute/prefixSum/README.md
samples/python/cuda.compute/prefixSum/prefixSum.py
samples/python/cuda.compute/prefixSum/requirements.txt
samples/python/requirements.txt
samples/python/test_samples.py
samples/radixSortThrust/CMakeLists.txt
samples/radixSortThrust/README.md
samples/radixSortThrust/doc/readme.txt
samples/radixSortThrust/radixSortThrust.cu
samples/run_tests.py
samples/segmentationTreeThrust/CMakeLists.txt
samples/segmentationTreeThrust/README.md
samples/segmentationTreeThrust/common.cuh
samples/segmentationTreeThrust/data/ref_00.ppm
samples/segmentationTreeThrust/data/ref_09.ppm
samples/segmentationTreeThrust/data/test.ppm
samples/segmentationTreeThrust/kernels.cuh
samples/segmentationTreeThrust/segmentationTree.cu
samples/simpleAWBarrier/CMakeLists.txt
samples/simpleAWBarrier/README.md
samples/simpleAWBarrier/simpleAWBarrier.cu
samples/test_args.json

coderabbitai · 2026-07-02T20:40:20Z

+set(CMAKE_C_FLAGS " \"-V${__qnx_gcc_ver},gcc_ntoaarch64le\"")
+set(CMAKE_CXX_FLAGS " \"-V${__qnx_gcc_ver},gcc_ntoaarch64le\"")
+set(CMAKE_CUDA_FLAGS " --qpp-config=${__qnx_gcc_ver},gcc_ntoaarch64le")
+set(
+  AUTOMAGIC_NVCC_FLAGS
+  --qpp-config=${__qnx_gcc_ver},gcc_ntoaarch64le
+  CACHE STRING
+  "automagic feature detection flags for cross build"
+)
+add_link_options("-V${__qnx_gcc_ver},gcc_ntoaarch64le")


🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm __qnx_gcc_ver is defined somewhere the toolchain can see it
rg -nP '__qnx_gcc_ver' -g '!**/toolchain-aarch64-qnx.cmake'

Repository: NVIDIA/cccl

Length of output: 149

🏁 Script executed:

#!/bin/bash
set -euo pipefail
echo "== toolchain file =="
nl -ba samples/cmake/toolchains/toolchain-aarch64-qnx.cmake | sed -n '1,220p'
echo
echo "== references to qnx toolchains / gcc_ver =="
rg -n "__qnx_gcc_ver|toolchain-aarch64-qnx|qnx" samples cmake .github ci docs -g '!**/build/**' -g '!**/dist/**' -g '!**/node_modules/**' || true

Repository: NVIDIA/cccl

Length of output: 210

🏁 Script executed:

#!/bin/bash
set -euo pipefail
python3 - <<'PY'
from pathlib import Path
p = Path('samples/cmake/toolchains/toolchain-aarch64-qnx.cmake')
print(p.exists(), p.stat().st_size if p.exists() else 0)
PY

Repository: NVIDIA/cccl

Length of output: 159

🏁 Script executed:

#!/bin/bash
set -euo pipefail
echo "== samples/cmake/toolchains/toolchain-aarch64-qnx.cmake =="
sed -n '1,220p' samples/cmake/toolchains/toolchain-aarch64-qnx.cmake
echo
echo "== similar toolchain patterns =="
rg -n "CACHE STRING|CACHE PATH|if\\(NOT DEFINED|message\\(FATAL_ERROR|set\\(__.*gcc_ver|qnx" samples/cmake/toolchains -g '*.cmake' || true

Repository: NVIDIA/cccl

Length of output: 2614

🏁 Script executed:

#!/bin/bash
set -euo pipefail
echo "== usages of toolchain-aarch64-qnx =="
rg -n "toolchain-aarch64-qnx|QNX_HOST|QNX_TARGET|__qnx_gcc_ver" . -g '!**/build/**' -g '!**/dist/**' -g '!**/node_modules/**' || true
echo
echo "== toolchain-aarch64-linux.cmake =="
sed -n '1,120p' samples/cmake/toolchains/toolchain-aarch64-linux.cmake
echo
echo "== docs mentioning QNX toolchain =="
rg -n "QNX|qnx" docs samples README.md .github ci -g '!**/build/**' -g '!**/dist/**' -g '!**/node_modules/**' || true

Repository: NVIDIA/cccl

Length of output: 8966

important: Define or require __qnx_gcc_ver in this toolchain; it is only referenced here, so if it is unset these flags become malformed (-V,gcc_ntoaarch64le, --qpp-config=,gcc_ntoaarch64le) and QNX builds will fail. Add a cache declaration with a default or a fatal guard.

coderabbitai · 2026-07-02T20:40:20Z

+inline bool getCmdLineArgumentString(const int argc, const char** argv, const char* string_ref, char** string_retval)
+{
+  bool bFound = false;
+
+  if (argc >= 1)
+  {
+    for (int i = 1; i < argc; i++)
+    {
+      int string_start  = stringRemoveDelimiter('-', argv[i]);
+      char* string_argv = const_cast<char*>(&argv[i][string_start]);
+      int length        = static_cast<int>(strlen(string_ref));
+
+      if (!STRNCASECMP(string_argv, string_ref, length))
+      {
+        *string_retval = &string_argv[length + 1];
+        bFound         = true;
+        continue;
+      }
+    }
+  }
+
+  if (!bFound)
+  {
+    *string_retval = NULL;
+  }
+
+  return bFound;
+}


🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

important: out-of-bounds read when the matched flag has no =value suffix.

Unlike getCmdLineArgumentValue/getCmdLineArgumentInt/getCmdLineArgumentFloat in this same file, which check length + 1 <= strlen(string_argv) and detect the = before offsetting, getCmdLineArgumentString unconditionally does &string_argv[length + 1]. If the CLI flag is passed without =value (e.g. -file alone), string_argv[length] is the null terminator and string_argv[length + 1] reads one byte past the string buffer.

🐛 Proposed fix

       if (!STRNCASECMP(string_argv, string_ref, length))
       {
-        *string_retval = &string_argv[length + 1];
+        int auto_inc = (static_cast<int>(strlen(string_argv)) > length && string_argv[length] == '=') ? 1 : 0;
+        *string_retval = &string_argv[length + auto_inc];
         bFound = true;
         continue;
       }
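
The bounds check can be exercised on its own. The following is a minimal sketch, assuming a hypothetical helper `flag_value` that takes an argument with its leading dashes already stripped. Unlike the original, it compares case-sensitively with `strncmp`, since `STRNCASECMP` is a macro defined in helper_string.h and is not available here.

```cpp
#include <cstring>

// Hypothetical helper illustrating the bounds check: given an argument with
// leading dashes already stripped (e.g. "file=data.ppm") and a flag name
// ("file"), return the text after '=', an empty string when the flag has no
// "=value" suffix, or nullptr when the argument does not match the flag.
inline const char* flag_value(const char* arg, const char* flag)
{
  const std::size_t length = std::strlen(flag);
  if (std::strncmp(arg, flag, length) != 0)
  {
    return nullptr;
  }
  // arg[length] is always readable here: after a successful prefix match it
  // is '=', another character, or the terminating '\0'. Only step past it
  // when it is '=', so the returned pointer never leaves the string.
  return (arg[length] == '=') ? &arg[length + 1] : &arg[length];
}
```

With this shape, a bare `-file` yields an empty string rather than a pointer past the terminator, and the caller can decide whether an empty value is an error.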

coderabbitai · 2026-07-02T20:40:20Z

+
+## Supported SM Architectures
+
+[SM 7.0 ](https://developer.nvidia.com/cuda-gpus) [SM 7.5 ](https://developer.nvidia.com/cuda-gpus) [SM 8.0 ](https://developer.nvidia.com/cuda-gpus) [SM 8.6 ](https://developer.nvidia.com/cuda-gpus) [SM 8.9 ](https://developer.nvidia.com/cuda-gpus) [SM 9.0 ](https://developer.nvidia.com/cuda-gpus) [SM 10.0 ](https://developer.nvidia.com/cuda-gpus) [SM 11.0 ](https://developer.nvidia.com/cuda-gpus) [SM 12.0 ](https://developer.nvidia.com/cuda-gpus)


🗄️ Data Integrity & Integration | 🟡 Minor | ⚡ Quick win

suggestion: SM 7.0 is listed as supported here, but samples/cubDeviceFind/CMakeLists.txt doesn't include architecture 70 in CMAKE_CUDA_ARCHITECTURES (only 75/80/86/87/89/90/100/110/120). Reconcile the two. Also each link label has a trailing space (SM 7.0 ) flagged by markdownlint MD039.

🧰 Tools

🪛 markdownlint-cli2 (0.22.1)

[warning] 13-13: Spaces inside link text

(MD039, no-space-in-links)

Source: Linters/SAST tools

coderabbitai · 2026-07-02T20:40:20Z

+## Prerequisites
+
+Download and install the [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads) for your corresponding platform.
+Make sure the dependencies mentioned in [Dependencies]() section above are installed.


📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

suggestion: [Dependencies]() is an empty link (markdownlint MD042). Either remove the markdown link syntax or point it at the actual "Dependencies needed to build/run" anchor.

Proposed fix

-Make sure the dependencies mentioned in [Dependencies]() section above are installed.
+Make sure the dependencies mentioned in the "Dependencies needed to build/run" section above are installed.

🧰 Tools

🪛 markdownlint-cli2 (0.22.1)

[warning] 48-48: No empty links

(MD042, no-empty-links)

Source: Linters/SAST tools

coderabbitai · 2026-07-02T20:40:20Z

+set(
+  CMAKE_CUDA_ARCHITECTURES
+  75
+  80
+  86
+  87
+  89
+  90
+  100
+  110
+  120
+)


🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

Does NVIDIA cuda-samples CMakeLists still include architecture 110 in its default CMAKE_CUDA_ARCHITECTURES list, and was this issue resolved?

💡 Result:

As of July 2, 2026, the NVIDIA cuda-samples repository continues to include 110 in its default CMAKE_CUDA_ARCHITECTURES list within its CMakeLists.txt files [1][2][3]. This has been an ongoing issue for users, as documented in multiple GitHub issues [4][5]. The inclusion of 110 (compute_110/sm_110) in this list often leads to build failures because the CUDA Toolkit versions used by many users do not support this specific architecture, causing the nvcc compiler to return an "Unsupported gpu architecture" fatal error [5]. The issue remains unresolved in the sense that the default configuration in the repository still explicitly sets 110 as a target [1][2][3]. Users encountering this problem typically resolve it locally by modifying the CMAKE_CUDA_ARCHITECTURES list in the relevant CMakeLists.txt files to remove 110 or by overriding the variable during the CMake configuration process [4][5].

Citations:

1: https://github.com/NVIDIA/cuda-samples/blob/master/CMakeLists.txt

2: https://github.com/NVIDIA/cuda-samples/blob/master/cpp/0_Introduction/template/CMakeLists.txt

3: https://github.com/NVIDIA/cuda-samples/blob/4f735616/Samples/0_Introduction/vectorAddMMAP/CMakeLists.txt

4: Unsupported gpu architecture 'compute_110' cuda-samples#400

5: nvcc fatal error: Unsupported gpu architecture 'compute_110' (but compute_120 is supported) cuda-samples#388

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect the relevant file and surrounding sample build context.
git ls-files samples/cubDeviceSegmentedScan/CMakeLists.txt
wc -l samples/cubDeviceSegmentedScan/CMakeLists.txt
cat -n samples/cubDeviceSegmentedScan/CMakeLists.txt | sed -n '1,120p'

# Look for any toolkit/version gating or architecture overrides nearby.
rg -n "CMAKE_CUDA_ARCHITECTURES|CUDA_TOOLKIT|CUDAToolkit|110|sm_110|compute_110" samples/cubDeviceSegmentedScan -S

# Look for repo-wide guidance on supported CUDA versions/architectures in samples.
rg -n "110|sm_110|compute_110|CUDA 13|CUDA 12\.8|CUDA 12\.9|CMAKE_CUDA_ARCHITECTURES" -S .github ci samples CMakeLists.txt docs | sed -n '1,200p'

Repository: NVIDIA/cccl

Length of output: 18740

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find the repository's stated CUDA toolkit support and any sample-specific version gating.
rg -n "CUDA 12|CUDA 13|12\.8|12\.9|13\.0|CUDAToolkit .*REQUIRED|find_package\(CUDAToolkit|CMAKE_CUDA_ARCHITECTURES" -S \
  CMakeLists.txt README.md docs .devcontainer ci samples | sed -n '1,240p'

# Inspect the top-level CMake logic that mentions CMAKE_CUDA_ARCHITECTURES special handling.
cat -n CMakeLists.txt | sed -n '1,120p'

Repository: NVIDIA/cccl

Length of output: 27118

important: CMAKE_CUDA_ARCHITECTURES should not unconditionally include 110 here. CCCL still supports CUDA 12.9, and compute_110 is rejected by nvcc before CUDA 13.0, so this sample fails to build on the older toolchain. Gate 110 on the toolkit version or remove it from the default list.
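The gating rule can be sketched as follows. This is an illustration in Python only; in the sample the logic would live in CMake. The helper names `default_architectures` and `cmake_override` are hypothetical, the architecture list is the one this review quotes for radixSortThrust, and only the "110 needs CUDA 13.0 or newer" rule stated above is modeled (other architectures have their own minimum toolkit versions, which are not covered here).

```python
# Hypothetical helper: pick CMAKE_CUDA_ARCHITECTURES for a given toolkit version.
# Assumption: only architecture 110 is gated, on CUDA >= 13.0, as stated in the review.
BASE_ARCHS = [75, 80, 86, 87, 89, 90, 100, 110, 120]
MIN_TOOLKIT = {110: (13, 0)}


def default_architectures(toolkit_version):
    """Return the architectures kept for a (major, minor) toolkit version."""
    return [
        arch
        for arch in BASE_ARCHS
        if toolkit_version >= MIN_TOOLKIT.get(arch, (0, 0))
    ]


def cmake_override(toolkit_version):
    """Format the list as a -D argument for the CMake configure step."""
    archs = ";".join(str(a) for a in default_architectures(toolkit_version))
    return f"-DCMAKE_CUDA_ARCHITECTURES={archs}"


print(cmake_override((12, 9)))  # 110 is dropped
print(cmake_override((13, 0)))  # 110 is kept
```

The resulting string has the same form as the manual override that users of cuda-samples apply at configure time.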

coderabbitai · 2026-07-02T20:40:20Z

+    unsigned int totalMem = (keysOnly ? 2 : 4) * numElements * sizeof(T);
+
+    if (devprop.totalGlobalMem < totalMem)
+    {
+      printf("Error: insufficient amount of memory to sort %d elements.\n", numElements);
+      printf("%d bytes needed, %d bytes available\n", (int) totalMem, (int) devprop.totalGlobalMem);
+      exit(EXIT_SUCCESS);
+    }


🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

suggestion: totalMem is unsigned int and computed as (keysOnly ? 2 : 4) * numElements * sizeof(T), which can overflow 32 bits for large -n values, silently defeating the memory-sufficiency check. The subsequent (int) casts of totalMem/devprop.totalGlobalMem in the printf will also print negative numbers on GPUs with more than 2GB of memory.

Proposed fix

-    unsigned int totalMem = (keysOnly ? 2 : 4) * numElements * sizeof(T);
+    size_t totalMem = (keysOnly ? size_t{2} : size_t{4}) * static_cast<size_t>(numElements) * sizeof(T);

     if (devprop.totalGlobalMem < totalMem)
     {
       printf("Error: insufficient amount of memory to sort %d elements.\n", numElements);
-      printf("%d bytes needed, %d bytes available\n", (int) totalMem, (int) devprop.totalGlobalMem);
+      printf("%zu bytes needed, %zu bytes available\n", totalMem, devprop.totalGlobalMem);
       exit(EXIT_SUCCESS);
     }
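The arithmetic behind the overflow can be worked through with a small sketch. Python integers do not overflow, so the 32-bit truncation is emulated with a mask; the helper names are hypothetical, and the signed reinterpretation assumes the usual two's-complement behavior of an `(int)` cast.

```python
# Emulate storing the byte count in a 32-bit `unsigned int`, as the sample does.
UINT32_MASK = 0xFFFFFFFF


def total_mem_exact(num_elements, elem_size, keys_only):
    """Byte count without truncation, as a 64-bit size_t would hold it."""
    return (2 if keys_only else 4) * num_elements * elem_size


def total_mem_uint32(num_elements, elem_size, keys_only):
    """Byte count after truncation to 32 bits (wraps modulo 2**32)."""
    return total_mem_exact(num_elements, elem_size, keys_only) & UINT32_MASK


def as_int32(value):
    """Reinterpret the low 32 bits as a signed int, mimicking an (int) cast."""
    value &= UINT32_MASK
    return value - (1 << 32) if value >= (1 << 31) else value


# 2**28 four-byte elements, keys and values: 4 * 2**28 * 4 = 2**32 bytes.
n = 1 << 28
print(total_mem_exact(n, 4, keys_only=False))   # 4294967296
print(total_mem_uint32(n, 4, keys_only=False))  # 0, so the check always passes
print(as_int32(3 * (1 << 30)))                  # -1073741824 for a 3 GiB value
```

With the truncated value of 0, the comparison `devprop.totalGlobalMem < totalMem` can never trigger, which is the silent failure the comment describes.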

coderabbitai · 2026-07-02T20:40:21Z

+int main(int argc, char** argv)
+{
+  // Start logs
+  printf("%s Starting...\n\n", argv[0]);
+
+  findCudaDevice(argc, (const char**) argv);
+
+  bool bTestResult = false;
+
+  if (checkCmdLineFlag(argc, (const char**) argv, "float"))
+  {
+    bTestResult = testSort<float, true>(argc, argv);
+  }
+  else
+  {
+    bTestResult = testSort<unsigned int, false>(argc, argv);
+  }
+
+  printf(bTestResult ? "Test passed\n" : "Test failed!\n");
+}


🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

file='samples/radixSortThrust/radixSortThrust.cu'

echo '--- file outline ---'
ast-grep outline "$file" --view expanded || true
echo
echo '--- relevant ranges ---'
sed -n '1,80p' "$file"
echo
sed -n '110,260p' "$file"
echo
echo '--- search for run_tests reference ---'
rg -n "run_tests\.py|Test passed|Test failed|EXIT_SUCCESS|EXIT_FAILURE|return false;|totalMem" samples/radixSortThrust -n || true

Repository: NVIDIA/cccl

Length of output: 7538

important: main() prints the result but never returns it, so the process exits with success even when the sort fails. Add return bTestResult ? 0 : 1; at the end.
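Why the missing return value matters to a test runner can be shown with a minimal sketch. The two snippets below are stand-ins for the sample binary, not code from the PR, and it is an assumption that the runner decides pass or fail from the process exit status.

```python
import subprocess
import sys

# Stand-in "samples": both print a failure message, but only the second
# converts the result into a non-zero exit status.
PRINT_ONLY = "print('Test failed!')"
PRINT_AND_EXIT = "import sys; print('Test failed!'); sys.exit(1)"


def run(snippet):
    """Run a snippet in a child interpreter and return its exit status."""
    result = subprocess.run(
        [sys.executable, "-c", snippet], capture_output=True, text=True
    )
    return result.returncode


print(run(PRINT_ONLY))      # 0: a status-based runner records a pass
print(run(PRINT_AND_EXIT))  # 1: the failure is visible to the runner
```

The first case mirrors the current `main()`: the message says the test failed, yet the process reports success.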

coderabbitai · 2026-07-02T20:40:21Z

+
+## Supported SM Architectures
+
+[SM 5.0 ](https://developer.nvidia.com/cuda-gpus)  [SM 5.2 ](https://developer.nvidia.com/cuda-gpus)  [SM 5.3 ](https://developer.nvidia.com/cuda-gpus)  [SM 6.0 ](https://developer.nvidia.com/cuda-gpus)  [SM 6.1 ](https://developer.nvidia.com/cuda-gpus)  [SM 7.0 ](https://developer.nvidia.com/cuda-gpus)  [SM 7.2 ](https://developer.nvidia.com/cuda-gpus)  [SM 7.5 ](https://developer.nvidia.com/cuda-gpus)  [SM 8.0 ](https://developer.nvidia.com/cuda-gpus)  [SM 8.6 ](https://developer.nvidia.com/cuda-gpus)  [SM 8.7 ](https://developer.nvidia.com/cuda-gpus)  [SM 8.9 ](https://developer.nvidia.com/cuda-gpus)  [SM 9.0 ](https://developer.nvidia.com/cuda-gpus)


🗄️ Data Integrity & Integration | 🟡 Minor | ⚡ Quick win

important: Supported-architecture list doesn't match the build config. README lists SM 5.0–9.0, but samples/radixSortThrust/CMakeLists.txt only builds for SM 75/80/86/87/89/90/100/110/120. Also, markdownlint flags MD039 (spaces inside link text, e.g. [SM 5.0 ]) on this line.

Proposed fix

-[SM 5.0 ](https://developer.nvidia.com/cuda-gpus) [SM 5.2 ](https://developer.nvidia.com/cuda-gpus) [SM 5.3 ](https://developer.nvidia.com/cuda-gpus) [SM 6.0 ](https://developer.nvidia.com/cuda-gpus) [SM 6.1 ](https://developer.nvidia.com/cuda-gpus) [SM 7.0 ](https://developer.nvidia.com/cuda-gpus) [SM 7.2 ](https://developer.nvidia.com/cuda-gpus) [SM 7.5 ](https://developer.nvidia.com/cuda-gpus) [SM 8.0 ](https://developer.nvidia.com/cuda-gpus) [SM 8.6 ](https://developer.nvidia.com/cuda-gpus) [SM 8.7 ](https://developer.nvidia.com/cuda-gpus) [SM 8.9 ](https://developer.nvidia.com/cuda-gpus) [SM 9.0 ](https://developer.nvidia.com/cuda-gpus)
+[SM 7.5](https://developer.nvidia.com/cuda-gpus) [SM 8.0](https://developer.nvidia.com/cuda-gpus) [SM 8.6](https://developer.nvidia.com/cuda-gpus) [SM 8.7](https://developer.nvidia.com/cuda-gpus) [SM 8.9](https://developer.nvidia.com/cuda-gpus) [SM 9.0](https://developer.nvidia.com/cuda-gpus) [SM 10.0](https://developer.nvidia.com/cuda-gpus) [SM 11.0](https://developer.nvidia.com/cuda-gpus) [SM 12.0](https://developer.nvidia.com/cuda-gpus)
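One way to keep the README and the build configuration from drifting apart is to derive the README line from the architecture list. The sketch below is a hypothetical generator, not part of the PR; it assumes the CMake numbering where the last digit is the minor version (75 is SM 7.5, 100 is SM 10.0).

```python
# Hypothetical generator for the README "Supported SM Architectures" line.
GPU_LIST_URL = "https://developer.nvidia.com/cuda-gpus"


def sm_label(arch):
    """Turn an architecture number such as 75 or 100 into 'SM 7.5' or 'SM 10.0'."""
    major, minor = divmod(arch, 10)
    return f"SM {major}.{minor}"


def readme_line(archs):
    """Build the Markdown links with no space inside the link text (MD039)."""
    return " ".join(f"[{sm_label(a)}]({GPU_LIST_URL})" for a in archs)


print(readme_line([75, 80, 86, 87, 89, 90, 100, 110, 120]))
```

Because the link text is produced by `sm_label`, the trailing space that markdownlint flags cannot reappear.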

🧰 Tools

🪛 markdownlint-cli2 (0.22.1)

[warning] 13-13: Spaces inside link text

(MD039, no-space-in-links)

Source: Linters/SAST tools

coderabbitai · 2026-07-02T20:40:21Z

+        if os.access(path, os.X_OK):
+            # Skip if it's a library file
+            if path.suffix.lower() in (".dll", ".so", ".dylib"):
+                continue
+            executables.append(path)


🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

important: Versioned shared libraries slip through this filter. path.suffix for libcudart.so.12 is .12, not .so, so it won't be skipped; .so files usually carry the exec bit and will be launched as "tests", producing spurious failures. The install tree copies shared libs (InstallSamples.cmake), so this is reachable in CI.

-        # Check if file is executable
-        if os.access(path, os.X_OK):
-            # Skip if it's a library file
-            if path.suffix.lower() in (".dll", ".so", ".dylib"):
-                continue
-            executables.append(path)
+        # Check if file is executable
+        if os.access(path, os.X_OK):
+            # Skip shared/dynamic libraries, including versioned names like libfoo.so.1.2
+            suffixes = {s.lower() for s in path.suffixes}
+            if suffixes & {".dll", ".so", ".dylib"}:
+                continue
+            executables.append(path)
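The difference between the two checks can be seen in isolation with `pathlib`. This is a minimal sketch; the function names are hypothetical and the file names are examples, not a listing of the install tree.

```python
from pathlib import PurePosixPath

LIBRARY_SUFFIXES = {".dll", ".so", ".dylib"}


def is_library_by_suffix(path):
    """Original check: only the final suffix is inspected."""
    return path.suffix.lower() in LIBRARY_SUFFIXES


def is_library_by_suffixes(path):
    """Proposed check: any suffix component may mark a shared library."""
    return bool({s.lower() for s in path.suffixes} & LIBRARY_SUFFIXES)


for name in ("libcudart.so", "libcudart.so.12", "radixSortThrust"):
    p = PurePosixPath(name)
    print(name, p.suffixes, is_library_by_suffix(p), is_library_by_suffixes(p))
```

For `libcudart.so.12` the final suffix is `.12`, so only the second check skips it. The broader check also skips any executable whose name happens to contain one of these components, such as `tool.so.sh`, which may or may not be acceptable for this runner.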

coderabbitai · 2026-07-02T20:40:21Z

+    uint newVerticesCount;
+    cudaMemcpy(&newVerticesCount, (dNewVerticesIDs_ + verticesCount_ - 1).get(), sizeof(uint), cudaMemcpyDeviceToHost);
+    ++newVerticesCount;


🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

important: this cudaMemcpy is unchecked, unlike the other cudaMemcpy calls in this file (e.g. lines 256-263, 476-480) which use checkCudaErrors. If it silently fails, newVerticesCount stays uninitialized and directly drives the convergence branch below (ALGORITHM_FINISHED vs continuing), which can corrupt results or hang the loop. Several other CUDA calls in this file (cudaEventCreate, cudaMemGetInfo, cudaMemset, cudaEventElapsedTime) share this same unchecked pattern, but this one is the most consequential since it gates control flow.

🛠️ Suggested fix

     uint newVerticesCount;
-    cudaMemcpy(&newVerticesCount, (dNewVerticesIDs_ + verticesCount_ - 1).get(), sizeof(uint), cudaMemcpyDeviceToHost);
+    checkCudaErrors(cudaMemcpy(
+      &newVerticesCount, (dNewVerticesIDs_ + verticesCount_ - 1).get(), sizeof(uint), cudaMemcpyDeviceToHost));
     ++newVerticesCount;

danielfrg added 2 commits July 2, 2026 13:28

Migrate samples from cuda-samples

9a687e4

Clean up non-CCCL samples

f0af087

danielfrg requested review from a team as code owners July 2, 2026 20:24

github-project-automation Bot added this to CCCL Jul 2, 2026

danielfrg requested review from gonidelis and wmaxey July 2, 2026 20:24

github-project-automation Bot moved this to Todo in CCCL Jul 2, 2026

danielfrg requested a review from kkraus14 July 2, 2026 20:24

danielfrg self-assigned this Jul 2, 2026

cccl-authenticator-app Bot moved this from Todo to In Review in CCCL Jul 2, 2026

coderabbitai Bot reviewed Jul 2, 2026


		## Supported SM Architectures

		[SM 7.0 ](https://developer.nvidia.com/cuda-gpus) [SM 7.5 ](https://developer.nvidia.com/cuda-gpus) [SM 8.0 ](https://developer.nvidia.com/cuda-gpus) [SM 8.6 ](https://developer.nvidia.com/cuda-gpus) [SM 8.9 ](https://developer.nvidia.com/cuda-gpus) [SM 9.0 ](https://developer.nvidia.com/cuda-gpus) [SM 10.0 ](https://developer.nvidia.com/cuda-gpus) [SM 11.0 ](https://developer.nvidia.com/cuda-gpus) [SM 12.0 ](https://developer.nvidia.com/cuda-gpus)

	Make sure the dependencies mentioned in [Dependencies]() section above are installed.
	Make sure the dependencies mentioned in the "Dependencies needed to build/run" section above are installed.


		## Supported SM Architectures

		[SM 5.0 ](https://developer.nvidia.com/cuda-gpus) [SM 5.2 ](https://developer.nvidia.com/cuda-gpus) [SM 5.3 ](https://developer.nvidia.com/cuda-gpus) [SM 6.0 ](https://developer.nvidia.com/cuda-gpus) [SM 6.1 ](https://developer.nvidia.com/cuda-gpus) [SM 7.0 ](https://developer.nvidia.com/cuda-gpus) [SM 7.2 ](https://developer.nvidia.com/cuda-gpus) [SM 7.5 ](https://developer.nvidia.com/cuda-gpus) [SM 8.0 ](https://developer.nvidia.com/cuda-gpus) [SM 8.6 ](https://developer.nvidia.com/cuda-gpus) [SM 8.7 ](https://developer.nvidia.com/cuda-gpus) [SM 8.9 ](https://developer.nvidia.com/cuda-gpus) [SM 9.0 ](https://developer.nvidia.com/cuda-gpus)
