Migrate samples from cuda-samples #9697

Open
danielfrg wants to merge 2 commits into main from migrate-samples

@danielfrg danielfrg commented Jul 2, 2026

Description

closes

This PR migrates the existing CCCL (C++ and Python) samples from cuda-samples into this repository.

These will be mirrored to the cuda-samples repo on each release, so we need to keep them in the same format they are in today.

Also added a CI run of these.

In general, everything in /samples doesn't need to be reviewed right now. Those files are simply migrated here and formatted. We can evaluate adding, removing, or cleaning up samples in the future.

Checklist

  • [x] New or existing tests cover these changes.
  • [x] The documentation is up to date with these changes.

@danielfrg danielfrg requested review from a team as code owners July 2, 2026 20:24
@danielfrg danielfrg requested review from gonidelis and wmaxey July 2, 2026 20:24
@github-project-automation github-project-automation Bot moved this to Todo in CCCL Jul 2, 2026

copy-pr-bot Bot commented Jul 2, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@danielfrg danielfrg requested a review from kkraus14 July 2, 2026 20:24
@danielfrg danielfrg self-assigned this Jul 2, 2026
@cccl-authenticator-app cccl-authenticator-app Bot moved this from Todo to In Review in CCCL Jul 2, 2026

coderabbitai Bot commented Jul 2, 2026

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Added several new CUDA and Python samples covering device search, segmented scan, transform, tensor-core GEMM, async copy, mdspan, random distributions, line-of-sight, radix sort, segmentation trees, and barrier-based workflows.
    • Added shared sample utilities plus new sample build and test infrastructure, including broader CI coverage and cross-platform build support.
  • Documentation

    • Added README files for the new samples and Python utilities.
  • Bug Fixes

    • Improved file ignore and pre-commit handling for generated files and image assets.

Walkthrough

Adds a new top-level "samples" directory with a standalone CMake-based CCCL samples project, shared Common helper headers, CPM/install/toolchain CMake infrastructure, multiple CUDA/C++ and Python sample programs, CI matrix/script wiring for building and testing samples, and test-runner scripts (run_tests.py, test_samples.py). Minor repo config updates included.

Changes

CI and Infrastructure

| Layer | File(s) | Summary |
| --- | --- | --- |
| Repo config | `.codespell-ignore.txt`, `.gitignore`, `.pre-commit-config.yaml` | Adds spellcheck ignore, ignores test-results/pytest cache directories, excludes .ppm files from pre-commit hooks. |
| CI matrix/project wiring | `ci/matrix.yaml`, `ci/project_files_and_dependencies.yaml`, `ci/test_cuda_cccl_examples_python.sh` | Adds samples project/job definitions across pull_request/lite/nightly matrices, dirty-file detection regexes, and conditional Python samples test execution. |
| Build/test scripts | `ci/build_samples.sh`, `ci/test_samples.sh`, `ci/windows/build_samples.ps1` | New Linux/Windows scripts to configure, build, install, and run the samples CMake project in CI. |
| Samples build infra | `samples/CMakeLists.txt`, `samples/cmake/CPM.cmake`, `samples/cmake/InstallSamples.cmake`, `samples/cmake/Modules/*`, `samples/cmake/toolchains/*` | Root project configuration, CPM dependency fetcher, shared install logic, find-modules (EGL/FreeImage/NVSCI), and aarch64 Linux/QNX cross-compile toolchains. |

Common Helper Headers

| Layer | File(s) | Summary |
| --- | --- | --- |
| Common headers | `samples/Common/exception.h`, `helper_cuda.h`, `helper_functions.h`, `helper_image.h`, `helper_math.h`, `helper_string.h`, `helper_timer.h` | New shared header-only utilities: exception wrapper, CUDA error checking/device selection, PPM/PGM I/O and comparisons, vector math, CLI/string parsing, and stopwatch timers used across samples. |

CUDA/C++ Samples

| Layer | File(s) | Summary |
| --- | --- | --- |
| CUB algorithm samples | `samples/cubDeviceFind/*`, `samples/cubDeviceSegmentedScan/*`, `samples/cubDeviceTransform/*` | New samples demonstrating cub::DeviceFind, DeviceSegmentedScan, and DeviceTransform with CMake build files and READMEs. |
| Advanced samples | `samples/dmmaTensorCoreGemm/*`, `samples/globalToShmemAsyncCopy/*`, `samples/libcuxxMdspan/*`, `samples/libcuxxRandom/*`, `samples/lineOfSight/*`, `samples/radixSortThrust/*`, `samples/segmentationTreeThrust/*`, `samples/simpleAWBarrier/*` | New samples covering tensor-core GEMM, async shared-memory copy, DLPack/mdspan interop, libcu++ random distributions, Thrust line-of-sight/radix sort, segmentation trees, and arrive-wait barrier reduction, each with CMakeLists and README. |

Python Samples and Test Runners

| Layer | File(s) | Summary |
| --- | --- | --- |
| Python Utilities | `samples/python/Utilities/*`, `samples/python/requirements.txt` | Shared package for CUDA requirement checks and result verification. |
| cuda.compute samples | `samples/python/cuda.compute/binarySearch/*`, `.../cudaComputeLambdas/*`, `.../parallelReduction/*`, `.../prefixSum/*` | New Python samples for binary search, lambda-based operators, parallel reduction, and prefix sum using cuda.compute/cuda.core. |
| Python test driver | `samples/python/test_samples.py` | Pytest driver discovering and running Python samples as subprocesses with GPU/import guards. |
| C++ test runner | `samples/run_tests.py`, `samples/test_args.json` | Discovers and runs built sample executables in parallel, classifies pass/waive/fail, and reports summary/exit code. |

Sequence Diagram(s)

sequenceDiagram
  participant CI
  participant build_samples.sh
  participant CMake
  participant test_samples.sh
  participant run_tests.py
  CI->>build_samples.sh: invoke build
  build_samples.sh->>CMake: configure/build/install
  CI->>test_samples.sh: invoke test
  test_samples.sh->>run_tests.py: run installed executables
  run_tests.py-->>CI: pass/fail summary

Related Issues: None referenced in the provided data.

Related PRs: None referenced in the provided data.

Suggested labels: ci, samples, cmake, documentation

Suggested reviewers: None determinable from provided data.

important: samples/cmake/InstallSamples.cmake and samples/cmake/CPM.cmake are large, vendored-style CMake modules — verify they match upstream unmodified or document local changes, since divergence is hard to spot in review.

important: multiple sample CMakeLists fetch CCCL via CPM from GitHub pinned to v3.3.3 by default; confirm this tag stays in sync with the repo's actual CCCL version to avoid version drift between the in-repo library and the one built by samples.

suggestion: samples/test_args.json is currently an empty object — confirm this is intentional placeholder content and not a leftover from stubbing out test_samples.sh's config path.


Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 16

🧹 Nitpick comments (20)
samples/Common/helper_image.h (2)

224-224: 🩺 Stability & Availability | 🔵 Trivial | 💤 Low value

suggestion: several malloc(sizeof(T) * n)-style allocations here (e.g. Line 224, Line 256, Line 279, Line 352, Line 369, Line 442) compute sizes without overflow checks, flagged by static analysis (CWE-190). Practical risk is low for typical sample image dimensions and this mirrors the original cuda-samples implementation, so treating as optional hardening rather than a blocker.

Also applies to: 256-256, 279-279, 352-352, 369-369, 442-442

Source: Linters/SAST tools


415-425: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

suggestion: sdkReadFile uses while (!feof(fh)) with an unchecked fscanf, relying on the trailing pop_back() to drop one bogus element. This is a known anti-pattern — if fscanf fails for a non-EOF reason (malformed token), a stale/garbage token value is retained and the compensating pop_back() doesn't guarantee correctness. Checking the fscanf return value directly would be more robust.

♻️ Proposed fix
-  while (!feof(fh))
-  {
-    fscanf(fh, "%f", &token);
-    data_read.push_back(token);
-  }
-
-  // the last element is read twice
-  data_read.pop_back();
-  fclose(fh);
+  while (fscanf(fh, "%f", &token) == 1)
+  {
+    data_read.push_back(token);
+  }
+
+  fclose(fh);
samples/Common/helper_string.h (1)

114-124: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick win

suggestion: getFileExtension can read out of bounds for an empty filename. filename[string_length--] accesses index 0 (the null terminator) then decrements to -1 before the string_length == 0 guard is checked, so the next iteration reads filename[-1]. Not currently exercised by the provided call sites, but worth a bounds check for robustness.

♻️ Proposed fix
 inline int getFileExtension(char* filename, char** extension)
 {
   int string_length = static_cast<int>(strlen(filename));
+
+  if (string_length == 0)
+  {
+    *extension = NULL;
+    return 0;
+  }
samples/cubDeviceFind/CMakeLists.txt (1)

16-27: 🗄️ Data Integrity & Integration | 🔵 Trivial | ⚡ Quick win

suggestion: CMAKE_CUDA_ARCHITECTURES doesn't include SM 7.0 (70), but the README's "Supported SM Architectures" section lists SM 7.0 as supported. Either drop the SM 7.0 badge from the README or add 70 to this list.

samples/cubDeviceSegmentedScan/cubDeviceSegmentedScan.cu (1)

152-152: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

suggestion: Pass cuda::maximum<>{} directly as the binary op; the lambda wrapper adds no behavior.

samples/libcuxxRandom/README.md (1)

13-13: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

suggestion: fix markdownlint findings — trailing spaces inside link text on line 13 ([SM 7.0 ][SM 7.0], repeated for each SM badge) and an empty link [Dependencies]() on line 40 (should anchor to the Dependencies section).

Proposed fix
-[SM 7.0 ](https://developer.nvidia.com/cuda-gpus) [SM 7.5 ](https://developer.nvidia.com/cuda-gpus) [SM 8.0 ](https://developer.nvidia.com/cuda-gpus) [SM 8.6 ](https://developer.nvidia.com/cuda-gpus) [SM 8.9 ](https://developer.nvidia.com/cuda-gpus) [SM 9.0 ](https://developer.nvidia.com/cuda-gpus) [SM 10.0 ](https://developer.nvidia.com/cuda-gpus) [SM 11.0 ](https://developer.nvidia.com/cuda-gpus) [SM 12.0 ](https://developer.nvidia.com/cuda-gpus)
+[SM 7.0](https://developer.nvidia.com/cuda-gpus) [SM 7.5](https://developer.nvidia.com/cuda-gpus) [SM 8.0](https://developer.nvidia.com/cuda-gpus) [SM 8.6](https://developer.nvidia.com/cuda-gpus) [SM 8.9](https://developer.nvidia.com/cuda-gpus) [SM 9.0](https://developer.nvidia.com/cuda-gpus) [SM 10.0](https://developer.nvidia.com/cuda-gpus) [SM 11.0](https://developer.nvidia.com/cuda-gpus) [SM 12.0](https://developer.nvidia.com/cuda-gpus)
-Make sure the dependencies mentioned in [Dependencies]() section above are installed.
+Make sure the dependencies mentioned in the [Dependencies](#dependencies-needed-to-buildrun) section above are installed.

Also applies to: 40-40

Source: Linters/SAST tools

samples/libcuxxRandom/libcuxxRandom.cu (1)

123-138: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

suggestion: --blocks uses a different CLI convention (--blocks <value>, space-separated) than --device=<value> used internally by findCudaDevice a few lines below. Consider using getCmdLineArgumentInt from helper_string.h for --blocks=N to stay consistent with the rest of the sample's CLI style.

samples/python/Utilities/__init__.py (1)

37-47: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

suggestion: print_gpu_info is a public helper used by every sample (binarySearch, parallelReduction, prefixSum) but isn't re-exported here. Anyone importing via from Utilities import ... instead of the direct module-path pattern won't get it.

Proposed fix
 from .cuda_samples_utils import (
     check_cuda_requirements,
+    print_gpu_info,
     verify_array_result,
 )

 __version__ = "1.0.0"

 __all__ = [
     "check_cuda_requirements",
+    "print_gpu_info",
     "verify_array_result",
 ]
samples/python/cuda.compute/parallelReduction/README.md (1)

21-26: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

suggestion: Add language identifiers to both fenced code blocks.

Markdownlint is already flagging these as MD040, and tagging the fences will also improve syntax highlighting in rendered docs.

Also applies to: 74-76

Source: Linters/SAST tools

samples/run_tests.py (1)

331-342: 🚀 Performance & Scalability | 🔵 Trivial

suggestion: CI runs this with --parallel 4, but scheduling is GPU-agnostic. GPU-heavy samples (e.g. dmmaTensorCoreGemm, globalToShmemAsyncCopy) launched concurrently on a single device can OOM/contend and fail nondeterministically. Consider bounding parallelism by device memory, pinning runs to distinct devices, or marking heavy samples for serial execution in the config.
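One way the "mark heavy samples for serial execution" option could look is sketched below. This is a minimal illustration, not the actual `run_tests.py` logic: the `serial` config key, the `split_by_serial_flag` and `run_all` helpers, and the `run_one` callback are all hypothetical names introduced here.

```python
from concurrent.futures import ThreadPoolExecutor


def split_by_serial_flag(samples, config):
    """Split sample names into (parallel, serial) lists.

    `config` maps sample name -> dict. The "serial" key is a hypothetical
    opt-in flag; run_tests.py is not known to support it today.
    """
    parallel, serial = [], []
    for name in samples:
        if config.get(name, {}).get("serial", False):
            serial.append(name)
        else:
            parallel.append(name)
    return parallel, serial


def run_all(samples, config, run_one, max_workers=4):
    """Run light samples concurrently, then heavy samples one at a time.

    `run_one(name)` is a caller-supplied function returning True on success.
    Returns a dict of sample name -> bool.
    """
    parallel, serial = split_by_serial_flag(samples, config)
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so names and outcomes line up.
        for name, ok in zip(parallel, pool.map(run_one, parallel)):
            results[name] = ok
    # The pool has drained here, so each heavy sample has the device to itself.
    for name in serial:
        results[name] = run_one(name)
    return results
```

With this shape, `--parallel 4` would only apply to samples that did not opt in to serial execution, which keeps the common case fast while isolating the samples most likely to contend for device memory.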

samples/test_args.json (1)

1-1: 📐 Maintainability & Code Quality | 🔵 Trivial

suggestion: Empty config means every discovered sample runs with no args, min_gpus=1, and a 300s timeout. Migrated cuda-samples that need specific CLI args, multiple GPUs, longer timeouts, or should be skipped/waived will silently fall back to defaults and may fail in CI. Want me to scaffold per-sample entries (args/runs/min_gpus/skip) for the samples in this PR?
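The fallback behavior described above can be sketched as a merge of per-sample entries over a set of defaults. This is an assumption-laden illustration: the default values come from the comment (no args, `min_gpus=1`, 300s timeout), but the `timeout` key name, the `resolve_sample_config` helper, and the sample name `exampleSample` are hypothetical and may not match the real `run_tests.py` schema.

```python
import json

# Defaults as described in the review comment: no args, one GPU, 300 s timeout.
# The "timeout" key name is an assumption made for this sketch.
DEFAULTS = {"args": [], "min_gpus": 1, "timeout": 300, "skip": False}


def resolve_sample_config(raw_json, sample_name):
    """Merge a sample's entry from test_args.json-style text over the defaults."""
    entries = json.loads(raw_json)
    merged = dict(DEFAULTS)
    merged["args"] = list(DEFAULTS["args"])  # copy so callers never share the default list
    merged.update(entries.get(sample_name, {}))
    return merged


# An empty object, as in the current test_args.json, yields pure defaults.
print(resolve_sample_config("{}", "exampleSample"))

# A hypothetical per-sample entry overrides only the keys it names.
print(resolve_sample_config('{"exampleSample": {"timeout": 600}}', "exampleSample"))
```

Under this model an empty `{}` is a valid configuration, which is why missing per-sample entries would go unnoticed until a sample that needs non-default settings fails in CI.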

ci/windows/build_samples.ps1 (1)

26-45: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

suggestion: this script imports build_common.psm1 but never applies $script:GLOBAL_CMAKE_OPTIONS (which carries CCCL_ENABLE_WERROR/CCCL_ENABLE_PRAGMA_SYSTEM_HEADER in CI, per build_common.psm1) to the samples configure command; only the raw $CMAKE_OPTIONS parameter is appended. Since the comment at lines 15-18 says presets are intentionally bypassed, this is likely by design, but it means Windows CI builds samples without the warnings-as-errors enforcement that the rest of CCCL relies on for catching regressions.

ci/test_cuda_cccl_examples_python.sh (1)

31-37: 🩺 Stability & Availability | 🔵 Trivial | 💤 Low value

suggestion: silently skipping the samples pytest run when test_samples.py is missing means a broken migration (e.g. file accidentally not shipped/renamed) fails silently instead of failing CI. Consider an explicit echo/warning when the file is absent, or making its presence mandatory if samples are always expected in-tree from this point on.

samples/globalToShmemAsyncCopy/CMakeLists.txt (1)

34-38: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

important: QNX exclusion here contradicts README.md line 17, which lists QNX as a supported OS. See the README comment for the cross-file fix.

samples/globalToShmemAsyncCopy/README.md (3)

13-13: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

suggestion: strip the trailing spaces inside [SM X.X ] link text (flagged by markdownlint MD039), and reconcile the listed SM range with CMakeLists.txt, which only builds for SM 75+ (75 80 86 87 89 90 100 110 120). SM 7.0/7.2 in the README won't actually be compiled.

doc fix
-[SM 7.0 ](https://developer.nvidia.com/cuda-gpus)  [SM 7.2 ](https://developer.nvidia.com/cuda-gpus)  [SM 7.5 ](https://developer.nvidia.com/cuda-gpus)  [SM 8.0 ](https://developer.nvidia.com/cuda-gpus)  [SM 8.6 ](https://developer.nvidia.com/cuda-gpus)  [SM 8.7 ](https://developer.nvidia.com/cuda-gpus)  [SM 8.9 ](https://developer.nvidia.com/cuda-gpus)  [SM 9.0 ](https://developer.nvidia.com/cuda-gpus)
+[SM 7.5](https://developer.nvidia.com/cuda-gpus)  [SM 8.0](https://developer.nvidia.com/cuda-gpus)  [SM 8.6](https://developer.nvidia.com/cuda-gpus)  [SM 8.7](https://developer.nvidia.com/cuda-gpus)  [SM 8.9](https://developer.nvidia.com/cuda-gpus)  [SM 9.0](https://developer.nvidia.com/cuda-gpus)

Source: Linters/SAST tools


17-17: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

important: README lists QNX as a supported OS, but CMakeLists.txt (lines 35-38) explicitly refuses to build the sample on QNX. Update one side to match.


38-38: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

suggestion: empty link [Dependencies]() (MD042). Point it at the "Dependencies needed to build/run" section above.

doc fix
-Make sure the dependencies mentioned in [Dependencies]() section above are installed.
+Make sure the dependencies mentioned in the [Dependencies](#dependencies-needed-to-buildrun) section above are installed.

Source: Linters/SAST tools

samples/globalToShmemAsyncCopy/globalToShmemAsyncCopy.cu (1)

50-52: 🩺 Stability & Availability | 🔵 Trivial | 💤 Low value

suggestion: conditional #include <cuda/barrier> under __CUDA_ARCH__ >= 700 only takes effect during device-code compilation passes; since CMAKE_CUDA_ARCHITECTURES in this sample's CMakeLists.txt starts at 75, this resolves correctly today, but it's a fragile pattern if the minimum architecture is ever lowered. No action needed given current build config.

samples/radixSortThrust/radixSortThrust.cu (2)

227-233: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

suggestion: This conditional is a no-op — both branches return bTestResult's value (the early return false only triggers when bTestResult is already false). Safe to simplify.

Proposed fix
-  if (!bTestResult && !quiet)
-  {
-    return false;
-  }
-
   return bTestResult;
 }

29-29: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

suggestion: <thrust/detail/type_traits.h> is a Thrust internal implementation header (not part of the public API) and no symbols from it appear used in this file. Consider dropping it, or replacing with the specific public header if something from it is actually needed.

Source: Coding guidelines


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1ed8f7fb-6e50-4632-be65-99695c3b3df1

📥 Commits

Reviewing files that changed from the base of the PR and between a1fb0b8 and f0af087.

📒 Files selected for processing (82)
  • .codespell-ignore.txt
  • .gitignore
  • .pre-commit-config.yaml
  • ci/build_samples.sh
  • ci/matrix.yaml
  • ci/project_files_and_dependencies.yaml
  • ci/test_cuda_cccl_examples_python.sh
  • ci/test_samples.sh
  • ci/windows/build_samples.ps1
  • samples/CMakeLists.txt
  • samples/Common/exception.h
  • samples/Common/helper_cuda.h
  • samples/Common/helper_functions.h
  • samples/Common/helper_image.h
  • samples/Common/helper_math.h
  • samples/Common/helper_string.h
  • samples/Common/helper_timer.h
  • samples/cmake/CPM.cmake
  • samples/cmake/InstallSamples.cmake
  • samples/cmake/Modules/FindEGL.cmake
  • samples/cmake/Modules/FindFreeImage.cmake
  • samples/cmake/Modules/FindNVSCI.cmake
  • samples/cmake/toolchains/toolchain-aarch64-linux.cmake
  • samples/cmake/toolchains/toolchain-aarch64-qnx.cmake
  • samples/cubDeviceFind/CMakeLists.txt
  • samples/cubDeviceFind/README.md
  • samples/cubDeviceFind/cubDeviceFind.cu
  • samples/cubDeviceSegmentedScan/CMakeLists.txt
  • samples/cubDeviceSegmentedScan/README.md
  • samples/cubDeviceSegmentedScan/cubDeviceSegmentedScan.cu
  • samples/cubDeviceTransform/CMakeLists.txt
  • samples/cubDeviceTransform/README.md
  • samples/cubDeviceTransform/cubDeviceTransform.cu
  • samples/dmmaTensorCoreGemm/CMakeLists.txt
  • samples/dmmaTensorCoreGemm/README.md
  • samples/dmmaTensorCoreGemm/dmmaTensorCoreGemm.cu
  • samples/globalToShmemAsyncCopy/CMakeLists.txt
  • samples/globalToShmemAsyncCopy/README.md
  • samples/globalToShmemAsyncCopy/globalToShmemAsyncCopy.cu
  • samples/libcuxxMdspan/CMakeLists.txt
  • samples/libcuxxMdspan/README.md
  • samples/libcuxxMdspan/libcuxxMdspan.cu
  • samples/libcuxxRandom/CMakeLists.txt
  • samples/libcuxxRandom/README.md
  • samples/libcuxxRandom/libcuxxRandom.cu
  • samples/lineOfSight/CMakeLists.txt
  • samples/lineOfSight/README.md
  • samples/lineOfSight/lineOfSight.cu
  • samples/python/Utilities/README.md
  • samples/python/Utilities/__init__.py
  • samples/python/Utilities/cuda_samples_utils.py
  • samples/python/cuda.compute/binarySearch/README.md
  • samples/python/cuda.compute/binarySearch/binarySearch.py
  • samples/python/cuda.compute/binarySearch/requirements.txt
  • samples/python/cuda.compute/cudaComputeLambdas/README.md
  • samples/python/cuda.compute/cudaComputeLambdas/cudaComputeLambdas.py
  • samples/python/cuda.compute/cudaComputeLambdas/requirements.txt
  • samples/python/cuda.compute/parallelReduction/README.md
  • samples/python/cuda.compute/parallelReduction/parallelReduction.py
  • samples/python/cuda.compute/parallelReduction/requirements.txt
  • samples/python/cuda.compute/prefixSum/README.md
  • samples/python/cuda.compute/prefixSum/prefixSum.py
  • samples/python/cuda.compute/prefixSum/requirements.txt
  • samples/python/requirements.txt
  • samples/python/test_samples.py
  • samples/radixSortThrust/CMakeLists.txt
  • samples/radixSortThrust/README.md
  • samples/radixSortThrust/doc/readme.txt
  • samples/radixSortThrust/radixSortThrust.cu
  • samples/run_tests.py
  • samples/segmentationTreeThrust/CMakeLists.txt
  • samples/segmentationTreeThrust/README.md
  • samples/segmentationTreeThrust/common.cuh
  • samples/segmentationTreeThrust/data/ref_00.ppm
  • samples/segmentationTreeThrust/data/ref_09.ppm
  • samples/segmentationTreeThrust/data/test.ppm
  • samples/segmentationTreeThrust/kernels.cuh
  • samples/segmentationTreeThrust/segmentationTree.cu
  • samples/simpleAWBarrier/CMakeLists.txt
  • samples/simpleAWBarrier/README.md
  • samples/simpleAWBarrier/simpleAWBarrier.cu
  • samples/test_args.json

Comment on lines +35 to +44
set(CMAKE_C_FLAGS " \"-V${__qnx_gcc_ver},gcc_ntoaarch64le\"")
set(CMAKE_CXX_FLAGS " \"-V${__qnx_gcc_ver},gcc_ntoaarch64le\"")
set(CMAKE_CUDA_FLAGS " --qpp-config=${__qnx_gcc_ver},gcc_ntoaarch64le")
set(
  AUTOMAGIC_NVCC_FLAGS
  --qpp-config=${__qnx_gcc_ver},gcc_ntoaarch64le
  CACHE STRING
  "automagic feature detection flags for cross build"
)
add_link_options("-V${__qnx_gcc_ver},gcc_ntoaarch64le")

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm __qnx_gcc_ver is defined somewhere the toolchain can see it
rg -nP '__qnx_gcc_ver' -g '!**/toolchain-aarch64-qnx.cmake'

Repository: NVIDIA/cccl

Length of output: 149


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== toolchain file =="
nl -ba samples/cmake/toolchains/toolchain-aarch64-qnx.cmake | sed -n '1,220p'

echo
echo "== references to qnx toolchains / gcc_ver =="
rg -n "__qnx_gcc_ver|toolchain-aarch64-qnx|qnx" samples cmake .github ci docs -g '!**/build/**' -g '!**/dist/**' -g '!**/node_modules/**' || true

Repository: NVIDIA/cccl

Length of output: 210


🏁 Script executed:

#!/bin/bash
set -euo pipefail

python3 - <<'PY'
from pathlib import Path
p = Path('samples/cmake/toolchains/toolchain-aarch64-qnx.cmake')
print(p.exists(), p.stat().st_size if p.exists() else 0)
PY

Repository: NVIDIA/cccl

Length of output: 159


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== samples/cmake/toolchains/toolchain-aarch64-qnx.cmake =="
sed -n '1,220p' samples/cmake/toolchains/toolchain-aarch64-qnx.cmake

echo
echo "== similar toolchain patterns =="
rg -n "CACHE STRING|CACHE PATH|if\\(NOT DEFINED|message\\(FATAL_ERROR|set\\(__.*gcc_ver|qnx" samples/cmake/toolchains -g '*.cmake' || true

Repository: NVIDIA/cccl

Length of output: 2614


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== usages of toolchain-aarch64-qnx =="
rg -n "toolchain-aarch64-qnx|QNX_HOST|QNX_TARGET|__qnx_gcc_ver" . -g '!**/build/**' -g '!**/dist/**' -g '!**/node_modules/**' || true

echo
echo "== toolchain-aarch64-linux.cmake =="
sed -n '1,120p' samples/cmake/toolchains/toolchain-aarch64-linux.cmake

echo
echo "== docs mentioning QNX toolchain =="
rg -n "QNX|qnx" docs samples README.md .github ci -g '!**/build/**' -g '!**/dist/**' -g '!**/node_modules/**' || true

Repository: NVIDIA/cccl

Length of output: 8966


important: Define or require __qnx_gcc_ver in this toolchain; it is only referenced here, so if it is unset these flags become malformed (-V,gcc_ntoaarch64le, --qpp-config=,gcc_ntoaarch64le) and QNX builds will fail. Add a cache declaration with a default or a fatal guard.

Comment on lines +283 to +310
inline bool getCmdLineArgumentString(const int argc, const char** argv, const char* string_ref, char** string_retval)
{
  bool bFound = false;

  if (argc >= 1)
  {
    for (int i = 1; i < argc; i++)
    {
      int string_start = stringRemoveDelimiter('-', argv[i]);
      char* string_argv = const_cast<char*>(&argv[i][string_start]);
      int length = static_cast<int>(strlen(string_ref));

      if (!STRNCASECMP(string_argv, string_ref, length))
      {
        *string_retval = &string_argv[length + 1];
        bFound = true;
        continue;
      }
    }
  }

  if (!bFound)
  {
    *string_retval = NULL;
  }

  return bFound;
}

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

important: out-of-bounds read when the matched flag has no =value suffix.

Unlike getCmdLineArgumentValue/getCmdLineArgumentInt/getCmdLineArgumentFloat in this same file, which check length + 1 <= strlen(string_argv) and detect the = before offsetting, getCmdLineArgumentString unconditionally does &string_argv[length + 1]. If the CLI flag is passed without =value (e.g. -file alone), string_argv[length] is the null terminator and string_argv[length + 1] reads one byte past the string buffer.

🐛 Proposed fix
       if (!STRNCASECMP(string_argv, string_ref, length))
       {
-        *string_retval = &string_argv[length + 1];
+        int auto_inc = (static_cast<int>(strlen(string_argv)) > length && string_argv[length] == '=') ? 1 : 0;
+        *string_retval = &string_argv[length + auto_inc];
         bFound         = true;
         continue;
       }
📝 Committable suggestion

Suggested change
inline bool getCmdLineArgumentString(const int argc, const char** argv, const char* string_ref, char** string_retval)
{
  bool bFound = false;
  if (argc >= 1)
  {
    for (int i = 1; i < argc; i++)
    {
      int string_start = stringRemoveDelimiter('-', argv[i]);
      char* string_argv = const_cast<char*>(&argv[i][string_start]);
      int length = static_cast<int>(strlen(string_ref));
      if (!STRNCASECMP(string_argv, string_ref, length))
      {
        int auto_inc = (static_cast<int>(strlen(string_argv)) > length && string_argv[length] == '=') ? 1 : 0;
        *string_retval = &string_argv[length + auto_inc];
        bFound = true;
        continue;
      }
    }
  }
  if (!bFound)
  {
    *string_retval = NULL;
  }
  return bFound;
}


## Supported SM Architectures

[SM 7.0 ](https://developer.nvidia.com/cuda-gpus) [SM 7.5 ](https://developer.nvidia.com/cuda-gpus) [SM 8.0 ](https://developer.nvidia.com/cuda-gpus) [SM 8.6 ](https://developer.nvidia.com/cuda-gpus) [SM 8.9 ](https://developer.nvidia.com/cuda-gpus) [SM 9.0 ](https://developer.nvidia.com/cuda-gpus) [SM 10.0 ](https://developer.nvidia.com/cuda-gpus) [SM 11.0 ](https://developer.nvidia.com/cuda-gpus) [SM 12.0 ](https://developer.nvidia.com/cuda-gpus)

🗄️ Data Integrity & Integration | 🟡 Minor | ⚡ Quick win

suggestion: SM 7.0 is listed as supported here, but samples/cubDeviceFind/CMakeLists.txt doesn't include architecture 70 in CMAKE_CUDA_ARCHITECTURES (only 75/80/86/87/89/90/100/110/120). Reconcile the two. Also each link label has a trailing space (SM 7.0 ) flagged by markdownlint MD039.

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 13-13: Spaces inside link text

(MD039, no-space-in-links)


Source: Linters/SAST tools

## Prerequisites

Download and install the [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads) for your corresponding platform.
Make sure the dependencies mentioned in [Dependencies]() section above are installed.

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

suggestion: [Dependencies]() is an empty link (markdownlint MD042). Either remove the markdown link syntax or point it at the actual "Dependencies needed to build/run" anchor.

Proposed fix
-Make sure the dependencies mentioned in [Dependencies]() section above are installed.
+Make sure the dependencies mentioned in the "Dependencies needed to build/run" section above are installed.
📝 Committable suggestion

Suggested change
-Make sure the dependencies mentioned in [Dependencies]() section above are installed.
+Make sure the dependencies mentioned in the "Dependencies needed to build/run" section above are installed.
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 48-48: No empty links

(MD042, no-empty-links)

Source: Linters/SAST tools

Comment on lines +16 to +27
set(
  CMAKE_CUDA_ARCHITECTURES
  75
  80
  86
  87
  89
  90
  100
  110
  120
)

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

Does NVIDIA cuda-samples CMakeLists still include architecture 110 in its default CMAKE_CUDA_ARCHITECTURES list, and was this issue resolved?

💡 Result:

As of July 2, 2026, the NVIDIA cuda-samples repository continues to include 110 in its default CMAKE_CUDA_ARCHITECTURES list within its CMakeLists.txt files [1][2][3]. This has been an ongoing issue for users, as documented in multiple GitHub issues [4][5]. The inclusion of 110 (compute_110/sm_110) in this list often leads to build failures because the CUDA Toolkit versions used by many users do not support this specific architecture, causing the nvcc compiler to return an "Unsupported gpu architecture" fatal error [5]. The issue remains unresolved in the sense that the default configuration in the repository still explicitly sets 110 as a target [1][2][3]. Users encountering this problem typically resolve it locally by modifying the CMAKE_CUDA_ARCHITECTURES list in the relevant CMakeLists.txt files to remove 110 or by overriding the variable during the CMake configuration process [4][5].

Citations:


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect the relevant file and surrounding sample build context.
git ls-files samples/cubDeviceSegmentedScan/CMakeLists.txt
wc -l samples/cubDeviceSegmentedScan/CMakeLists.txt
cat -n samples/cubDeviceSegmentedScan/CMakeLists.txt | sed -n '1,120p'

# Look for any toolkit/version gating or architecture overrides nearby.
rg -n "CMAKE_CUDA_ARCHITECTURES|CUDA_TOOLKIT|CUDAToolkit|110|sm_110|compute_110" samples/cubDeviceSegmentedScan -S

# Look for repo-wide guidance on supported CUDA versions/architectures in samples.
rg -n "110|sm_110|compute_110|CUDA 13|CUDA 12\.8|CUDA 12\.9|CMAKE_CUDA_ARCHITECTURES" -S .github ci samples CMakeLists.txt docs | sed -n '1,200p'

Repository: NVIDIA/cccl

Length of output: 18740


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find the repository's stated CUDA toolkit support and any sample-specific version gating.
rg -n "CUDA 12|CUDA 13|12\.8|12\.9|13\.0|CUDAToolkit .*REQUIRED|find_package\(CUDAToolkit|CMAKE_CUDA_ARCHITECTURES" -S \
  CMakeLists.txt README.md docs .devcontainer ci samples | sed -n '1,240p'

# Inspect the top-level CMake logic that mentions CMAKE_CUDA_ARCHITECTURES special handling.
cat -n CMakeLists.txt | sed -n '1,120p'

Repository: NVIDIA/cccl

Length of output: 27118


important: CMAKE_CUDA_ARCHITECTURES should not unconditionally include 110 here. CCCL still supports CUDA 12.9, and compute_110 is rejected by nvcc before CUDA 13.0, so this sample fails to build on the older toolchain. Gate 110 on the toolkit version or remove it from the default list.
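
The gating suggested here amounts to a version comparison. The following is a minimal Python sketch of that logic only. The names `default_architectures` and `MIN_TOOLKIT_FOR_ARCH` are hypothetical, and the real change would belong in the sample's CMakeLists.txt rather than in Python. The single threshold encoded is the one stated in this comment (110 needs CUDA 13.0 or newer); whether other entries need similar gating is not covered.

```python
# Architecture list as quoted from the sample's CMakeLists.txt.
ALL_ARCHITECTURES = [75, 80, 86, 87, 89, 90, 100, 110, 120]

# Assumption taken from the review comment: nvcc older than 13.0 rejects 110.
MIN_TOOLKIT_FOR_ARCH = {110: (13, 0)}


def default_architectures(toolkit_version):
    """Drop every gated architecture whose minimum toolkit is newer than ours.

    toolkit_version is a (major, minor) tuple, e.g. (12, 9).
    """
    return [
        arch
        for arch in ALL_ARCHITECTURES
        if toolkit_version >= MIN_TOOLKIT_FOR_ARCH.get(arch, (0, 0))
    ]


print(default_architectures((12, 9)))  # list without 110
print(default_architectures((13, 0)))  # full list
```

Keeping the thresholds in one table makes it straightforward to add further gated entries later without touching the filtering logic.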

Comment on lines +116 to +123
unsigned int totalMem = (keysOnly ? 2 : 4) * numElements * sizeof(T);

if (devprop.totalGlobalMem < totalMem)
{
  printf("Error: insufficient amount of memory to sort %d elements.\n", numElements);
  printf("%d bytes needed, %d bytes available\n", (int) totalMem, (int) devprop.totalGlobalMem);
  exit(EXIT_SUCCESS);
}

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

suggestion: totalMem is unsigned int and computed as (keysOnly ? 2 : 4) * numElements * sizeof(T), which can overflow 32 bits for large -n values and silently defeat the memory-sufficiency check. The (int) casts of totalMem and devprop.totalGlobalMem in the second printf will also print negative numbers on GPUs with more than 2 GB of memory.

Proposed fix
-    unsigned int totalMem = (keysOnly ? 2 : 4) * numElements * sizeof(T);
+    size_t totalMem = (keysOnly ? size_t{2} : size_t{4}) * static_cast<size_t>(numElements) * sizeof(T);
 
     if (devprop.totalGlobalMem < totalMem)
     {
       printf("Error: insufficient amount of memory to sort %d elements.\n", numElements);
-      printf("%d bytes needed, %d bytes available\n", (int) totalMem, (int) devprop.totalGlobalMem);
+      printf("%zu bytes needed, %zu bytes available\n", totalMem, devprop.totalGlobalMem);
       exit(EXIT_SUCCESS);
     }
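
To make the wraparound concrete, here is a small Python sketch that reproduces the 32-bit arithmetic with explicit modular reduction. The element count (300 million key-value pairs of 4-byte elements) and the 3 GiB device size are illustrative assumptions, not values from the sample, and the helper names are hypothetical. The signed conversion assumes the usual two's-complement wrapping.

```python
UINT32_MODULUS = 1 << 32


def bytes_needed(num_elements, elem_size, keys_only):
    """Exact byte count, using the sample's multiplier (2 keys-only, else 4)."""
    multiplier = 2 if keys_only else 4
    return multiplier * num_elements * elem_size


def as_uint32(value):
    """Value that survives being stored in a 32-bit unsigned integer."""
    return value % UINT32_MODULUS


def as_int32(value):
    """Value after a wrapping conversion to a 32-bit signed integer."""
    wrapped = value % UINT32_MODULUS
    return wrapped - UINT32_MODULUS if wrapped >= (1 << 31) else wrapped


exact = bytes_needed(300_000_000, elem_size=4, keys_only=False)
print(exact)               # 4800000000
print(as_uint32(exact))    # 505032704, far below the real requirement
print(as_int32(3 << 30))   # -1073741824 for a 3 GiB device
```

With these inputs the truncated total is roughly 0.5 GB, so the comparison against devprop.totalGlobalMem would pass on a device that cannot actually hold the data.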

Comment on lines +235 to +254
int main(int argc, char** argv)
{
  // Start logs
  printf("%s Starting...\n\n", argv[0]);

  findCudaDevice(argc, (const char**) argv);

  bool bTestResult = false;

  if (checkCmdLineFlag(argc, (const char**) argv, "float"))
  {
    bTestResult = testSort<float, true>(argc, argv);
  }
  else
  {
    bTestResult = testSort<unsigned int, false>(argc, argv);
  }

  printf(bTestResult ? "Test passed\n" : "Test failed!\n");
}

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

file='samples/radixSortThrust/radixSortThrust.cu'

echo '--- file outline ---'
ast-grep outline "$file" --view expanded || true

echo
echo '--- relevant ranges ---'
sed -n '1,80p' "$file"
echo
sed -n '110,260p' "$file"

echo
echo '--- search for run_tests reference ---'
rg -n "run_tests\.py|Test passed|Test failed|EXIT_SUCCESS|EXIT_FAILURE|return false;|totalMem" samples/radixSortThrust -n || true

Repository: NVIDIA/cccl

Length of output: 7538


important: main() prints the result but never returns it, so the process exits with success even when the sort fails. Add return bTestResult ? 0 : 1; at the end.
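
The effect on a test runner can be sketched in Python. This assumes the runner judges a sample by its exit status, which is an assumption about how samples/run_tests.py or CI treats results, not something confirmed here. The two snippets are hypothetical stand-ins for a sample binary: both print "Test failed!", but only one sets a nonzero status.

```python
import subprocess
import sys

# Stand-ins for a sample binary: both print "Test failed!", only one reports it.
SILENT_FAILURE = 'print("Test failed!")'
REPORTED_FAILURE = 'import sys; print("Test failed!"); sys.exit(1)'


def run(snippet):
    """Run a Python snippet as a child process and return its exit status."""
    completed = subprocess.run(
        [sys.executable, "-c", snippet], capture_output=True, text=True
    )
    return completed.returncode


print(run(SILENT_FAILURE))    # 0: a runner keyed on exit status sees a pass
print(run(REPORTED_FAILURE))  # 1: the failure is visible to the runner
```

The first case mirrors the current main(): the message reaches stdout, but nothing downstream can distinguish it from a passing run without parsing the output.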


## Supported SM Architectures

[SM 5.0 ](https://developer.nvidia.com/cuda-gpus) [SM 5.2 ](https://developer.nvidia.com/cuda-gpus) [SM 5.3 ](https://developer.nvidia.com/cuda-gpus) [SM 6.0 ](https://developer.nvidia.com/cuda-gpus) [SM 6.1 ](https://developer.nvidia.com/cuda-gpus) [SM 7.0 ](https://developer.nvidia.com/cuda-gpus) [SM 7.2 ](https://developer.nvidia.com/cuda-gpus) [SM 7.5 ](https://developer.nvidia.com/cuda-gpus) [SM 8.0 ](https://developer.nvidia.com/cuda-gpus) [SM 8.6 ](https://developer.nvidia.com/cuda-gpus) [SM 8.7 ](https://developer.nvidia.com/cuda-gpus) [SM 8.9 ](https://developer.nvidia.com/cuda-gpus) [SM 9.0 ](https://developer.nvidia.com/cuda-gpus)

🗄️ Data Integrity & Integration | 🟡 Minor | ⚡ Quick win

important: Supported-architecture list doesn't match the build config. README lists SM 5.0–9.0, but samples/radixSortThrust/CMakeLists.txt only builds for SM 75/80/86/87/89/90/100/110/120. Also, markdownlint flags MD039 (spaces inside link text, e.g. [SM 5.0 ]) on this line.

Proposed fix
-[SM 5.0 ](https://developer.nvidia.com/cuda-gpus)  [SM 5.2 ](https://developer.nvidia.com/cuda-gpus)  [SM 5.3 ](https://developer.nvidia.com/cuda-gpus)  [SM 6.0 ](https://developer.nvidia.com/cuda-gpus)  [SM 6.1 ](https://developer.nvidia.com/cuda-gpus)  [SM 7.0 ](https://developer.nvidia.com/cuda-gpus)  [SM 7.2 ](https://developer.nvidia.com/cuda-gpus)  [SM 7.5 ](https://developer.nvidia.com/cuda-gpus)  [SM 8.0 ](https://developer.nvidia.com/cuda-gpus)  [SM 8.6 ](https://developer.nvidia.com/cuda-gpus)  [SM 8.7 ](https://developer.nvidia.com/cuda-gpus)  [SM 8.9 ](https://developer.nvidia.com/cuda-gpus)  [SM 9.0 ](https://developer.nvidia.com/cuda-gpus)
+[SM 7.5](https://developer.nvidia.com/cuda-gpus)  [SM 8.0](https://developer.nvidia.com/cuda-gpus)  [SM 8.6](https://developer.nvidia.com/cuda-gpus)  [SM 8.7](https://developer.nvidia.com/cuda-gpus)  [SM 8.9](https://developer.nvidia.com/cuda-gpus)  [SM 9.0](https://developer.nvidia.com/cuda-gpus)  [SM 10.0](https://developer.nvidia.com/cuda-gpus)  [SM 11.0](https://developer.nvidia.com/cuda-gpus)  [SM 12.0](https://developer.nvidia.com/cuda-gpus)
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 13-13: Spaces inside link text

(MD039, no-space-in-links)


Source: Linters/SAST tools

Comment thread samples/run_tests.py
Comment on lines +89 to +93
if os.access(path, os.X_OK):
    # Skip if it's a library file
    if path.suffix.lower() in (".dll", ".so", ".dylib"):
        continue
    executables.append(path)

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

important: Versioned shared libraries slip through this filter. path.suffix for libcudart.so.12 is .12, not .so, so it won't be skipped; .so files usually carry the exec bit and will be launched as "tests", producing spurious failures. The install tree copies shared libs (InstallSamples.cmake), so this is reachable in CI.

-        # Check if file is executable
-        if os.access(path, os.X_OK):
-            # Skip if it's a library file
-            if path.suffix.lower() in (".dll", ".so", ".dylib"):
-                continue
-            executables.append(path)
+        # Check if file is executable
+        if os.access(path, os.X_OK):
+            # Skip shared/dynamic libraries, including versioned names like libfoo.so.1.2
+            suffixes = {s.lower() for s in path.suffixes}
+            if suffixes & {".dll", ".so", ".dylib"}:
+                continue
+            executables.append(path)
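
The difference between `suffix` and `suffixes` is easy to check in isolation. The sketch below uses `pathlib.PurePosixPath` so it does not touch the filesystem; the helper name `is_shared_library` is hypothetical and only mirrors the set-intersection test in the diff above.

```python
from pathlib import PurePosixPath

LIBRARY_SUFFIXES = {".dll", ".so", ".dylib"}


def is_shared_library(path):
    """True if any dot-suffix in the file name marks a shared library."""
    return any(s.lower() in LIBRARY_SUFFIXES for s in path.suffixes)


versioned = PurePosixPath("libcudart.so.12")
print(versioned.suffix)    # .12
print(versioned.suffixes)  # ['.so', '.12']
print(is_shared_library(versioned))                         # True
print(is_shared_library(PurePosixPath("libfoo.so.1.2")))    # True
print(is_shared_library(PurePosixPath("radixSortThrust")))  # False
```

Checking every suffix also means an executable whose name merely contains `.so.` somewhere would be skipped, which seems an acceptable trade-off for a sample runner.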

Comment on lines +615 to +617
uint newVerticesCount;
cudaMemcpy(&newVerticesCount, (dNewVerticesIDs_ + verticesCount_ - 1).get(), sizeof(uint), cudaMemcpyDeviceToHost);
++newVerticesCount;

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

important: this cudaMemcpy is unchecked, unlike the other cudaMemcpy calls in this file (e.g. lines 256-263, 476-480) which use checkCudaErrors. If it silently fails, newVerticesCount stays uninitialized and directly drives the convergence branch below (ALGORITHM_FINISHED vs continuing), which can corrupt results or hang the loop. Several other CUDA calls in this file (cudaEventCreate, cudaMemGetInfo, cudaMemset, cudaEventElapsedTime) share this same unchecked pattern, but this one is the most consequential since it gates control flow.

🛠️ Suggested fix
     uint newVerticesCount;
-    cudaMemcpy(&newVerticesCount, (dNewVerticesIDs_ + verticesCount_ - 1).get(), sizeof(uint), cudaMemcpyDeviceToHost);
+    checkCudaErrors(cudaMemcpy(
+      &newVerticesCount, (dNewVerticesIDs_ + verticesCount_ - 1).get(), sizeof(uint), cudaMemcpyDeviceToHost));
     ++newVerticesCount;
