Skip to content

Add builds for MSVC cccl_c_parallel#9605

Open
miscco wants to merge 5 commits into
NVIDIA:mainfrom
miscco:hostjit_windows_ci
Open

Add builds for MSVC cccl_c_parallel#9605
miscco wants to merge 5 commits into
NVIDIA:mainfrom
miscco:hostjit_windows_ci

Conversation

@miscco

@miscco miscco commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

We want to test the hostjit CI also on windows

@miscco miscco requested a review from a team as a code owner June 26, 2026 08:43
@miscco miscco requested a review from jrhemstad June 26, 2026 08:43
@github-project-automation github-project-automation Bot moved this to Todo in CCCL Jun 26, 2026
@cccl-authenticator-app cccl-authenticator-app Bot moved this from Todo to In Review in CCCL Jun 26, 2026
@coderabbitai

coderabbitai Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds Windows CI support for cccl_c_parallel_v2: a new PowerShell build script, msvc added to the pull_request matrix, CATCH_CONFIG_NO_WINDOWS_SEH for the WIN32 target, and all four Clang ExecuteAction calls moved onto an 8MB llvm::thread worker.

Changes

Windows CI and large-stack compiler fix

Layer / File(s) Summary
CI matrix and build script
ci/matrix.yaml, ci/windows/build_cccl_c_parallel_v2.ps1
The pull_request matrix adds msvc for cccl_c_parallel_v2 on ctk: 13.X / gpu: rtx2080. The new PowerShell script accepts optional arch and cmake-options, imports build_common.psm1, and runs the cccl-c-parallel-v2 preset.
Windows build fix and large-stack ExecuteAction
c/parallel.v2/CMakeLists.txt, c/parallel.v2/src/hostjit/compiler.cpp
CATCH_CONFIG_NO_WINDOWS_SEH is added for WIN32. compiler.cpp adds kFrontendStackSize (8MB) and runWithLargeStack, then wraps ExecuteAction calls for PCH generation, device PTX, device bitcode, and host object compilation.

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5cb76448-9116-4e38-bcab-1142fc695938

📥 Commits

Reviewing files that changed from the base of the PR and between 8756dfc and 4870199.

📒 Files selected for processing (2)
  • ci/matrix.yaml
  • ci/windows/build_cccl_c_parallel_v2.ps1

Comment thread ci/matrix.yaml Outdated
Comment on lines +12 to +16
$CURRENT_PATH = Split-Path $pwd -leaf
If($CURRENT_PATH -ne "ci") {
Write-Host "Moving to ci folder"
pushd "$PSScriptRoot/.."
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

important: Line 13 only checks the leaf name of the current directory. If this script is launched from any unrelated directory named ci, it skips the repo-local pushd and the subsequent configure/build runs against the wrong tree. Compare against the full $PSScriptRoot/.. path, or unconditionally Push-Location to it before importing/building. As per path instructions, for ci/**/* focus on environment setup and clear failures.

Source: Path instructions

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@miscco miscco force-pushed the hostjit_windows_ci branch from bde4d22 to 67032a7 Compare June 30, 2026 13:52
@miscco miscco requested review from a team as code owners June 30, 2026 13:52

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f0af57e7-5c83-4b4a-80c3-41d68e9341b9

📥 Commits

Reviewing files that changed from the base of the PR and between 4870199 and 262598c.

📒 Files selected for processing (9)
  • c/parallel.v2/CMakeLists.txt
  • c/parallel.v2/src/hostjit/compiler.cpp
  • ci/matrix.yaml
  • ci/windows/build_cccl_c_parallel_v2.ps1
  • cub/cub/device/dispatch/kernels/kernel_scan_lookahead.cuh
  • cub/cub/device/dispatch/tuning/tuning_scan.cuh
  • cub/cub/iterator/tex_obj_input_iterator.cuh
  • cub/test/test_util.h
  • libcudacxx/include/cuda/std/__exception/terminate.h
💤 Files with no reviewable changes (1)
  • c/parallel.v2/src/hostjit/compiler.cpp
✅ Files skipped from review due to trivial changes (3)
  • cub/cub/device/dispatch/tuning/tuning_scan.cuh
  • cub/cub/device/dispatch/kernels/kernel_scan_lookahead.cuh
  • c/parallel.v2/CMakeLists.txt
🚧 Files skipped from review as they are similar to previous changes (2)
  • ci/matrix.yaml
  • ci/windows/build_cccl_c_parallel_v2.ps1

Comment on lines +42 to +48
#if !_CCCL_HOSTJIT()
# if _CCCL_TILE_COMPILATION()
NV_IF_ELSE_TARGET(NV_IS_HOST, (::exit(-1);), (assert(false);))
#else // ^^^ _CCCL_TILE_COMPILATION() ^^^ / vvv !_CCCL_TILE_COMPILATION()
# else // ^^^ _CCCL_TILE_COMPILATION() ^^^ / vvv !_CCCL_TILE_COMPILATION()
NV_IF_ELSE_TARGET(NV_IS_HOST, (::exit(-1);), (::__trap();))
#endif // !_CCCL_TILE_COMPILATION()
# endif // !_CCCL_TILE_COMPILATION()
#endif // !_CCCL_HOSTJIT()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Dump the full __cccl_terminate definition to confirm what remains under _CCCL_HOSTJIT().
fd -t f 'terminate.h' libcudacxx/include/cuda/std/__exception \
  --exec sh -c 'echo "== {} =="; cat -n "{}"'

Repository: NVIDIA/cccl

Length of output: 3201


🏁 Script executed:

#!/bin/bash
set -euo pipefail

rg -n --hidden --glob '!**/.git/**' '\b#define\s+_CCCL_UNREACHABLE\b|_CCCL_UNREACHABLE\(' libcudacxx/include c 2>/dev/null || true

printf '\n== likely config headers ==\n'
fd -a '__config' libcudacxx/include/cuda/std/detail libcudacxx/include/cuda/std/__cccl c 2>/dev/null | head -n 20

Repository: NVIDIA/cccl

Length of output: 5443


🏁 Script executed:

#!/bin/bash
set -euo pipefail

printf '== libcudacxx/include/cuda/std/__cccl/unreachable.h ==\n'
cat -n libcudacxx/include/cuda/std/__cccl/unreachable.h

printf '\n== hostJIT-related macro definitions ==\n'
rg -n --hidden --glob '!**/.git/**' '_CCCL_HOSTJIT\(|NV_IF_ELSE_TARGET|NV_IS_HOST' libcudacxx/include/cuda/std/detail/__config libcudacxx/include/cuda/std/__cccl libcudacxx/include/cuda/std/__exception/terminate.h

Repository: NVIDIA/cccl

Length of output: 2277


🏁 Script executed:

#!/bin/bash
set -euo pipefail

sed -n '210,245p' libcudacxx/include/cuda/std/__cccl/compiler.h

Repository: NVIDIA/cccl

Length of output: 1451


important: add a hostJIT terminating path in __cccl_terminate() Under _CCCL_HOSTJIT(), this #if removes every real halt path (::exit(-1), assert(false), ::__trap()), leaving only _CCCL_UNREACHABLE(). That macro is just an optimizer hint, so a call here can fall through the [[noreturn]] function contract. Add a hostJIT-specific abort/loop path before _CCCL_UNREACHABLE().

@github-actions

This comment has been minimized.

@miscco miscco force-pushed the hostjit_windows_ci branch from 2913c2e to 6c1f223 Compare July 1, 2026 06:43
@github-actions

This comment has been minimized.

miscco and others added 5 commits July 1, 2026 10:46
…dows stack overflow)

The host JIT invokes clang::CompilerInstance::ExecuteAction directly, which
bypasses the clang driver's runWithSufficientStackSpace() guard. On Windows the
default main-thread stack is 1 MB, so Clang's deep frontend recursion
(recursive-descent parsing / template instantiation) overflows the stack while
compiling heavier kernels such as radix_sort and segmented_reduce. On Linux the
8 MB default stack hides it.

Run each ExecuteAction on a worker thread whose stack matches Clang's own
DesiredStackSize (8 MB) -- the value already proven sufficient on Linux.
@miscco miscco force-pushed the hostjit_windows_ci branch from 6c1f223 to 1e302f5 Compare July 1, 2026 08:46
@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

😬 CI Workflow Results

🟥 Finished in 2h 43m: Pass: 97%/507 | Total: 19d 17h | Max: 2h 43m | Hits: 40%/1840760

See results here.

@Jacobfaib Jacobfaib left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C++ bits LGTM

// kernels such as radix_sort / segmented_reduce; Linux's 8 MB default hides it.
// Run the frontend on a worker thread sized to match clang's own
// DesiredStackSize (8 MB), which is the proven-sufficient value on Linux.
inline constexpr unsigned kFrontendStackSize = 8u << 20;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this function-local to runWithLargeStack, I don't see other uses of it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In Review

Development

Successfully merging this pull request may close these issues.

4 participants