Add builds for MSVC cccl_c_parallel#9605
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds Windows CI support for ChangesWindows CI and large-stack compiler fix
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 5cb76448-9116-4e38-bcab-1142fc695938
📒 Files selected for processing (2)
ci/matrix.yamlci/windows/build_cccl_c_parallel_v2.ps1
| $CURRENT_PATH = Split-Path $pwd -leaf | ||
| If($CURRENT_PATH -ne "ci") { | ||
| Write-Host "Moving to ci folder" | ||
| pushd "$PSScriptRoot/.." | ||
| } |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win
important: Line 13 only checks the leaf name of the current directory. If this script is launched from any unrelated directory named ci, it skips the repo-local pushd and the subsequent configure/build runs against the wrong tree. Compare against the full $PSScriptRoot/.. path, or unconditionally Push-Location to it before importing/building. As per path instructions, for ci/**/* focus on environment setup and clear failures.
Source: Path instructions
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
bde4d22 to
67032a7
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: f0af57e7-5c83-4b4a-80c3-41d68e9341b9
📒 Files selected for processing (9)
c/parallel.v2/CMakeLists.txtc/parallel.v2/src/hostjit/compiler.cppci/matrix.yamlci/windows/build_cccl_c_parallel_v2.ps1cub/cub/device/dispatch/kernels/kernel_scan_lookahead.cuhcub/cub/device/dispatch/tuning/tuning_scan.cuhcub/cub/iterator/tex_obj_input_iterator.cuhcub/test/test_util.hlibcudacxx/include/cuda/std/__exception/terminate.h
💤 Files with no reviewable changes (1)
- c/parallel.v2/src/hostjit/compiler.cpp
✅ Files skipped from review due to trivial changes (3)
- cub/cub/device/dispatch/tuning/tuning_scan.cuh
- cub/cub/device/dispatch/kernels/kernel_scan_lookahead.cuh
- c/parallel.v2/CMakeLists.txt
🚧 Files skipped from review as they are similar to previous changes (2)
- ci/matrix.yaml
- ci/windows/build_cccl_c_parallel_v2.ps1
| #if !_CCCL_HOSTJIT() | ||
| # if _CCCL_TILE_COMPILATION() | ||
| NV_IF_ELSE_TARGET(NV_IS_HOST, (::exit(-1);), (assert(false);)) | ||
| #else // ^^^ _CCCL_TILE_COMPILATION() ^^^ / vvv !_CCCL_TILE_COMPILATION() | ||
| # else // ^^^ _CCCL_TILE_COMPILATION() ^^^ / vvv !_CCCL_TILE_COMPILATION() | ||
| NV_IF_ELSE_TARGET(NV_IS_HOST, (::exit(-1);), (::__trap();)) | ||
| #endif // !_CCCL_TILE_COMPILATION() | ||
| # endif // !_CCCL_TILE_COMPILATION() | ||
| #endif // !_CCCL_HOSTJIT() |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Dump the full __cccl_terminate definition to confirm what remains under _CCCL_HOSTJIT().
fd -t f 'terminate.h' libcudacxx/include/cuda/std/__exception \
--exec sh -c 'echo "== {} =="; cat -n "{}"'Repository: NVIDIA/cccl
Length of output: 3201
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n --hidden --glob '!**/.git/**' '\b#define\s+_CCCL_UNREACHABLE\b|_CCCL_UNREACHABLE\(' libcudacxx/include c 2>/dev/null || true
printf '\n== likely config headers ==\n'
fd -a '__config' libcudacxx/include/cuda/std/detail libcudacxx/include/cuda/std/__cccl c 2>/dev/null | head -n 20Repository: NVIDIA/cccl
Length of output: 5443
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '== libcudacxx/include/cuda/std/__cccl/unreachable.h ==\n'
cat -n libcudacxx/include/cuda/std/__cccl/unreachable.h
printf '\n== hostJIT-related macro definitions ==\n'
rg -n --hidden --glob '!**/.git/**' '_CCCL_HOSTJIT\(|NV_IF_ELSE_TARGET|NV_IS_HOST' libcudacxx/include/cuda/std/detail/__config libcudacxx/include/cuda/std/__cccl libcudacxx/include/cuda/std/__exception/terminate.hRepository: NVIDIA/cccl
Length of output: 2277
🏁 Script executed:
#!/bin/bash
set -euo pipefail
sed -n '210,245p' libcudacxx/include/cuda/std/__cccl/compiler.hRepository: NVIDIA/cccl
Length of output: 1451
important: add a hostJIT terminating path in __cccl_terminate() Under _CCCL_HOSTJIT(), this #if removes every real halt path (::exit(-1), assert(false), ::__trap()), leaving only _CCCL_UNREACHABLE(). That macro is just an optimizer hint, so a call here can fall through the [[noreturn]] function contract. Add a hostJIT-specific abort/loop path before _CCCL_UNREACHABLE().
e7d478f to
0bc7572
Compare
d6f52fd to
2913c2e
Compare
This comment has been minimized.
This comment has been minimized.
2913c2e to
6c1f223
Compare
This comment has been minimized.
This comment has been minimized.
…dows stack overflow) The host JIT invokes clang::CompilerInstance::ExecuteAction directly, which bypasses the clang driver's runWithSufficientStackSpace() guard. On Windows the default main-thread stack is 1 MB, so Clang's deep frontend recursion (recursive-descent parsing / template instantiation) overflows the stack while compiling heavier kernels such as radix_sort and segmented_reduce. On Linux the 8 MB default stack hides it. Run each ExecuteAction on a worker thread whose stack matches Clang's own DesiredStackSize (8 MB) -- the value already proven sufficient on Linux.
6c1f223 to
1e302f5
Compare
😬 CI Workflow Results🟥 Finished in 2h 43m: Pass: 97%/507 | Total: 19d 17h | Max: 2h 43m | Hits: 40%/1840760See results here. |
| // kernels such as radix_sort / segmented_reduce; Linux's 8 MB default hides it. | ||
| // Run the frontend on a worker thread sized to match clang's own | ||
| // DesiredStackSize (8 MB), which is the proven-sufficient value on Linux. | ||
| inline constexpr unsigned kFrontendStackSize = 8u << 20; |
There was a problem hiding this comment.
Make this function-local to runWithLargeStack, I don't see other uses of it.
We want to test the hostjit CI also on windows