[cudax] Implement `cudax::coop::any_of` algorithm for <= warp groups by davebayer · Pull Request #9665 · NVIDIA/cccl

davebayer · 2026-07-01T12:45:38Z

coderabbitai · 2026-07-01T12:49:36Z

📝 Walkthrough

Summary by CodeRabbit

New Features
- Added coop::any support for cooperative groups, including warp-level and broadcasted variants.
- New behavior returns a result for the group root and supports direct boolean output in broadcasted mode.
Bug Fixes
- Added safeguards to prevent any from being used with non-boolean thread arrays.
Tests
- Added CUDA test coverage for thread, warp, and within-warp group scenarios across multiple input patterns.

Walkthrough

Adds a new cuda::experimental::coop::any header implementing OR-reduction across per-thread boolean arrays and warp-level groups, with regular and broadcasted overloads. Wires it into the coop.cuh umbrella header and adds three new CMake-registered CUDA test executables covering this_thread, this_warp, and threads_within_warp scenarios.

Changes

coop::any feature

Layer / File(s)	Summary
any.cuh core implementation `cudax/include/cuda/experimental/__coop/any.cuh`	Implements internal `__any_impl` for per-thread boolean arrays (OR-reduce) and warp-level groups (lane-mask combine via `__any_sync`), plus public `any` overloads returning `optional<bool>` or `bool` (broadcasted), with non-`bool` overloads deleted.
Umbrella header wiring and test registration `cudax/include/cuda/experimental/coop.cuh`, `cudax/test/CMakeLists.txt`	Includes the new header from `coop.cuh` and registers three new Catch2 test executables for `coop.any.this_thread`, `coop.any.this_warp`, `coop.any.threads_within_warp`.
this_thread any test `cudax/test/coop/any/this_thread.cu`	Adds `test_group`, `CustomBinaryPartition`, `TestKernel`, and a `C2H_TEST` validating `any` over per-thread boolean arrays for all-false, all-true, and single-difference cases.
this_warp any test `cudax/test/coop/any/this_warp.cu`	Adds `test_group`, `CustomBinaryPartition`, `TestKernel`, and a `C2H_TEST` validating `any` and broadcasted `any` across warp lanes with rank-based patterns.
threads_within_warp any test `cudax/test/coop/any/threads_within_warp.cu`	Adds `test_group` filtering non-member threads, `CustomBinaryPartition`, `TestKernel` exercising `group_by` and `binary_partition` configurations, and a `C2H_TEST` harness.

Suggested reviewers: griwes, jrhemstad, fbusato

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

🧹 Nitpick comments (1)

cudax/include/cuda/experimental/__coop/any.cuh (1)

49-50: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

suggestion: Accept const bool arrays for __thread_data.

The algorithm only reads the array, but every overload takes _Tp (&)[_Np] / bool (&)[_Np], so const bool[N] inputs cannot call coop::any. Consider changing the internal, public, and deleted overloads to take const bool (&)[_Np] / const _Tp (&)[_Np]. As per path instructions, focus on experimental API clarity. As per coding guidelines, all variables that are not modified must be declared const.

Also applies to: 70-70, 89-90, 97-98, 105-109

Sources: Coding guidelines, Path instructions

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 15730b6b-01fe-4ca2-85ab-918364345690

📥 Commits

Reviewing files that changed from the base of the PR and between 96e4c40 and 3c30139.

📒 Files selected for processing (6)

cudax/include/cuda/experimental/__coop/any.cuh
cudax/include/cuda/experimental/coop.cuh
cudax/test/CMakeLists.txt
cudax/test/coop/any/this_thread.cu
cudax/test/coop/any/this_warp.cu
cudax/test/coop/any/threads_within_warp.cu

coderabbitai · 2026-07-01T14:58:44Z

Caution

Failed to replace (edit) comment. This is likely due to insufficient permissions or the comment being deleted.

Error details

{}

github-actions · 2026-07-01T16:03:26Z

🥳 CI Workflow Results

🟩 Finished in 1h 09m: Pass: 100%/55 | Total: 9h 49m | Max: 1h 09m | Hits: 79%/43333

See results here.

pciolkosz

Do we need a variant that just takes a single bool instead of an array?

pciolkosz · 2026-07-01T22:59:39Z

+  }
+}
+
+struct CustomBinaryPartition


I believe this is not needed in this file

pciolkosz · 2026-07-01T22:59:48Z

+  }
+}
+
+struct CustomBinaryPartition


pciolkosz · 2026-07-01T23:05:47Z

+  }
+  else
+  {
+    return (gpu_thread.is_root_rank(__group)) ? ::cuda::std::optional{__result} : ::cuda::std::nullopt;


Something to consider in general, not just for this review: I just thought in cases where broadcast (or partial broadcast) is free, would it make sense for all threads to return the result? We could document that without cudax::broadcasted some subset of threads that includes the leader returns the result, while with broadcasted all threads return.

Not sure if you can use it meaningfully, but I guess you could opportunistically use the result in a larger set of threads

Yeah, the answer is that I'm not sure. Because for some algorithm + group combinations the broadcast is free, but for others it's not.

And then it's hard to come up with some rules when the broadcast is implicit. We would have to document each case and it will be confusing and less generic. So, to be consistent with other cases I thought I would return the result only by the root rank.

Wouldn't documenting that using the broadcasted version is free when used with this_thread/this_warp/threads within warp sound better?

There is one more option to not document it, but just have the user opportunistically check the returned optional if it has a value. But I don't know if it's even useful for anything. Not sure what's the best here

I was also considering this, but I don't think it's a good idea, because the user might rely on has_value() => is_root_rank(), which wouldn't be true for this case.

I'm wondering whether there is actually something wrong about the current approach. You should have used the broadcasted tag, if you wanted the result to be known to all members of the group 🙂

Is there a problem with this logic? Isn't the problem just that you see the implementation and feel like we are wasting infotmation in the non-broadcasted case? I think the behavior makes sense from the design perspective

No, I don't think there is anything wrong with the current logic, just wanted to bring this up as an option, but we don't have to change anything

davebayer requested a review from a team as a code owner July 1, 2026 12:45

davebayer requested a review from caugonnet July 1, 2026 12:45

github-project-automation Bot added this to CCCL Jul 1, 2026

github-project-automation Bot moved this to Todo in CCCL Jul 1, 2026

cccl-authenticator-app Bot moved this from Todo to In Review in CCCL Jul 1, 2026

coderabbitai Bot reviewed Jul 1, 2026

View reviewed changes

This comment has been minimized.

Sign in to view

davebayer changed the title ~~[cudax] Implement cudax::coop::any algorithm~~ [cudax] Implement cudax::coop::any algorithm for <= warp groups Jul 1, 2026

This was linked to issues Jul 1, 2026

[FEA]: Implement cudax::coop::any_of for this_warp #9670

Open

[FEA]: Implement cudax::coop::any_of arbitrary threads within a warp groups #9672

Open

[cudax] Implement cudax::coop::any_of algorithm

6632b71

davebayer force-pushed the coop_any branch from 3c30139 to 6632b71 Compare July 1, 2026 14:52

davebayer changed the title ~~[cudax] Implement cudax::coop::any algorithm for <= warp groups~~ [cudax] Implement cudax::coop::any_of algorithm for <= warp groups Jul 1, 2026

pciolkosz approved these changes Jul 1, 2026

View reviewed changes

Uh oh!

Conversation

davebayer commented Jul 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented Jul 1, 2026

Summary by CodeRabbit

Walkthrough

Changes

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

This comment has been minimized.

coderabbitai Bot commented Jul 1, 2026

Uh oh!

github-actions Bot commented Jul 1, 2026

🥳 CI Workflow Results

🟩 Finished in 1h 09m: Pass: 100%/55 | Total: 9h 49m | Max: 1h 09m | Hits: 79%/43333

Uh oh!

pciolkosz left a comment

Choose a reason for hiding this comment

Uh oh!

pciolkosz Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

pciolkosz Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

pciolkosz Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

davebayer Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

pciolkosz Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

davebayer Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

pciolkosz Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

davebayer commented Jul 1, 2026 •

edited

Loading