[cudax] Implement cudax::coop::any_of algorithm for <= warp groups#9665
[cudax] Implement cudax::coop::any_of algorithm for <= warp groups#9665davebayer wants to merge 1 commit into
cudax::coop::any_of algorithm for <= warp groups#9665Conversation
📝 WalkthroughSummary by CodeRabbit
WalkthroughAdds a new Changescoop::any feature
Suggested reviewers: Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
cudax/include/cuda/experimental/__coop/any.cuh (1)
49-50: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winsuggestion: Accept
constbool arrays for__thread_data.The algorithm only reads the array, but every overload takes
_Tp (&)[_Np]/bool (&)[_Np], soconst bool[N]inputs cannot callcoop::any. Consider changing the internal, public, and deleted overloads to takeconst bool (&)[_Np]/const _Tp (&)[_Np]. As per path instructions, focus on experimental API clarity. As per coding guidelines, all variables that are not modified must be declaredconst.Also applies to: 70-70, 89-90, 97-98, 105-109
Sources: Coding guidelines, Path instructions
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 15730b6b-01fe-4ca2-85ab-918364345690
📒 Files selected for processing (6)
cudax/include/cuda/experimental/__coop/any.cuhcudax/include/cuda/experimental/coop.cuhcudax/test/CMakeLists.txtcudax/test/coop/any/this_thread.cucudax/test/coop/any/this_warp.cucudax/test/coop/any/threads_within_warp.cu
This comment has been minimized.
This comment has been minimized.
cudax::coop::any algorithmcudax::coop::any algorithm for <= warp groups
cudax::coop::any algorithm for <= warp groupscudax::coop::any_of algorithm for <= warp groups
|
Caution Failed to replace (edit) comment. This is likely due to insufficient permissions or the comment being deleted. Error details |
🥳 CI Workflow Results🟩 Finished in 1h 09m: Pass: 100%/55 | Total: 9h 49m | Max: 1h 09m | Hits: 79%/43333See results here. |
pciolkosz
left a comment
There was a problem hiding this comment.
Do we need a variant that just takes a single bool instead of an array?
| } | ||
| } | ||
|
|
||
| struct CustomBinaryPartition |
There was a problem hiding this comment.
I believe this is not needed in this file
| } | ||
| } | ||
|
|
||
| struct CustomBinaryPartition |
| } | ||
| else | ||
| { | ||
| return (gpu_thread.is_root_rank(__group)) ? ::cuda::std::optional{__result} : ::cuda::std::nullopt; |
There was a problem hiding this comment.
Something to consider in general, not just for this review: I just thought in cases where broadcast (or partial broadcast) is free, would it make sense for all threads to return the result? We could document that without cudax::broadcasted some subset of threads that includes the leader returns the result, while with broadcasted all threads return.
Not sure if you can use it meaningfully, but I guess you could opportunistically use the result in a larger set of threads
There was a problem hiding this comment.
Yeah, the answer is that I'm not sure. Because for some algorithm + group combinations the broadcast is free, but for others it's not.
And then it's hard to come up with some rules when the broadcast is implicit. We would have to document each case and it will be confusing and less generic. So, to be consistent with other cases I thought I would return the result only by the root rank.
Wouldn't documenting that using the broadcasted version is free when used with this_thread/this_warp/threads within warp sound better?
There was a problem hiding this comment.
There is one more option to not document it, but just have the user opportunistically check the returned optional if it has a value. But I don't know if it's even useful for anything. Not sure what's the best here
There was a problem hiding this comment.
I was also considering this, but I don't think it's a good idea, because the user might rely on has_value() => is_root_rank(), which wouldn't be true for this case.
I'm wondering whether there is actually something wrong about the current approach. You should have used the broadcasted tag, if you wanted the result to be known to all members of the group 🙂
Is there a problem with this logic? Isn't the problem just that you see the implementation and feel like we are wasting infotmation in the non-broadcasted case? I think the behavior makes sense from the design perspective
There was a problem hiding this comment.
No, I don't think there is anything wrong with the current logic, just wanted to bring this up as an option, but we don't have to change anything
Fixes #9669, #9670 and #9672.