Adding a new offload_args intrinsic, which only maps arguments #150683
Open
ZuseZ4 wants to merge 3 commits into
020f669 to
555131e
Compare
72feb1c to
b6f4295
Compare
020f669 to
0228337
Compare
Member
Author
r? @oli-obk
saethlin
reviewed
Jan 19, 2026
0228337 to
e0fd81c
Compare
e0fd81c to
1a22802
Compare
Flakebi
reviewed
Jan 21, 2026
matthiaskrgr
added a commit
to matthiaskrgr/rust
that referenced
this pull request
Jan 27, 2026
…-obk offload: move (un)register lib into global_ctors Right now we initialize the openmp/offload runtime before every single offload call, and tear it down directly afterwards. What we should rather do is initialize it once in the binary startup code, and tear it down at the end of the binary execution. Here I implement these changes. Together, our generated IR has a lot less usage of globals, which in turn simplifies the refactoring in rust-lang#150683, where I introduce a new variant of our offload intrinsic. r? oli-obk
GuillaumeGomez
added a commit
to GuillaumeGomez/rust
that referenced
this pull request
Jan 27, 2026
…-obk offload: move (un)register lib into global_ctors Right now we initialize the openmp/offload runtime before every single offload call, and tear it down directly afterwards. What we should rather do is initialize it once in the binary startup code, and tear it down at the end of the binary execution. Here I implement these changes. Together, our generated IR has a lot less usage of globals, which in turn simplifies the refactoring in rust-lang#150683, where I introduce a new variant of our offload intrinsic. r? oli-obk
Zalathar
added a commit
to Zalathar/rust
that referenced
this pull request
Jan 28, 2026
…-obk offload: move (un)register lib into global_ctors Right now we initialize the openmp/offload runtime before every single offload call, and tear it down directly afterwards. What we should rather do is initialize it once in the binary startup code, and tear it down at the end of the binary execution. Here I implement these changes. Together, our generated IR has a lot less usage of globals, which in turn simplifies the refactoring in rust-lang#150683, where I introduce a new variant of our offload intrinsic. r? oli-obk
rust-timer
added a commit
that referenced
this pull request
Jan 28, 2026
Rollup merge of #150893 - ZuseZ4:move-un-register-lib, r=oli-obk offload: move (un)register lib into global_ctors Right now we initialize the openmp/offload runtime before every single offload call, and tear it down directly afterwards. What we should rather do is initialize it once in the binary startup code, and tear it down at the end of the binary execution. Here I implement these changes. Together, our generated IR has a lot less usage of globals, which in turn simplifies the refactoring in #150683, where I introduce a new variant of our offload intrinsic. r? oli-obk
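The lifecycle change described in that commit message can be pictured with a small mock. This is only a sketch under stated assumptions: `MockRuntime`, `Registration`, `per_call_scheme` and `once_scheme` are hypothetical names, and the real change emits global constructors in the generated IR rather than using a Rust guard object.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for the offload runtime; it only counts calls.
#[derive(Default)]
struct MockRuntime {
    registered: Cell<usize>,
    unregistered: Cell<usize>,
    launches: Cell<usize>,
}

impl MockRuntime {
    fn register_lib(&self) {
        self.registered.set(self.registered.get() + 1);
    }
    fn unregister_lib(&self) {
        self.unregistered.set(self.unregistered.get() + 1);
    }
    fn launch(&self) {
        self.launches.set(self.launches.get() + 1);
    }
}

/// Old scheme: set up and tear down the runtime around every offload call.
fn per_call_scheme(rt: &MockRuntime, calls: usize) {
    for _ in 0..calls {
        rt.register_lib();
        rt.launch();
        rt.unregister_lib();
    }
}

/// New scheme, modeled with a guard: register once, unregister on drop.
struct Registration<'a>(&'a MockRuntime);

impl<'a> Registration<'a> {
    fn new(rt: &'a MockRuntime) -> Self {
        rt.register_lib();
        Registration(rt)
    }
}

impl Drop for Registration<'_> {
    fn drop(&mut self) {
        self.0.unregister_lib();
    }
}

fn once_scheme(rt: &MockRuntime, calls: usize) {
    // The guard lives for the whole "program"; it is dropped at the end of the scope.
    let _guard = Registration::new(rt);
    for _ in 0..calls {
        rt.launch();
    }
}

fn main() {
    let a = MockRuntime::default();
    per_call_scheme(&a, 3);
    let b = MockRuntime::default();
    once_scheme(&b, 3);
    println!(
        "per-call registrations: {}, one-time registrations: {}",
        a.registered.get(),
        b.registered.get()
    );
}
```

In the mock, the number of launches is the same in both schemes; only the number of register and unregister calls differs.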
1a22802 to
f32d2da
Compare
f32d2da to
ac33037
Compare
Member
Author
Something changed after the last rebase; two tests now fail. The query stack, however, looks a lot more informative than I remembered, thanks to whoever improved that. I should add proper detection of the case where people add only the declaration but not the GPU implementation, so that we can emit an error instead of an ICE.
ac33037 to
a63cb61
Compare
a63cb61 to
9846fe8
Compare
9846fe8 to
da6041c
Compare
Collaborator
This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed. Rebasing is a normal part of keeping PRs up to date, so no action is needed; this note is just to help reviewers.
Contributor
☔ The latest upstream changes (presumably #153379) made this pull request unmergeable. Please resolve the merge conflicts.
This intrinsic helps with supporting the various AMD & NVIDIA libraries like rocBLAS or cuBLAS.
They provide functions which must be called from the host, but require a mixture of host and device pointers.
This offload_args intrinsic maps our host allocations to device allocations and transfers memory as required.
It reuses the whole infrastructure which we already have for the main offload intrinsic.
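To make the mapping idea concrete, here is a minimal, purely conceptual sketch in plain Rust. It does not use the real intrinsic or any vendor library: `MockDevice`, `map_to_device`, `map_from_device`, `scale` and `scale_on_device` are hypothetical names, and "device memory" is just a `HashMap` of buffers. It only illustrates the pattern of a host-side call that takes a device handle plus a host scalar.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a GPU: each "device allocation" is a Vec owned
/// by the mock device and identified by an opaque handle.
struct MockDevice {
    next_handle: usize,
    buffers: HashMap<usize, Vec<f32>>,
}

impl MockDevice {
    fn new() -> Self {
        MockDevice { next_handle: 1, buffers: HashMap::new() }
    }

    /// Map a host slice: allocate a device buffer and copy host -> device.
    fn map_to_device(&mut self, host: &[f32]) -> usize {
        let handle = self.next_handle;
        self.next_handle += 1;
        self.buffers.insert(handle, host.to_vec());
        handle
    }

    /// Copy device -> host and release the device buffer.
    fn map_from_device(&mut self, handle: usize, host: &mut [f32]) {
        let buf = self.buffers.remove(&handle).expect("unknown device handle");
        host.copy_from_slice(&buf);
    }

    /// Stand-in for a vendor routine that is called on the host but
    /// operates on device memory (here: x *= alpha).
    fn scale(&mut self, alpha: f32, x: usize) {
        for v in self.buffers.get_mut(&x).expect("unknown device handle") {
            *v *= alpha;
        }
    }
}

/// Map the argument, run the host-side call, and transfer the result back.
fn scale_on_device(dev: &mut MockDevice, alpha: f32, data: &mut [f32]) {
    let d = dev.map_to_device(data); // host allocation -> device allocation
    dev.scale(alpha, d);             // host scalar plus device handle
    dev.map_from_device(d, data);    // copy results back to the host
}

fn main() {
    let mut dev = MockDevice::new();
    let mut data = vec![1.0_f32, 2.0, 3.0];
    scale_on_device(&mut dev, 2.0, &mut data);
    println!("{:?}", data);
}
```

In this sketch the caller only ever touches host data; the wrapper performs the mapping and the transfers around the call.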
Unlike the main offload intrinsic, this also already fully works with std. I also got it to work with a single cargo invocation:
RUSTFLAGS="-L native=/opt/rocm-6.4.0/lib -l dylib=rocblas -l dylib=amdhip64 -l dylib=omp -l dylib=omptarget -Zoffload=Args -Zunstable-options" cargo +offload run -r
I'll rebase and drop the first 3 commits once the cleanup PR lands.
I updated compiler/rustc_monomorphize/src/collector/autodiff.rs so that this now works without no_mangle; before that change, the function wouldn't be codegen'ed. It also works without lto=fat if we only have main.rs. If we put a function in lib.rs and call it in main.rs, it currently trips the verifier. Happy to fix that either here or in a follow-up PR.
cc @kevinsala @Sa4dUs