Adding a new offload_args intrinsic, which only maps arguments#150683

Open
ZuseZ4 wants to merge 3 commits into
rust-lang:mainfrom
ZuseZ4:offload-host-intrinsic

@ZuseZ4 ZuseZ4 commented Jan 5, 2026

Member

This intrinsic helps support AMD and NVIDIA libraries such as rocBLAS and cuBLAS.
They provide functions that must be called from the host but take a mixture of host and device pointers.
The offload_args intrinsic maps our host allocations to device allocations and transfers memory as required.
It reuses all of the infrastructure we already have for the main offload intrinsic.
Unlike the main offload intrinsic, it already works fully with std. I also got it to work with a single cargo invocation:
RUSTFLAGS="-L native=/opt/rocm-6.4.0/lib -l dylib=rocblas -l dylib=amdhip64 -l dylib=omp -l dylib=omptarget -Zoffload=Args -Zunstable-options" cargo +offload run -r
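
To illustrate the mapping described above, here is a minimal host-only sketch. It does not use the real offload_args intrinsic or any vendor library: `FakeDevice`, `map_to`, `map_from`, and `scale` are hypothetical names, and the "device" is just a table of buffers kept on the host. It only shows the general pattern of mapping a host allocation, handing the mapped handle to a routine that expects a device pointer, and copying the result back.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for device memory: every "device allocation"
/// is simply a Vec that lives on the host.
struct FakeDevice {
    // Maps the address of a host allocation to its simulated device copy.
    mappings: HashMap<usize, Vec<f32>>,
}

impl FakeDevice {
    fn new() -> Self {
        FakeDevice { mappings: HashMap::new() }
    }

    /// Map a host slice: create a device buffer and copy host -> device.
    /// The returned key plays the role of the device pointer.
    fn map_to(&mut self, host: &[f32]) -> usize {
        let key = host.as_ptr() as usize;
        self.mappings.insert(key, host.to_vec());
        key
    }

    /// Stand-in for a vendor routine that must be called from the host
    /// but operates on a device pointer.
    fn scale(&mut self, dev_ptr: usize, alpha: f32) {
        if let Some(buf) = self.mappings.get_mut(&dev_ptr) {
            for x in buf.iter_mut() {
                *x *= alpha;
            }
        }
    }

    /// Unmap a host slice: copy device -> host and drop the device buffer.
    fn map_from(&mut self, host: &mut [f32]) {
        let key = host.as_ptr() as usize;
        if let Some(buf) = self.mappings.remove(&key) {
            host.copy_from_slice(&buf);
        }
    }
}

/// Runs the map / call / unmap sequence on a small vector.
fn run_demo() -> Vec<f32> {
    let mut dev = FakeDevice::new();
    let mut x = vec![1.0_f32, 2.0, 3.0];
    let d_x = dev.map_to(&x); // host -> device
    dev.scale(d_x, 2.0);      // "library" call on the device copy
    dev.map_from(&mut x);     // device -> host
    x
}

fn main() {
    println!("{:?}", run_demo());
}
```

In this sketch the caller performs the mapping by hand; the point of the intrinsic as described in this PR is that the compiler generates the equivalent map and transfer steps for the arguments.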

I'll rebase and drop the first 3 commits once the cleanup PR lands.

I updated compiler/rustc_monomorphize/src/collector/autodiff.rs so that this now works without no_mangle; without that change, the function would not be codegen'ed. It also works without lto=fat as long as everything lives in main.rs.
If we put a function in lib.rs and call it from main.rs, it currently trips the verifier. Happy to fix that either here or in a follow-up PR:

thread 'rustc' (494962) panicked at compiler/rustc_monomorphize/src/collector.rs:468:13:
assertion failed: tcx.should_codegen_locally(instance)
stack backtrace:

cc @kevinsala @Sa4dUs

@rustbot rustbot added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jan 5, 2026
@ZuseZ4 ZuseZ4 added the F-gpu_offload `#![feature(gpu_offload)]` label Jan 5, 2026
@ZuseZ4 ZuseZ4 force-pushed the offload-host-intrinsic branch from 020f669 to 555131e Compare January 5, 2026 15:21
@ZuseZ4 ZuseZ4 mentioned this pull request Jan 6, 2026
5 tasks
@ZuseZ4 ZuseZ4 force-pushed the offload-host-intrinsic branch 2 times, most recently from 020f669 to 0228337 Compare January 14, 2026 00:46
@ZuseZ4 ZuseZ4 marked this pull request as ready for review January 18, 2026 22:15
rustbot commented Jan 18, 2026

Collaborator

Some changes occurred to the intrinsics. Make sure the CTFE / Miri interpreter
gets adapted for the changes, if necessary.

cc @rust-lang/miri, @RalfJung, @oli-obk, @lcnr

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jan 18, 2026
rustbot commented Jan 18, 2026

Collaborator

r? @mati865

rustbot has assigned @mati865.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

ZuseZ4 commented Jan 18, 2026

Member Author

r? @oli-obk

@rustbot rustbot assigned oli-obk and unassigned mati865 Jan 18, 2026
Comment thread compiler/rustc_session/src/options.rs Outdated
@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 19, 2026
@ZuseZ4 ZuseZ4 force-pushed the offload-host-intrinsic branch from 0228337 to e0fd81c Compare January 19, 2026 23:00
@ZuseZ4 ZuseZ4 force-pushed the offload-host-intrinsic branch from e0fd81c to 1a22802 Compare January 20, 2026 01:30
Comment thread library/core/src/intrinsics/mod.rs Outdated
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Jan 27, 2026
…-obk

offload: move (un)register lib into global_ctors

Right now we initialize the OpenMP/offload runtime before every single offload call and tear it down directly afterwards.
Instead, we should initialize it once in the binary's startup code and tear it down when the binary exits. This PR implements that change.

As a result, our generated IR uses far fewer globals, which in turn simplifies the refactoring in rust-lang#150683, where I introduce a new variant of our offload intrinsic.

r? oli-obk
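
The difference between per-call and one-time initialization described in this commit message can be sketched as follows. This is an illustration only: the actual change registers the library through global constructors in the generated IR, whereas this sketch swaps in `std::sync::Once` for lazy one-time setup. `Runtime`, `launch_per_call`, and `launch_shared` are hypothetical names, and teardown is omitted.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Once;

/// Hypothetical stand-in for the offload runtime. It only counts how
/// often registration happens.
struct Runtime {
    registrations: AtomicUsize,
    once: Once,
}

impl Runtime {
    fn new() -> Self {
        Runtime { registrations: AtomicUsize::new(0), once: Once::new() }
    }

    /// Simulated registration of the offload library.
    fn register(&self) {
        self.registrations.fetch_add(1, Ordering::SeqCst);
    }

    /// Old scheme: register before every launch (unregistering right
    /// after the launch is omitted here).
    fn launch_per_call(&self) {
        self.register();
        // the kernel launch would happen here
    }

    /// New scheme: register at most once, no matter how many launches.
    fn launch_shared(&self) {
        self.once.call_once(|| self.register());
        // the kernel launch would happen here
    }
}

/// Performs `calls` launches and reports how many registrations occurred.
fn count_registrations(calls: usize, shared: bool) -> usize {
    let rt = Runtime::new();
    for _ in 0..calls {
        if shared {
            rt.launch_shared();
        } else {
            rt.launch_per_call();
        }
    }
    rt.registrations.load(Ordering::SeqCst)
}

fn main() {
    println!("per-call: {}", count_registrations(5, false));
    println!("shared:   {}", count_registrations(5, true));
}
```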
GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this pull request Jan 27, 2026
Zalathar added a commit to Zalathar/rust that referenced this pull request Jan 28, 2026
rust-timer added a commit that referenced this pull request Jan 28, 2026
Rollup merge of #150893 - ZuseZ4:move-un-register-lib, r=oli-obk

offload: move (un)register lib into global_ctors

Right now we initialize the openmp/offload runtime before every single offload call, and tear it down directly afterwards.
What we should rather do is initialize it once in the binary startup code, and tear it down at the end of the binary execution. Here I implement these changes.

Together, our generated IR has a lot less usage of globals, which in turn simplifies the refactoring in #150683, where I introduce a new variant of our offload intrinsic.

r? oli-obk
@ZuseZ4 ZuseZ4 force-pushed the offload-host-intrinsic branch from 1a22802 to f32d2da Compare January 29, 2026 23:35
@ZuseZ4 ZuseZ4 force-pushed the offload-host-intrinsic branch from f32d2da to ac33037 Compare January 31, 2026 06:13
ZuseZ4 commented Jan 31, 2026

Member Author

Something changed in the last rebase, and two tests now fail. The query stack is, however, a lot more informative than I remembered; thanks to whoever improved that. I should add proper detection for the case where someone adds only the declaration but not the GPU implementation, so that we emit an error instead of an ICE.

error: internal compiler error: compiler/rustc_hir_typeck/src/lib.rs:124:9: can't type-check body of DefId(0:6 ~ control_flow[d765]::{extern#0}::foo)
  --> /tmp/drehwald1/prog/rust/tests/codegen-llvm/gpu_offload/control_flow.rs:40:5
   |
40 |     pub fn foo(A: *const [f32; 6]) -> ();
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


thread 'rustc' (2086355) panicked at compiler/rustc_hir_typeck/src/lib.rs:124:9:
Box<dyn Any>
stack backtrace:
   0: std::panicking::begin_panic::<rustc_errors::ExplicitBug>
   1: <rustc_errors::diagnostic::BugAbort as rustc_errors::diagnostic::EmissionGuarantee>::emit_producing_guarantee
   2: <rustc_errors::DiagCtxtHandle>::span_bug::<rustc_span::span_encoding::Span, alloc::string::String>
   3: rustc_middle::util::bug::opt_span_bug_fmt::<rustc_span::span_encoding::Span>::{closure#0}
   4: rustc_middle::ty::context::tls::with_opt::<rustc_middle::util::bug::opt_span_bug_fmt<rustc_span::span_encoding::Span>::{closure#0}, !>::{closure#0}
   5: rustc_middle::ty::context::tls::with_context_opt::<rustc_middle::ty::context::tls::with_opt<rustc_middle::util::bug::opt_span_bug_fmt<rustc_span::span_encoding::Span>::{closure#0}, !>::{closure#0}, !>
   6: rustc_middle::util::bug::span_bug_fmt::<rustc_span::span_encoding::Span>
   7: rustc_hir_typeck::typeck_with_inspect::{closure#0}::{closure#0}
   8: rustc_hir_typeck::typeck_with_inspect
      [... omitted 2 frames ...]
   9: rustc_mir_build::thir::pattern::check_match::check_match
      [... omitted 2 frames ...]
  10: rustc_mir_build::builder::build_mir_inner_impl
  11: rustc_mir_transform::mir_built
      [... omitted 2 frames ...]
  12: rustc_mir_transform::ffi_unwind_calls::has_ffi_unwind_calls
      [... omitted 2 frames ...]
  13: rustc_mir_transform::mir_promoted
      [... omitted 2 frames ...]
  14: rustc_borrowck::mir_borrowck
      [... omitted 2 frames ...]
  15: rustc_mir_transform::mir_drops_elaborated_and_const_checked
      [... omitted 2 frames ...]
  16: rustc_mir_transform::optimized_mir
      [... omitted 2 frames ...]
  17: <rustc_middle::ty::context::TyCtxt>::instance_mir
  18: rustc_monomorphize::collector::items_of_instance
      [... omitted 2 frames ...]
  19: rustc_monomorphize::collector::collect_items_rec
  20: rustc_monomorphize::collector::collect_items_rec
  21: rustc_monomorphize::collector::collect_items_root
  22: <rustc_data_structures::sync::parallel::ParallelGuard>::run::<(), rustc_data_structures::sync::parallel::par_for_each_in<rustc_middle::mir::mono::MonoItem, alloc::vec::Vec<rustc_middle::mir::mono::MonoItem>, rustc_monomorphize::collector::collect_crate_mono_items::{closure#1}::{closure#0}>::{closure#0}::{closure#1}::{closure#0}>
  23: rustc_data_structures::sync::parallel::par_for_each_in::<rustc_middle::mir::mono::MonoItem, alloc::vec::Vec<rustc_middle::mir::mono::MonoItem>, rustc_monomorphize::collector::collect_crate_mono_items::{closure#1}::{closure#0}>
  24: <rustc_session::session::Session>::time::<(), rustc_monomorphize::collector::collect_crate_mono_items::{closure#1}>
  25: rustc_monomorphize::collector::collect_crate_mono_items
  26: rustc_monomorphize::partitioning::collect_and_partition_mono_items
      [... omitted 2 frames ...]
  27: rustc_codegen_ssa::base::codegen_crate::<rustc_codegen_llvm::LlvmCodegenBackend>
  28: <rustc_codegen_llvm::LlvmCodegenBackend as rustc_codegen_ssa::traits::backend::CodegenBackend>::codegen_crate
  29: <rustc_session::session::Session>::time::<alloc::boxed::Box<dyn core::any::Any>, rustc_interface::passes::start_codegen::{closure#0}>
  30: rustc_interface::passes::start_codegen
  31: <rustc_interface::queries::Linker>::codegen_and_build_linker
  32: <std::thread::local::LocalKey<core::cell::Cell<*const ()>>>::with::<rustc_middle::ty::context::tls::enter_context<<rustc_middle::ty::context::GlobalCtxt>::enter<rustc_interface::passes::create_and_enter_global_ctxt<core::option::Option<rustc_interface::queries::Linker>, rustc_driver_impl::run_compiler::{closure#0}::{closure#2}>::{closure#2}::{closure#0}, core::option::Option<rustc_interface::queries::Linker>>::{closure#1}, core::option::Option<rustc_interface::queries::Linker>>::{closure#0}, core::option::Option<rustc_interface::queries::Linker>>
  33: <rustc_middle::ty::context::TyCtxt>::create_global_ctxt::<core::option::Option<rustc_interface::queries::Linker>, rustc_interface::passes::create_and_enter_global_ctxt<core::option::Option<rustc_interface::queries::Linker>, rustc_driver_impl::run_compiler::{closure#0}::{closure#2}>::{closure#2}::{closure#0}>
  34: <rustc_interface::passes::create_and_enter_global_ctxt<core::option::Option<rustc_interface::queries::Linker>, rustc_driver_impl::run_compiler::{closure#0}::{closure#2}>::{closure#2} as core::ops::function::FnOnce<(&rustc_session::session::Session, rustc_middle::ty::context::CurrentGcx, alloc::sync::Arc<rustc_data_structures::jobserver::Proxy>, &std::sync::once_lock::OnceLock<rustc_middle::ty::context::GlobalCtxt>, &rustc_data_structures::sync::worker_local::WorkerLocal<rustc_middle::arena::Arena>, &rustc_data_structures::sync::worker_local::WorkerLocal<rustc_hir::Arena>, rustc_driver_impl::run_compiler::{closure#0}::{closure#2})>>::call_once::{shim:vtable#0}
  35: rustc_interface::passes::create_and_enter_global_ctxt::<core::option::Option<rustc_interface::queries::Linker>, rustc_driver_impl::run_compiler::{closure#0}::{closure#2}>
  36: rustc_interface::interface::run_compiler::<(), rustc_driver_impl::run_compiler::{closure#0}>::{closure#1}
  37: rustc_span::create_session_globals_then::<(), rustc_interface::util::run_in_thread_with_globals<rustc_interface::util::run_in_thread_pool_with_globals<rustc_interface::interface::run_compiler<(), rustc_driver_impl::run_compiler::{closure#0}>::{closure#1}, ()>::{closure#0}, ()>::{closure#0}::{closure#0}::{closure#0}>
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

note: using internal features is not supported and expected to cause internal compiler errors when used incorrectly

note: rustc 1.95.0-nightly (f32d2da35 2026-01-29) running on x86_64-unknown-linux-gnu

note: compiler flags: -Z threads=1 -Z simulate-remapped-rust-src-base=/rustc/FAKE_PREFIX -Z translate-remapped-path-to-local-path=no -Z ignore-directory-in-diagnostics-source-blocks=/g/g90/drehwald1/.cargo -Z ignore-directory-in-diagnostics-source-blocks=/tmp/drehwald1/prog/rust/vendor -C debug-assertions=no -Z codegen-source-order -C rpath -C debuginfo=0 -Z offload=Test -Z unstable-options -C opt-level=3 -C lto=fat

query stack during panic:
#0 [typeck] type-checking `foo`
#1 [check_match] match-checking `foo`
#2 [mir_built] building MIR for `foo`
#3 [has_ffi_unwind_calls] checking if `foo` contains FFI-unwind calls
#4 [mir_promoted] promoting constants in MIR for `foo`
#5 [mir_borrowck] borrow-checking `foo`
#6 [mir_drops_elaborated_and_const_checked] elaborating drops for `foo`
#7 [optimized_mir] optimizing MIR for `foo`
#8 [items_of_instance] collecting items used by `foo`
#9 [collect_and_partition_mono_items] collect_and_partition_mono_items
end of query stack
error: aborting due to 1 previous error; 2 warnings emitted
------------------------------------------

---- [codegen] tests/codegen-llvm/gpu_offload/control_flow.rs stdout end ----

failures:
    [codegen] tests/codegen-llvm/gpu_offload/scalar_host.rs
    [codegen] tests/codegen-llvm/gpu_offload/control_flow.rs
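
The detection mentioned in the comment above could follow the usual pattern of returning a recoverable error instead of panicking. The sketch below is not rustc code and uses no compiler APIs: `KernelItem` and `check_kernel` are hypothetical names, and the error is a plain `String` rather than a real diagnostic.

```rust
/// Hypothetical description of what the compiler found for a kernel.
#[derive(Debug, PartialEq)]
enum KernelItem {
    /// Only a declaration exists, e.g. inside an `extern` block.
    DeclarationOnly,
    /// A GPU implementation is available.
    WithDeviceBody,
}

/// Returns an error for declaration-only kernels instead of letting a
/// later stage hit an internal assertion.
fn check_kernel(name: &str, item: &KernelItem) -> Result<(), String> {
    match item {
        KernelItem::WithDeviceBody => Ok(()),
        KernelItem::DeclarationOnly => Err(format!(
            "offload kernel `{name}` is declared but has no GPU implementation"
        )),
    }
}

fn main() {
    // `foo` mirrors the declaration-only case from the log above.
    match check_kernel("foo", &KernelItem::DeclarationOnly) {
        Ok(()) => println!("ok"),
        Err(msg) => println!("error: {msg}"),
    }
}
```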

@ZuseZ4 ZuseZ4 force-pushed the offload-host-intrinsic branch from ac33037 to a63cb61 Compare February 6, 2026 21:01
@ZuseZ4 ZuseZ4 force-pushed the offload-host-intrinsic branch from a63cb61 to 9846fe8 Compare February 8, 2026 01:51
@ZuseZ4 ZuseZ4 force-pushed the offload-host-intrinsic branch from 9846fe8 to da6041c Compare February 9, 2026 15:04
rustbot commented Feb 9, 2026

Collaborator

This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed; this note is just to help reviewers.

rust-bors Bot commented Mar 11, 2026

Contributor

☔ The latest upstream changes (presumably #153379) made this pull request unmergeable. Please resolve the merge conflicts.

Labels

A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. F-gpu_offload `#![feature(gpu_offload)]` S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue.

8 participants