Add inline asm support for amdgpu #149793

Open
Flakebi wants to merge 1 commit into rust-lang:main from Flakebi:inline-asm

@Flakebi Flakebi commented Dec 8, 2025

Contributor

Add support for inline assembly for the amdgpu backend (the
amdgcn-amd-amdhsa target).
Add register classes for `vgpr` (vector general purpose register) and
`sgpr` (scalar general purpose register).
The LLVM backend supports two more classes. The first is `reg`, which is
either a VGPR or an SGPR, with the choice left to the compiler. As
instructions often rely on a register being specifically a VGPR or an
SGPR for the assembly to be valid, `reg` doesn’t seem that useful (I
struggled to write correct tests for it), so I didn’t end up adding it.
The second is AGPRs, which only exist on some hardware versions (not
the consumer ones) and can only be written and read in restricted ways,
which makes it hard to write a Rust variable into them. They could be
used inside assembly blocks, but I didn’t add them as a Rust register
class.

There are a few changes affecting general inline assembly code:
`InlineAsmReg::name()` now returns a `Cow` instead of a `&'static str`.
Because amdgpu has many registers (256 VGPRs plus combinations of 2 or 4
VGPRs) and I didn’t want to list hundreds of static strings, the amdgpu
reg stores the register number(s) and a non-static `String` is generated
at runtime for the register name.
Register classes and `supported_types` are handled similarly.
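
To illustrate the `Cow` change, here is a minimal sketch of the idea, not the code from this PR. The `AmdgpuReg` and `RegKind` types, their fields, and the `fixed_name` helper are hypothetical names chosen for the example; only the general approach (store register numbers, build the name on demand, return a `Cow`) comes from the description above. The `v5` and `s[2:3]` spellings follow the usual AMDGPU assembly notation for single registers and register ranges.

```rust
use std::borrow::Cow;

/// Hypothetical register kinds, mirroring the two classes added here.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum RegKind {
    Vgpr,
    Sgpr,
}

/// Hypothetical register: a kind plus a run of consecutive 32-bit registers.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct AmdgpuReg {
    kind: RegKind,
    first: u16,
    /// Number of consecutive registers, at least 1.
    count: u16,
}

impl AmdgpuReg {
    /// Builds the assembly name at runtime instead of looking it up in a
    /// table of static strings, so the result has to be an owned `String`.
    fn name(&self) -> Cow<'static, str> {
        let prefix = match self.kind {
            RegKind::Vgpr => "v",
            RegKind::Sgpr => "s",
        };
        if self.count == 1 {
            Cow::Owned(format!("{}{}", prefix, self.first))
        } else {
            let last = self.first + self.count - 1;
            Cow::Owned(format!("{}[{}:{}]", prefix, self.first, last))
        }
    }
}

/// A target with a small, fixed register file can keep borrowing a
/// static string; `Cow` covers both cases behind one return type.
fn fixed_name() -> Cow<'static, str> {
    Cow::Borrowed("r0")
}

fn main() {
    let single = AmdgpuReg { kind: RegKind::Vgpr, first: 5, count: 1 };
    let pair = AmdgpuReg { kind: RegKind::Sgpr, first: 2, count: 2 };
    println!("{} {} {}", single.name(), pair.name(), fixed_name());
}
```

Returning `Cow<'static, str>` means existing targets pay no allocation cost, while a target with hundreds of registers avoids a large static table.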

Vectors of 64-bit types are supported by the LLVM backend, but omitted
here to make the code simpler. LLVM currently does not define
systematically which vectors of 64-bit types are supported. Also, they
are likely seldom used, whereas vectors of 16- and 32-bit types are
important.

Tracking issue: #135024

@rustbot rustbot added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Dec 8, 2025
@rustbot

rustbot commented Dec 8, 2025

Collaborator

r? @eholk

rustbot has assigned @eholk.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@Flakebi Flakebi mentioned this pull request Dec 8, 2025
26 tasks
@rustbot

rustbot commented Dec 9, 2025

Collaborator

Some changes occurred in compiler/rustc_codegen_gcc

cc @antoyo, @GuillaumeGomez

@eholk

eholk commented Dec 9, 2025

Contributor

This seems okay to me, but I'd rather someone more familiar with this part of the compiler give the final signoff.

@bors r?

@rustbot

rustbot commented Dec 9, 2025

Collaborator

Error: Parsing assign command in comment failed: ...'' | error: specify user to assign to at >| ''...

Please file an issue on GitHub at triagebot if there's a problem with this bot, or reach out on #triagebot on Zulip.

@eholk

eholk commented Dec 9, 2025

Contributor

@bors r? compiler

@rustbot rustbot assigned fee1-dead and unassigned eholk Dec 9, 2025
@fee1-dead

Member

@rustbot reroll

@rustbot rustbot assigned chenyukang and unassigned fee1-dead Dec 10, 2025
Comment thread tests/assembly-llvm/asm/amdgpu-types.rs Outdated
@Flakebi Flakebi force-pushed the inline-asm branch 2 times, most recently from bdb726b to 9db5dca Compare December 14, 2025 15:07
@Flakebi

Flakebi commented Dec 14, 2025

Contributor Author

Removed return type from tests to fix conflict with #149991, which starts disallowing returns in gpu-kernel functions.

@chenyukang

Member

The change seems OK, but I'd like people with more background to take a look.
@rustbot reroll

@rustbot rustbot assigned jdonszelmann and unassigned chenyukang Dec 19, 2025
@jdonszelmann

Contributor

That's not me (sorry it took me a while because of holidays). But IIRC that could be Amanieu? r? @Amanieu

@rustbot rustbot assigned Amanieu and unassigned jdonszelmann Jan 6, 2026
@rust-bors

rust-bors Bot commented Jan 9, 2026

Contributor

☔ The latest upstream changes (presumably #150866) made this pull request unmergeable. Please resolve the merge conflicts.

@rust-bors

rust-bors Bot commented Jun 8, 2026

Contributor

📌 Commit bdf8269 has been approved by Amanieu

It is now in the queue for this repository.

@rust-bors rust-bors Bot added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Jun 8, 2026
@Amanieu

Amanieu commented Jun 8, 2026

Member

@bors delegate+

r=me once conflicts are resolved

@rust-bors

rust-bors Bot commented Jun 8, 2026

Contributor

✌️ @Flakebi, you can now approve this pull request!

If @Amanieu told you to "r=me" after making some further change, then please make that change and post @bors r=Amanieu.

View changes since this delegation.

@rustbot

rustbot commented Jun 10, 2026

Collaborator

This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

@Flakebi

Flakebi commented Jun 11, 2026

Contributor Author

@bors r=Amanieu

@rust-bors

rust-bors Bot commented Jun 11, 2026

Contributor

📌 Commit e699f5a has been approved by Amanieu

It is now in the queue for this repository.

@rust-bors rust-bors Bot added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jun 11, 2026
JonathanBrouwer added a commit to JonathanBrouwer/rust that referenced this pull request Jun 11, 2026
Add inline asm support for amdgpu
rust-bors Bot pushed a commit that referenced this pull request Jun 11, 2026
…uwer

Rollup of 23 pull requests

Successful merges:

 - #157716 (update Enzyme, June'26)
 - #149793 (Add inline asm support for amdgpu)
 - #152852 (Remove driver_lint_caps)
 - #155299 (make repr_transparent_non_zst_fields a hard error)
 - #155439 (Enable Cargo's new build-dir layout)
 - #157612 (Add a test where subtyping inhibits coercion.)
 - #157626 (Autogenerate unstable compiler flag stubs for unstable-book)
 - #157667 (Rename typing modes to better describe real usage)
 - #156212 (Additionally gate negative bounds behind new `-Zinternal-testing-features`)
 - #157342 (Reduce verbosity of cycle errors when possible)
 - #157366 (Add a regression test for an unconstrained TransmuteFrom ICE)
 - #157459 (rustc_target: callconv: powerpc64: Remove unreachable fallback code path)
 - #157658 (UnsafeCell: mention shared-ref-to-interior case, fix aliasing model inaccuracy)
 - #157698 (Remove an unnecessary cloning)
 - #157699 (Arg splat experiment - hir FnDecl impl)
 - #157713 (resolve: Remove exported imports from `maybe_unused_trait_imports`)
 - #157722 (Move create_scope_map to rustc_codegen_ssa.)
 - #157725 (Keep generic suggestion for macro-expanded missing-type items)
 - #157733 (Remove old FIXMEs about nocapture attribute)
 - #157737 (Reorganize `tests/ui/issues` [7/N])
 - #157746 (supports_c_variadic_definitions: extend checklist for new targets)
 - #157763 (Move unused target expression error to appropriate place and rename it)
 - #157768 (codegen_ssa: peel trans. wrappers on scalable vecs)
rust-bors Bot pushed a commit that referenced this pull request Jun 11, 2026
…uwer

Rollup of 23 pull requests

Successful merges:

 - #157716 (update Enzyme, June'26)
 - #149793 (Add inline asm support for amdgpu)
 - #155299 (make repr_transparent_non_zst_fields a hard error)
 - #155439 (Enable Cargo's new build-dir layout)
 - #157612 (Add a test where subtyping inhibits coercion.)
 - #157626 (Autogenerate unstable compiler flag stubs for unstable-book)
 - #157667 (Rename typing modes to better describe real usage)
 - #149749 (Make `BorrowedBuf` and `BorrowedCursor` generic over the data)
 - #156212 (Additionally gate negative bounds behind new `-Zinternal-testing-features`)
 - #157342 (Reduce verbosity of cycle errors when possible)
 - #157366 (Add a regression test for an unconstrained TransmuteFrom ICE)
 - #157459 (rustc_target: callconv: powerpc64: Remove unreachable fallback code path)
 - #157658 (UnsafeCell: mention shared-ref-to-interior case, fix aliasing model inaccuracy)
 - #157698 (Remove an unnecessary cloning)
 - #157699 (Arg splat experiment - hir FnDecl impl)
 - #157713 (resolve: Remove exported imports from `maybe_unused_trait_imports`)
 - #157722 (Move create_scope_map to rustc_codegen_ssa.)
 - #157725 (Keep generic suggestion for macro-expanded missing-type items)
 - #157733 (Remove old FIXMEs about nocapture attribute)
 - #157737 (Reorganize `tests/ui/issues` [7/N])
 - #157746 (supports_c_variadic_definitions: extend checklist for new targets)
 - #157763 (Move unused target expression error to appropriate place and rename it)
 - #157768 (codegen_ssa: peel trans. wrappers on scalable vecs)
Labels

A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
