fix(torchtitan): correct MFU on MI350X/MI355X via peak FLOPS patch by WangLingxun · Pull Request #772 · AMD-AGI/Primus

WangLingxun · 2026-06-18T07:39:17Z

TorchTitan computes MFU as num_flops_per_token * tps / gpu_peak_flops, where gpu_peak_flops comes from a hardcoded device-name table in torchtitan.tools.utils.get_peak_flops. The vendored TorchTitan does not know about the AMD MI350 series (gfx950), so the lookup falls through to the A100 fallback (312 TFLOPS), making the MFU denominator ~8x too small and reporting impossible values (e.g. 563%). Throughput (TFLOP/s) itself was correct; only MFU was affected. MI300X was unaffected since it is present in the upstream table.
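The size of the error follows directly from the formula above. The sketch below is illustrative arithmetic only: the helper names are hypothetical and not part of TorchTitan, and the description does not say which device or precision produced the 563% figure, so the rescaled values show only what that number would become under each corrected denominator.

```python
# Peak values quoted in the PR description, in FLOPS.
A100_FALLBACK_FLOPS = 312e12   # fallback used when the device name is unknown
MI355X_PEAK_FLOPS = 2500e12    # BF16 dense
MI350X_PEAK_FLOPS = 2300e12    # BF16 dense


def mfu_percent(num_flops_per_token: float, tps: float, gpu_peak_flops: float) -> float:
    """MFU as described above, expressed as a percentage."""
    return 100.0 * num_flops_per_token * tps / gpu_peak_flops


def rescale_mfu(reported_percent: float, wrong_peak: float, correct_peak: float) -> float:
    """Convert an MFU computed against wrong_peak to one against correct_peak."""
    return reported_percent * wrong_peak / correct_peak


# How much the fallback denominator inflates MFU on each device.
inflation_mi355x = MI355X_PEAK_FLOPS / A100_FALLBACK_FLOPS
inflation_mi350x = MI350X_PEAK_FLOPS / A100_FALLBACK_FLOPS

# What a reported 563% would become under each corrected peak (hypothetical
# attribution: the source does not state which device the figure came from).
corrected_if_mi355x = rescale_mfu(563.0, A100_FALLBACK_FLOPS, MI355X_PEAK_FLOPS)
corrected_if_mi350x = rescale_mfu(563.0, A100_FALLBACK_FLOPS, MI350X_PEAK_FLOPS)

print(f"inflation: MI355X x{inflation_mi355x:.2f}, MI350X x{inflation_mi350x:.2f}")
print(f"563% rescaled: {corrected_if_mi355x:.1f}% (MI355X), {corrected_if_mi350x:.1f}% (MI350X)")
```

The inflation is about 8.0x for MI355X and about 7.4x for MI350X, consistent with the "~8x" estimate in the description.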

Add a setup-phase patch that wraps get_peak_flops and returns the correct BF16 dense peaks from AMD's product pages:

- MI355X: 2500 TFLOPS (matches upstream TorchTitan)
- MI350X: 2300 TFLOPS (not yet covered upstream)

The patch runs before MetricsProcessor caches gpu_peak_flops and delegates all other devices to the original implementation, so it is safe to keep after the vendored TorchTitan is updated.
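The wrap-and-delegate pattern described above can be sketched roughly as follows. This is not the code merged in this PR: `wrap_get_peak_flops`, `MI350_SERIES_PEAK_FLOPS`, and `fake_original` are hypothetical names, matching by substring on the device name is an assumption, and the stand-in lookup only mimics the A100 fallback so the example runs without TorchTitan installed.

```python
from typing import Callable

# BF16 dense peaks quoted in the description, in FLOPS.
MI350_SERIES_PEAK_FLOPS = {
    "MI355X": 2500e12,
    "MI350X": 2300e12,
}


def wrap_get_peak_flops(original: Callable[[str], float]) -> Callable[[str], float]:
    """Return a lookup that handles MI350-series names and delegates the rest."""

    def patched(device_name: str) -> float:
        # Assumption: the device name contains the product string, e.g. "MI355X".
        for marker, peak in MI350_SERIES_PEAK_FLOPS.items():
            if marker in device_name:
                return peak
        # Every other device keeps the original behavior.
        return original(device_name)

    return patched


def fake_original(device_name: str) -> float:
    """Stand-in for the vendored lookup: always returns the A100 fallback."""
    return 312e12


patched = wrap_get_peak_flops(fake_original)
print(patched("AMD Instinct MI355X"))    # overridden by the wrapper
print(patched("NVIDIA A100-SXM4-80GB"))  # delegated to the original
```

In a real setup phase, the wrapper would presumably be assigned over `torchtitan.tools.utils.get_peak_flops` before MetricsProcessor reads it. Any module that had already imported the function by name would still hold the original reference, which is one reason the ordering matters. Because unknown names are delegated, the override becomes redundant but harmless once the vendored table lists the same devices.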


WangLingxun requested review from Xiaoming-AMD, limou102 and wenxie-amd as code owners June 18, 2026 07:39

wenxie-amd merged commit 83691e1 into main Jun 19, 2026
9 of 10 checks passed

wenxie-amd merged 1 commit into main from fix/torchtitan-mi350-peak-flops-mfu
