
Improve dsv4-fp8-mi355x-vllm with vllm-project/recipes#433 MI355X recipe #1373

Closed

Oseltamivir wants to merge 3 commits into main from dsv4-fp8-mi355x-vllm-recipe-433

Conversation

@Oseltamivir
Collaborator

Summary

Apply the validated DeepSeek-V4-Pro MI355X recipe from vllm-project/recipes#433 (which in turn sources its serving config from vllm-project/vllm#40871) to the existing dsv4-fp8-mi355x-vllm benchmark. The base vLLM build (PR #40889 with AITER-accelerated sparse MLA decode, pinned in dsv4_fp8_mi355x_vllm.sh) is unchanged; only serving flags / env vars / sweep range are updated.

Server-side changes

Setting                           Before      After (recipe #433)
VLLM_ROCM_USE_AITER_LINEAR        unset       1
--distributed-executor-backend    (default)   mp
--max-num-batched-tokens          (default)   8192
--async-scheduling                off         on
--gpu-memory-utilization          0.90        0.6
--max-num-seqs                    32          128
--tool-call-parser deepseek_v4    on          dropped
--enable-auto-tool-choice         on          dropped

Tool-call flags were removed because the recipe omits them and the throughput benchmark does not exercise tool calling. All other existing flags (--kv-cache-dtype fp8, --moe-backend triton_unfused, --enforce-eager, --no-enable-prefix-caching, --tokenizer-mode deepseek_v4, --reasoning-parser deepseek_v4) are preserved.
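
For reference, a minimal sketch of the resulting launch command, combining the recipe flags above with the preserved flags. The pinned build and real invocation live in dsv4_fp8_mi355x_vllm.sh; the model id below is a placeholder:

    # Sketch only; model id is a placeholder, not the literal contents
    # of dsv4_fp8_mi355x_vllm.sh.
    VLLM_ROCM_USE_AITER_LINEAR=1 \
    vllm serve deepseek-ai/DeepSeek-V4-Pro \
        --tensor-parallel-size 8 \
        --distributed-executor-backend mp \
        --max-num-batched-tokens 8192 \
        --async-scheduling \
        --gpu-memory-utilization 0.6 \
        --max-num-seqs 128 \
        --kv-cache-dtype fp8 \
        --moe-backend triton_unfused \
        --enforce-eager \
        --no-enable-prefix-caching \
        --tokenizer-mode deepseek_v4 \
        --reasoning-parser deepseek_v4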

Sweep changes

dsv4-fp8-mi355x-vllm was previously pinned to conc=1 only (single-sequence marker). With --max-num-seqs=128 validated by the recipe, the sweep is expanded to conc 4-64 for both 1k1k and 8k1k, matching dsv4-fp8-mi355x-sglang so vLLM↔SGLang results are directly comparable on the same MI355X runner.

Validated locally:

$ python utils/matrix_logic/generate_sweep_configs.py full-sweep \
    --config-files .github/configs/amd-master.yaml \
    --framework vllm --runner-type mi355x
# 10 dsv4-fp8-mi355x-vllm configs generated (5 per ISL/OSL, tp=8, conc 4..64)

Test plan

  • Apply sweep-enabled label so run-sweep.yml exercises the new flags end-to-end at trimmed concurrency (highest conc per parallelism config).
  • Confirm the AITER MLA decode build (vLLM PR #40889 SHA b3a4a44) still installs cleanly with the new server flags.
  • Inspect agg_bmk.json for dsv4 mi355x-vllm entries and compare throughput vs. dsv4-fp8-mi355x-sglang at matching concurrencies (a query sketch follows this list).
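
One way to pull the relevant entries out of agg_bmk.json, assuming it is a JSON array whose records carry model-key, concurrency, and throughput fields (the field names below are hypothetical):

    # Field names are hypothetical; adapt to the actual agg_bmk.json schema.
    jq '[.[] | select(.model_key | test("dsv4-fp8-mi355x"))
        | {model_key, concurrency, output_throughput}]' agg_bmk.json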

Adopt the validated DeepSeek-V4-Pro MI355X (TP=8) settings from
vllm-project/recipes#433 for the existing AITER MLA decode benchmark:

* Add VLLM_ROCM_USE_AITER_LINEAR=1 env var
* Add --distributed-executor-backend mp, --max-num-batched-tokens 8192,
  --async-scheduling server flags
* Tune --gpu-memory-utilization 0.90 -> 0.6 and --max-num-seqs 32 -> 128
* Drop --tool-call-parser / --enable-auto-tool-choice (not in recipe,
  not exercised by these throughput benchmarks)
* Expand sweep from conc=1 to conc 4-64 to match dsv4-fp8-mi355x-sglang
  for vLLM<->SGLang comparability now that max-num-seqs=128 allows it
@github-actions
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR there first before we can merge your single-node PR into the master branch. Let's ensure that the documentation is first-class so that the entire ML community can benefit from your hard work. Thank you!

PR authors are responsible for ensuring that all GitHub Actions jobs pass after merging. Often, failures are just flakes, and simply re-running the failed jobs fixes them; if you re-run failed jobs, you remain responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get approval from the respective company's CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.


@github-actions
Contributor

Comment thread on perf-changelog.yaml:

    - "Tune: --gpu-memory-utilization 0.90 -> 0.6, --max-num-seqs 32 -> 128"
    - "Drop --tool-call-parser deepseek_v4 / --enable-auto-tool-choice (not in recipe; benchmark doesn't exercise tool calling)"
    - "Expand search space from conc=1 to conc 4-64 to match dsv4-fp8-mi355x-sglang for vLLM<->SGLang comparability now that max-num-seqs=128 supports it"
    pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1373

🟡 The new perf-changelog.yaml entry at line 2455 has an unfilled placeholder, pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX, instead of pull/1373. Every other entry in the file uses a real PR number, so the link as committed returns a 404 and breaks the changelog's audit trail. Trivial fix: replace XXXX with 1373.

Extended reasoning

What the bug is

The new entry appended to perf-changelog.yaml for the dsv4-fp8-mi355x-vllm recipe change ends with an unfilled template placeholder:

  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX

This is on line 2455 of the committed file. The placeholder XXXX was never substituted with the actual PR number (1373).

How it manifests

Following the link in the changelog yields a GitHub 404 (no PR numbered XXXX exists in the repo). Any tooling that consumes perf-changelog.yaml to build a changelog, navigate from a recipe key back to its motivating PR, or audit which PR introduced a given configuration change will either error out (URL validation) or silently emit a dead link.

Why existing code doesn't prevent it

perf-changelog.yaml is hand-maintained — there is no automated check that pr-link resolves to a real PR. The PR description itself documents the intended link as pull/1373, and every other entry in the file uses a real numeric PR number (e.g. lines 2426, 2432, 2438, 2444 reference /pull/1329, /pull/1343, /pull/1344, /pull/1346). This is clearly an unfinished template — the author left XXXX as a fill-me-in marker and forgot to update it before pushing.
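
No such guard exists today, but a minimal sketch of one, assuming pr-link lines always follow the pattern shown above, would be a grep in CI:

    # Hypothetical CI guard: fail if any pr-link lacks a numeric PR id.
    if grep -En 'pr-link: .*/pull/[^0-9]' perf-changelog.yaml; then
        echo 'perf-changelog.yaml: non-numeric pr-link found' >&2
        exit 1
    fi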

Note on what reviewers see

The PR-diff viewer renders the bottom of the file with the real PR number 1373 filled in (since GitHub knows the PR number when rendering), but git show dacf068 -- perf-changelog.yaml and reading the on-disk file directly both confirm the committed content is literally XXXX. Multiple independent verifiers confirmed this by reading the file and the git object directly.

Step-by-step proof

  1. The PR is #1373 ("Improve dsv4-fp8-mi355x-vllm with vllm-project/recipes#433 MI355X recipe"), and the PR description's link references pull/1373.
  2. The diff for perf-changelog.yaml appends a new entry whose final line is pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX.
  3. On the committed branch (git show dacf068:perf-changelog.yaml → line 2455), the value is the literal string XXXX, not a numeric PR id.
  4. Visiting https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX returns a 404 (GitHub rejects non-numeric PR identifiers).
  5. Every other pr-link entry in the file is a real numeric PR (e.g. /pull/1329, /pull/1344, /pull/1346), so this is the only entry that 404s.
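
Step 3 can be reproduced directly from the cited commit and line number:

    $ git show dacf068:perf-changelog.yaml | sed -n '2455p'
    pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX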

Impact

No runtime effect — this is documentation/metadata only. But perf-changelog.yaml is the audit trail tying recipe-key changes back to the PRs that introduced them; a dead link defeats that purpose for this entry.

Fix

Replace XXXX with 1373 on line 2455 of perf-changelog.yaml.
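
Equivalently, as a one-liner from the repo root (GNU sed syntax assumed):

    sed -i 's#pull/XXXX#pull/1373#' perf-changelog.yaml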

@functionstackx
Contributor

I thought InferenceX only does the most commonly served FP4 DeepSeek-V4-Pro, though?
