Improve dsv4-fp8-mi355x-vllm with vllm-project/recipes#433 MI355X recipe #1373

Oseltamivir wants to merge 3 commits
Conversation
Adopt the validated DeepSeek-V4-Pro MI355X (TP=8) settings from vllm-project/recipes#433 for the existing AITER MLA decode benchmark:

- Add the `VLLM_ROCM_USE_AITER_LINEAR=1` env var
- Add the `--distributed-executor-backend mp`, `--max-num-batched-tokens 8192`, and `--async-scheduling` server flags
- Tune `--gpu-memory-utilization` from 0.90 to 0.6 and `--max-num-seqs` from 32 to 128
- Drop `--tool-call-parser` / `--enable-auto-tool-choice` (not in the recipe, and not exercised by these throughput benchmarks)
- Expand the sweep from conc=1 to conc 4-64 to match dsv4-fp8-mi355x-sglang for vLLM<->SGLang comparability, now that max-num-seqs=128 allows it
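For concreteness, a minimal sketch of what the resulting launch could look like. Only the env var and flag values come from this PR's description and the stated TP=8; the model path, the `vllm serve` wrapper shape, and the array-based structure are placeholders, not the actual `dsv4_fp8_mi355x_vllm.sh` contents.

```shell
# Sketch only: flag values from this PR; model path and script shape are placeholders.
export VLLM_ROCM_USE_AITER_LINEAR=1    # new: AITER linear kernels on ROCm

SERVE_FLAGS=(
  --tensor-parallel-size 8             # TP=8 per the recipe
  --distributed-executor-backend mp    # new
  --max-num-batched-tokens 8192        # new
  --async-scheduling                   # new
  --gpu-memory-utilization 0.6         # was 0.90
  --max-num-seqs 128                   # was 32
)

# Placeholder model path; the real script pins its own model and extra flags.
echo "vllm serve deepseek-ai/DeepSeek-V4-Pro ${SERVE_FLAGS[*]}"
```

Note that `--tool-call-parser` and `--enable-auto-tool-choice` are deliberately absent, matching the drop described above.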
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR there first before we can merge your single-node PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work. Thank you!

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
1 similar comment
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25823385842
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25823391939
| - "Tune: --gpu-memory-utilization 0.90 -> 0.6, --max-num-seqs 32 -> 128" | ||
| - "Drop --tool-call-parser deepseek_v4 / --enable-auto-tool-choice (not in recipe; benchmark doesn't exercise tool calling)" | ||
| - "Expand search space from conc=1 to conc 4-64 to match dsv4-fp8-mi355x-sglang for vLLM<->SGLang comparability now that max-num-seqs=128 supports it" | ||
| pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1373 |
🟡 The new perf-changelog.yaml entry at line 2455 has an unfilled placeholder pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX instead of pull/1373. Every other entry in the file uses a real PR number, so the link as committed returns a 404 and breaks the changelog's audit trail. Trivial fix: replace XXXX with 1373.
Extended reasoning
What the bug is
The new entry appended to perf-changelog.yaml for the dsv4-fp8-mi355x-vllm recipe change ends with an unfilled template placeholder:
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXXThis is on line 2455 of the committed file. The placeholder XXXX was never substituted with the actual PR number (1373).
How it manifests
Following the link in the changelog yields a GitHub 404 (no PR numbered XXXX exists in the repo). Any tooling that consumes perf-changelog.yaml to build a changelog, navigate from a recipe key back to its motivating PR, or audit which PR introduced a given configuration change will either error out (URL validation) or silently emit a dead link.
Why existing code doesn't prevent it
perf-changelog.yaml is hand-maintained — there is no automated check that pr-link resolves to a real PR. The PR description itself documents the intended link as pull/1373, and every other entry in the file uses a real numeric PR number (e.g. lines 2426, 2432, 2438, 2444 reference /pull/1329, /pull/1343, /pull/1344, /pull/1346). This is clearly an unfinished template — the author left XXXX as a fill-me-in marker and forgot to update it before pushing.
Note on what reviewers see
The PR-diff viewer renders the bottom of the file with the real PR number 1373 filled in (since GitHub knows the PR number when rendering), but git show dacf068 -- perf-changelog.yaml and reading the on-disk file directly both confirm the committed content is literally XXXX. Multiple independent verifiers confirmed this by reading the file and the git object directly.
Step-by-step proof
- The PR is #1373 ("Improve dsv4-fp8-mi355x-vllm with vllm-project/recipes#433 MI355X recipe"), and the PR description's link references `pull/1373`.
- The diff for `perf-changelog.yaml` appends a new entry whose final line is `pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX`.
- On the committed branch (`git show dacf068:perf-changelog.yaml`, line 2455), the value is the literal string `XXXX`, not a numeric PR id.
- Visiting `https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX` returns a 404 (GitHub rejects non-numeric PR identifiers).
- Every other `pr-link` entry in the file is a real numeric PR (e.g. `/pull/1329`, `/pull/1344`, `/pull/1346`), so this is the only entry that 404s.
Impact
No runtime effect — this is documentation/metadata only. But perf-changelog.yaml is the audit trail tying recipe-key changes back to the PRs that introduced them; a dead link defeats that purpose for this entry.
Fix
Replace XXXX with 1373 on line 2455 of perf-changelog.yaml.
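Since the review notes that nothing automatically verifies `pr-link` values, a hypothetical lint-style check could catch unfilled placeholders before they land. This is a sketch, not existing repo tooling; the function name and the assumption that valid links always point at `SemiAnalysisAI/InferenceX` are mine.

```python
import re

# Hypothetical check: a valid pr-link must end in a numeric PR id,
# which rejects unfilled templates like .../pull/XXXX.
PR_LINK_RE = re.compile(
    r"^pr-link:\s*https://github\.com/SemiAnalysisAI/InferenceX/pull/\d+$"
)

def find_bad_pr_links(lines):
    """Return (line_number, text) for pr-link lines that are not numeric PR URLs."""
    bad = []
    for i, line in enumerate(lines, start=1):
        stripped = line.strip()
        if stripped.startswith("pr-link:") and not PR_LINK_RE.match(stripped):
            bad.append((i, stripped))
    return bad

sample = [
    'pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1329',
    'pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX',
]
print(find_bad_pr_links(sample))  # flags only the XXXX entry
```

Run against the full `perf-changelog.yaml` in CI, this would have failed the commit described above.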
i thought inferencex only does the most commonly served fp4 deepseekv4 pro tho |
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25832050143
Summary
Apply the validated DeepSeek-V4-Pro MI355X recipe from vllm-project/recipes#433 (which in turn sources its serving config from vllm-project/vllm#40871) to the existing `dsv4-fp8-mi355x-vllm` benchmark. The base vLLM build (PR #40889 with AITER-accelerated sparse MLA decode, pinned in `dsv4_fp8_mi355x_vllm.sh`) is unchanged; only serving flags, env vars, and the sweep range are updated.

Server-side changes
| Setting | Before | After |
| --- | --- | --- |
| `VLLM_ROCM_USE_AITER_LINEAR` | unset | `1` |
| `--distributed-executor-backend` | default | `mp` |
| `--max-num-batched-tokens` | default | `8192` |
| `--async-scheduling` | off | on |
| `--gpu-memory-utilization` | `0.90` | `0.6` |
| `--max-num-seqs` | `32` | `128` |
| `--tool-call-parser deepseek_v4` | set | removed |
| `--enable-auto-tool-choice` | set | removed |

Tool-call flags were removed because the recipe omits them and the throughput benchmark does not exercise tool calling. All other existing flags (`--kv-cache-dtype fp8`, `--moe-backend triton_unfused`, `--enforce-eager`, `--no-enable-prefix-caching`, `--tokenizer-mode deepseek_v4`, `--reasoning-parser deepseek_v4`) are preserved.

Sweep changes
`dsv4-fp8-mi355x-vllm` was previously pinned to conc=1 only (single-sequence marker). With `--max-num-seqs=128` validated by the recipe, the sweep is expanded to conc 4-64 for both 1k1k and 8k1k, matching `dsv4-fp8-mi355x-sglang` so vLLM↔SGLang results are directly comparable on the same MI355X runner.

Validated locally:
Test plan
- Add the `sweep-enabled` label so `run-sweep.yml` exercises the new flags end-to-end at trimmed concurrency (highest `conc` per parallelism config).
- Confirm that the pinned vLLM build (`b3a4a44`) still installs cleanly with the new server flags.
- Inspect `agg_bmk.json` for dsv4 mi355x-vllm entries and compare throughput vs. `dsv4-fp8-mi355x-sglang` at matching concurrencies.
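The sweep expansion described above can be sketched as follows. The source only states "conc 4-64"; the power-of-two step sequence and the bench-invocation shape below are assumptions for illustration, and the `--max-num-seqs` headroom check reflects the stated rationale that conc=128 support is what made the wider sweep possible.

```python
# Sketch of the expanded concurrency sweep (steps are an assumption;
# the PR only states the 4-64 range).
OLD_SWEEP = [1]                  # previous: single-sequence marker only
NEW_SWEEP = [4, 8, 16, 32, 64]   # assumed power-of-two steps within 4-64
MAX_NUM_SEQS = 128               # new server-side cap

# Every swept concurrency must fit under --max-num-seqs, otherwise the
# scheduler would cap in-flight sequences below the requested concurrency.
assert all(conc <= MAX_NUM_SEQS for conc in NEW_SWEEP)

# Hypothetical enumeration of benchmark points across both seq-len configs.
points = [
    (seq_lens, conc)
    for seq_lens in ("1k1k", "8k1k")
    for conc in NEW_SWEEP
]
print(len(points))  # 10 benchmark points vs. the previous 2
```

Under these assumed steps, the old 2-point run (conc=1 at each seq-len config) becomes a 10-point sweep directly comparable with `dsv4-fp8-mi355x-sglang`.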