
Improve dsv4-fp8-mi355x-vllm with vllm-project/recipes#433 MI355X recipe #1373

Closed

Oseltamivir wants to merge 3 commits into main from dsv4-fp8-mi355x-vllm-recipe-433

Conversation

@Oseltamivir
Collaborator

Summary

Apply the validated DeepSeek-V4-Pro MI355X recipe from vllm-project/recipes#433 (which in turn sources its serving config from vllm-project/vllm#40871) to the existing dsv4-fp8-mi355x-vllm benchmark. The base vLLM build (PR #40889 with AITER-accelerated sparse MLA decode, pinned in dsv4_fp8_mi355x_vllm.sh) is unchanged; only serving flags / env vars / sweep range are updated.

Server-side changes

Setting                           Before      After (recipe #433)
VLLM_ROCM_USE_AITER_LINEAR        unset       1
--distributed-executor-backend    (default)   mp
--max-num-batched-tokens          (default)   8192
--async-scheduling                off         on
--gpu-memory-utilization          0.90        0.6
--max-num-seqs                    32          128
--tool-call-parser deepseek_v4    on          dropped
--enable-auto-tool-choice         on          dropped

Tool-call flags were removed because the recipe omits them and the throughput benchmark does not exercise tool calling. All other existing flags (--kv-cache-dtype fp8, --moe-backend triton_unfused, --enforce-eager, --no-enable-prefix-caching, --tokenizer-mode deepseek_v4, --reasoning-parser deepseek_v4) are preserved.
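
For reference, a minimal sketch of the resulting launch command, combining the recipe flags above with the preserved flags. The pinned build and real invocation live in dsv4_fp8_mi355x_vllm.sh; the model id below is a placeholder:

    # Sketch only; model id is a placeholder, not the literal contents
    # of dsv4_fp8_mi355x_vllm.sh.
    VLLM_ROCM_USE_AITER_LINEAR=1 \
    vllm serve deepseek-ai/DeepSeek-V4-Pro \
        --tensor-parallel-size 8 \
        --distributed-executor-backend mp \
        --max-num-batched-tokens 8192 \
        --async-scheduling \
        --gpu-memory-utilization 0.6 \
        --max-num-seqs 128 \
        --kv-cache-dtype fp8 \
        --moe-backend triton_unfused \
        --enforce-eager \
        --no-enable-prefix-caching \
        --tokenizer-mode deepseek_v4 \
        --reasoning-parser deepseek_v4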

Sweep changes

dsv4-fp8-mi355x-vllm was previously pinned to conc=1 only (single-sequence marker). With --max-num-seqs=128 validated by the recipe, the sweep is expanded to conc 4-64 for both 1k1k and 8k1k, matching dsv4-fp8-mi355x-sglang so vLLM↔SGLang results are directly comparable on the same MI355X runner.

Validated locally:

$ python utils/matrix_logic/generate_sweep_configs.py full-sweep \
    --config-files .github/configs/amd-master.yaml \
    --framework vllm --runner-type mi355x
# 10 dsv4-fp8-mi355x-vllm configs generated (5 per ISL/OSL, tp=8, conc 4..64)

Test plan

  • Apply sweep-enabled label so run-sweep.yml exercises the new flags end-to-end at trimmed concurrency (highest conc per parallelism config).
  • Confirm the AITER MLA decode build (vLLM PR #40889 SHA b3a4a44) still installs cleanly with the new server flags.
  • Inspect agg_bmk.json for dsv4 mi355x-vllm entries and compare throughput vs. dsv4-fp8-mi355x-sglang at matching concurrencies (a query sketch follows this list).
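
One way to pull the relevant entries out of agg_bmk.json, assuming it is a JSON array whose records carry model-key, concurrency, and throughput fields (the field names below are hypothetical):

    # Field names are hypothetical; adapt to the actual agg_bmk.json schema.
    jq '[.[] | select(.model_key | test("dsv4-fp8-mi355x"))
        | {model_key, concurrency, output_throughput}]' agg_bmk.json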

Adopt the validated DeepSeek-V4-Pro MI355X (TP=8) settings from
vllm-project/recipes#433 for the existing AITER MLA decode benchmark:

* Add VLLM_ROCM_USE_AITER_LINEAR=1 env var
* Add --distributed-executor-backend mp, --max-num-batched-tokens 8192,
  --async-scheduling server flags
* Tune --gpu-memory-utilization 0.90 -> 0.6 and --max-num-seqs 32 -> 128
* Drop --tool-call-parser / --enable-auto-tool-choice (not in recipe,
  not exercised by these throughput benchmarks)
* Expand sweep from conc=1 to conc 4-64 to match dsv4-fp8-mi355x-sglang
  for vLLM<->SGLang comparability now that max-num-seqs=128 allows it
@github-actions
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR there first before we can merge your single-node PR into the master branch. Let's ensure that the documentation is first-class so that the entire ML community can benefit from your hard work. Thank you!

PR authors are responsible for ensuring that all GitHub Actions jobs pass after merging. Often, failures are just flakes, and simply re-running the failed jobs fixes them; if you re-run failed jobs, you remain responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get approval from the respective company's CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.


@github-actions
Contributor

Comment thread on perf-changelog.yaml:

    - "Tune: --gpu-memory-utilization 0.90 -> 0.6, --max-num-seqs 32 -> 128"
    - "Drop --tool-call-parser deepseek_v4 / --enable-auto-tool-choice (not in recipe; benchmark doesn't exercise tool calling)"
    - "Expand search space from conc=1 to conc 4-64 to match dsv4-fp8-mi355x-sglang for vLLM<->SGLang comparability now that max-num-seqs=128 supports it"
    pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1373

🟡 The new perf-changelog.yaml entry at line 2455 has an unfilled placeholder, pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX, instead of pull/1373. Every other entry in the file uses a real PR number, so the link as committed returns a 404 and breaks the changelog's audit trail. Trivial fix: replace XXXX with 1373.

Extended reasoning

What the bug is

The new entry appended to perf-changelog.yaml for the dsv4-fp8-mi355x-vllm recipe change ends with an unfilled template placeholder:

  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX

This is on line 2455 of the committed file. The placeholder XXXX was never substituted with the actual PR number (1373).

How it manifests

Following the link in the changelog yields a GitHub 404 (no PR numbered XXXX exists in the repo). Any tooling that consumes perf-changelog.yaml to build a changelog, navigate from a recipe key back to its motivating PR, or audit which PR introduced a given configuration change will either error out (URL validation) or silently emit a dead link.

Why existing code doesn't prevent it

perf-changelog.yaml is hand-maintained — there is no automated check that pr-link resolves to a real PR. The PR description itself documents the intended link as pull/1373, and every other entry in the file uses a real numeric PR number (e.g. lines 2426, 2432, 2438, 2444 reference /pull/1329, /pull/1343, /pull/1344, /pull/1346). This is clearly an unfinished template — the author left XXXX as a fill-me-in marker and forgot to update it before pushing.
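
No such guard exists today, but a minimal sketch of one, assuming pr-link lines always follow the pattern shown above, would be a grep in CI:

    # Hypothetical CI guard: fail if any pr-link lacks a numeric PR id.
    if grep -En 'pr-link: .*/pull/[^0-9]' perf-changelog.yaml; then
        echo 'perf-changelog.yaml: non-numeric pr-link found' >&2
        exit 1
    fi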

Note on what reviewers see

The PR-diff viewer renders the bottom of the file with the real PR number 1373 filled in (since GitHub knows the PR number when rendering), but git show dacf068 -- perf-changelog.yaml and reading the on-disk file directly both confirm the committed content is literally XXXX. Multiple independent verifiers confirmed this by reading the file and the git object directly.

Step-by-step proof

  1. The PR is #1373 ("Improve dsv4-fp8-mi355x-vllm with vllm-project/recipes#433 MI355X recipe"), and the PR description's link references pull/1373.
  2. The diff for perf-changelog.yaml appends a new entry whose final line is pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX.
  3. On the committed branch (git show dacf068:perf-changelog.yaml → line 2455), the value is the literal string XXXX, not a numeric PR id.
  4. Visiting https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX returns a 404 (GitHub rejects non-numeric PR identifiers).
  5. Every other pr-link entry in the file is a real numeric PR (e.g. /pull/1329, /pull/1344, /pull/1346), so this is the only entry that 404s.
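
Step 3 can be reproduced directly from the cited commit and line number:

    $ git show dacf068:perf-changelog.yaml | sed -n '2455p'
    pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX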

Impact

No runtime effect — this is documentation/metadata only. But perf-changelog.yaml is the audit trail tying recipe-key changes back to the PRs that introduced them; a dead link defeats that purpose for this entry.

Fix

Replace XXXX with 1373 on line 2455 of perf-changelog.yaml.
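
Equivalently, as a one-liner from the repo root (GNU sed syntax assumed):

    sed -i 's#pull/XXXX#pull/1373#' perf-changelog.yaml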

@functionstackx
Contributor

I thought InferenceX only does the most commonly served FP4 DeepSeek-V4-Pro, though?
