Modify DSV4 GB300 SGLang MTP disagg concurrency #1316

Open

yhyang201 wants to merge 11 commits into main from tune-mtp-concurrency

Conversation

@yhyang201 (Collaborator) commented May 12, 2026

Summary

  • Reduce concurrency for 4 mid-curve MTP disagg configs to match system capacity
    • 1P 8D (dep4-dep8): 256 → 160
    • 1P 16D (dep4-dep16): 256 → 160
    • 2P 8D (dep4-dep8): 512 → 320
    • 4P 8D (dep4-dep8): 1024 → 640
  • Set num_prompts_mult: 16 to keep total prompt count unchanged
  • Low-latency configs (1p1d conc=1, 1p6d conc=64) unchanged

Reduce concurrency for mid-curve MTP configs based on server log
analysis of run #25642986639. Current values are 1.6-1.8x above
system capacity, causing prefill queue buildup and high TTFT.

- 1P 8D:  256 -> 160
- 1P 16D: 256 -> 160
- 2P 8D:  512 -> 320
- 4P 8D:  1024 -> 640

Set num_prompts_mult=16 to keep total prompt count unchanged
(original: conc*10, new: conc*16).
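The prompt-count bookkeeping above can be checked with a few lines of arithmetic (the concurrency values are from this PR; the original 10x multiplier is taken from the commit message):

```python
# Each tuple is (old concurrency, new concurrency) for one retuned config.
changes = {
    "1P 8D":  (256, 160),
    "1P 16D": (256, 160),
    "2P 8D":  (512, 320),
    "4P 8D":  (1024, 640),
}
OLD_MULT, NEW_MULT = 10, 16  # total prompts = concurrency * multiplier

for name, (old_conc, new_conc) in changes.items():
    # Cutting concurrency by exactly the 10/16 ratio keeps the workload fixed.
    assert old_conc * OLD_MULT == new_conc * NEW_MULT, name
```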
@github-actions (Contributor)

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

@yhyang201 (Collaborator, Author)

/sweep

@github-actions (Contributor)

@yhyang201 Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25744314802
Command: ``
Pinned ref: ba9e9c5
Approval: not required (trusted collaborator).

@claude Bot (Contributor) left a comment

LGTM — mechanical concurrency tuning for 4 mid-curve MTP disagg configs with clear rationale and offsetting num_prompts_mult to preserve total prompt count.

Extended reasoning...

Overview

The PR tunes 4 mid-curve MTP disagg concurrency points for the dsv4-fp4-gb300-dynamo-sglang-mtp config family across nvidia-master.yaml and 4 recipe YAMLs (1P 8D: 256→160, 1P 16D: 256→160, 2P 8D: 512→320, 4P 8D: 1024→640), and adds num_prompts_mult: 16 in each recipe so total prompt count is preserved. Low-latency configs are untouched.
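For one of the four configs, the recipe-side edit would look roughly like this (a sketch only; the exact key names and surrounding stanza are assumed, not copied from the diff):

```yaml
# Sketch: 1P 8D mid-curve MTP disagg point in one of the 8k1k recipe YAMLs
- spec-decoding: "mtp"
  conc-list: [160]        # was [256]; old value ~1.6x above measured capacity
  num_prompts_mult: 16    # was 10; 160 * 16 == 256 * 10, so workload is unchanged
```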

Security risks

None. This is benchmark-recipe configuration data — YAML scalars consumed by the sweep harness. No code paths, auth, crypto, permissions, or runtime logic are affected.

Level of scrutiny

Low. The edits are purely numeric concurrency knobs with a clear empirical justification in the description (server-log analysis of run #25642986639 showing 1.6–1.8× over-subscription and P50 prefill queue of 130–230). The values change consistently across nvidia-master.yaml and each recipe, and num_prompts_mult: 16 is applied uniformly to keep workload size constant.

Other factors

The bug hunter flagged a nit about not appending a perf-changelog.yaml entry, which means the post-merge sweep won't auto-trigger to verify the TTFT improvement claim — this is already conveyed as an inline comment for the author to act on before merging, and doesn't impair correctness of the YAML edits themselves. Given the small, self-contained, mechanical nature of the change and the precedent set by PR #1305 / #1310, this can land without further human review.

```diff
  # Mid curve 1p1d-dep4-dep8. 3 nodes.
  - spec-decoding: "mtp"
-   conc-list: [256]
+   conc-list: [160]
```
Contributor

🟡 This PR retunes concurrency for four dsv4-fp4-gb300-dynamo-sglang-mtp mid-curve configs (in nvidia-master.yaml and the four recipe YAMLs) but does not append a corresponding entry to perf-changelog.yaml. Because .github/workflows/run-sweep.yml gates both its push and pull_request triggers on a path filter for perf-changelog.yaml, the post-merge sweep won't fire for this change and the TTFT improvement vs. run #25642986639 cited in the description can't be auto-recorded — appending a single entry with config-keys: dsv4-fp4-gb300-dynamo-sglang-mtp (matching the pattern of the analogous PR #1305) before merging will fix it.

Extended reasoning...

What the bug is. The PR changes concurrency points for dsv4-fp4-gb300-dynamo-sglang-mtp in .github/configs/nvidia-master.yaml (4 conc-list entries at lines 8477/8492/8507/8522 — 256→160, 256→160, 512→320, 1024→640) and the matching benchmark stanzas of the four mid-curve recipe YAMLs under benchmarks/multi_node/srt-slurm-recipes/sglang/deepseek-v4/8k1k/. The PR description's stated goal is to verify a TTFT/E2E improvement against benchmark run #25642986639. However, perf-changelog.yaml is not touched (the PR file list is 5 files — nvidia-master.yaml plus the four recipes).

Why this matters — the sweep workflow gating. .github/workflows/run-sweep.yml declares its triggers as:

```yaml
on:
  push:
    branches: [main]
    paths:
      - "perf-changelog.yaml"
  pull_request:
    branches: [main]
    types: [ready_for_review, synchronize, labeled, unlabeled]
    paths:
      - "perf-changelog.yaml"
```

Both the push-to-main path and the pull_request path are gated on a paths filter that only matches perf-changelog.yaml. With no edit to that file, the workflow is not eligible to run at all on this PR or on the merge commit.

Project convention reinforces this. AGENTS.md states explicitly: "Changes to perf-changelog.yaml trigger benchmark runs" (Git section) and "perf-changelog.yaml triggers which configs to benchmark" (YAML section). The "Adding a New Benchmark Configuration" checklist (step 2) directs maintainers to "Add corresponding entry to perf-changelog.yaml to trigger benchmark" — and although that step is written for new-config flows, the same gating applies to recipe retunes, which is why earlier tuning PRs added entries too.

Prior precedent — PR #1305. The directly analogous PR "Fix B200 Dynamo vLLM recipe benchmark concurrencies" performed exactly this kind of operation (concurrency tuning in nvidia-master.yaml + recipe YAMLs) and appended a 7-line entry to perf-changelog.yaml with config-keys: dsv4-fp4-b200-dynamo-vllm to trigger the post-merge sweep. That entry is visible at the tail of the current perf-changelog.yaml. The current PR is the same pattern minus the changelog entry, so it falls outside the established workflow.

Step-by-step proof that the sweep will not fire after merge.

1. Merge commit lands on main. Modified paths: .github/configs/nvidia-master.yaml and four files under benchmarks/multi_node/srt-slurm-recipes/.../8k1k/.
2. GitHub evaluates run-sweep.yml's on.push block. The paths filter requires at least one changed path to match perf-changelog.yaml.
3. None of the five changed paths match. The workflow run is not created (no Run Sweep entry will appear in the Actions tab for this SHA).
4. As a result, the TTFT/TPOT comparison against run #25642986639 that motivates the PR is never produced by the standard pipeline, and no changelog row links this PR for future bisection.

Impact. This is a process/workflow omission, not a code defect — the YAML edits themselves are internally consistent. The consequence is loss of automated verification and changelog provenance for a perf-targeted change.

How to fix. Append a new entry to the END of perf-changelog.yaml (per AGENTS.md's chronological-append rule), mirroring PR #1305's shape, e.g.:

```yaml
- config-keys:
    - dsv4-fp4-gb300-dynamo-sglang-mtp
  description:
    - "Tune mid-curve MTP disagg concurrencies (1p1d/2p1d/4p1d) to match measured system capacity"
    - "Reduce conc 256→160 (dep4-dep8, dep4-dep16), 512→320 (2p1d), 1024→640 (4p1d); keep num_prompts_mult=16 to preserve total prompt count"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1316
```

Make sure the file still ends with a trailing newline (the check-newline job in run-sweep.yml enforces this on PRs).

@yhyang201 (Collaborator, Author)

/sweep

@github-actions (Contributor)

@yhyang201 Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25745147292
Command: ``
Pinned ref: ba9e9c5
Approval: not required (trusted collaborator).

Required to trigger the sweep workflow on merge.

Enable DP-rank-aware routing to improve prefill queue
distribution across DP ranks. Matches non-MTP disagg configs.

Replace default follow_bootstrap_room with round_robin to improve
queue distribution across prefill DP ranks.
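As a sketch of where this change lands in a recipe, the key name `prefill-lb-policy` below is illustrative only; just the policy values `follow_bootstrap_room` and `round_robin` come from the commit message:

```yaml
# Hypothetical recipe fragment; the real key name may differ.
- spec-decoding: "mtp"
  # Default follow_bootstrap_room pins each request to its bootstrap DP rank,
  # which can leave some prefill queues much deeper than others.
  prefill-lb-policy: "round_robin"   # spread requests evenly across prefill DP ranks
```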
@yhyang201 yhyang201 force-pushed the tune-mtp-concurrency branch from a1ed80c to 2b265fa Compare May 13, 2026 02:49

Explore throughput-latency tradeoff with 3 conc points each:
- 1P 8D / 1P 16D: 64, 128, 160
- 2P 8D: 128, 256, 320
- 4P 8D: 256, 512, 640
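In recipe terms, each config's single concurrency point becomes a three-point list, e.g. for 1P 8D (key names assumed, not copied from the diff):

```yaml
# Sketch: sweep three concurrency points instead of one (1P 8D shown).
- spec-decoding: "mtp"
  conc-list: [64, 128, 160]   # low / mid / previously tuned high point
```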

@yhyang201 (Collaborator, Author)

/sweep

@github-actions (Contributor)

@yhyang201 Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25793093112
Command: ``
Pinned ref: bcdb5bf
Approval: not required (trusted collaborator).


- 1P8D: 160, 1P16D: 192, 2P8D: 384, 4P8D: 640
- Resolve perf-changelog.yaml merge conflict