Modify DSV4 GB300 SGLang MTP disagg concurrency #1316
Conversation
Reduce concurrency for mid-curve MTP configs based on server-log analysis of run #25642986639. Current values are 1.6-1.8x above system capacity, causing prefill queue buildup and high TTFT.

- 1P 8D: 256 -> 160
- 1P 16D: 256 -> 160
- 2P 8D: 512 -> 320
- 4P 8D: 1024 -> 640

Set num_prompts_mult=16 to keep the total prompt count unchanged (original: conc*10, new: conc*16).
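As a sketch, one of the retuned recipe stanzas would look roughly like this (the stanza layout is illustrative; the `conc-list` and `num_prompts_mult` fields and values come from the PR, and the prior multiplier of 10 is inferred from the "original: conc*10" note above):

```yaml
# Mid-curve 1P 8D stanza (illustrative layout)
- spec-decoding: "mtp"
  conc-list: [160]      # was [256]; 256 measured ~1.6-1.8x above system capacity
  num_prompts_mult: 16  # was 10; 160 * 16 = 2560 = 256 * 10, so total prompts unchanged
```

The same offset works for every retuned point, since each concurrency was scaled by 10/16 (320 * 16 = 512 * 10, 640 * 16 = 1024 * 10).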
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook. If they are not, please create a PR there first before we can merge your single-node PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that all GitHub Action jobs fully pass after merging. Much of the time, failures are just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack. |
|
/sweep |
|
@yhyang201 Kicking off a sweep. Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25744314802 |
LGTM — mechanical concurrency tuning for 4 mid-curve MTP disagg configs with clear rationale and offsetting num_prompts_mult to preserve total prompt count.
Extended reasoning...
Overview
The PR tunes 4 mid-curve MTP disagg concurrency points for the dsv4-fp4-gb300-dynamo-sglang-mtp config family across nvidia-master.yaml and 4 recipe YAMLs (1P 8D: 256→160, 1P 16D: 256→160, 2P 8D: 512→320, 4P 8D: 1024→640), and adds num_prompts_mult: 16 in each recipe so total prompt count is preserved. Low-latency configs are untouched.
Security risks
None. This is benchmark-recipe configuration data — YAML scalars consumed by the sweep harness. No code paths, auth, crypto, permissions, or runtime logic are affected.
Level of scrutiny
Low. The edits are purely numeric concurrency knobs with a clear empirical justification in the description (server-log analysis of run #25642986639 showing 1.6–1.8× over-subscription and P50 prefill queue of 130–230). The values change consistently across nvidia-master.yaml and each recipe, and num_prompts_mult: 16 is applied uniformly to keep workload size constant.
Other factors
The bug hunter flagged a nit about not appending a perf-changelog.yaml entry, which means the post-merge sweep won't auto-trigger to verify the TTFT improvement claim — this is already conveyed as an inline comment for the author to act on before merging, and doesn't impair correctness of the YAML edits themselves. Given the small, self-contained, mechanical nature of the change and the precedent set by PR #1305 / #1310, this can land without further human review.
```diff
 # Mid curve 1p1d-dep4-dep8. 3 nodes.
 - spec-decoding: "mtp"
-  conc-list: [256]
+  conc-list: [160]
```
🟡 This PR retunes concurrency for four dsv4-fp4-gb300-dynamo-sglang-mtp mid-curve configs (in nvidia-master.yaml and the four recipe YAMLs) but does not append a corresponding entry to perf-changelog.yaml. Because .github/workflows/run-sweep.yml gates both its push and pull_request triggers on a path filter for perf-changelog.yaml, the post-merge sweep won't fire for this change, and the TTFT improvement vs. run #25642986639 cited in the description can't be auto-recorded. Appending a single entry with config-keys: dsv4-fp4-gb300-dynamo-sglang-mtp (matching the pattern of the analogous PR #1305) before merging will fix it.
Extended reasoning...
What the bug is. The PR changes concurrency points for dsv4-fp4-gb300-dynamo-sglang-mtp in .github/configs/nvidia-master.yaml (4 conc-list entries at lines 8477/8492/8507/8522: 256→160, 256→160, 512→320, 1024→640) and the matching benchmark stanzas of the four mid-curve recipe YAMLs under benchmarks/multi_node/srt-slurm-recipes/sglang/deepseek-v4/8k1k/. The PR description's stated goal is to verify a TTFT/E2E improvement against benchmark run #25642986639. However, perf-changelog.yaml is not touched (the PR file list is 5 files: nvidia-master.yaml plus the four recipes).

Why this matters: the sweep workflow gating. .github/workflows/run-sweep.yml declares its triggers as:

```yaml
on:
  push:
    branches: [main]
    paths:
      - "perf-changelog.yaml"
  pull_request:
    branches: [main]
    types: [ready_for_review, synchronize, labeled, unlabeled]
    paths:
      - "perf-changelog.yaml"
```

Both the push-to-main path and the pull_request path are gated on a paths filter that only matches perf-changelog.yaml. With no edit to that file, the workflow is not eligible to run at all on this PR or on the merge commit.

Project convention reinforces this. AGENTS.md states explicitly: "Changes to perf-changelog.yaml trigger benchmark runs" (Git section) and "perf-changelog.yaml triggers which configs to benchmark" (YAML section). The "Adding a New Benchmark Configuration" checklist (step 2) directs maintainers to "Add corresponding entry to perf-changelog.yaml to trigger benchmark". Although that step is written for new-config flows, the same gating applies to recipe retunes, which is why earlier tuning PRs added entries too.

Prior precedent: PR #1305. The directly analogous PR "Fix B200 Dynamo vLLM recipe benchmark concurrencies" performed exactly this kind of operation (concurrency tuning in nvidia-master.yaml + recipe YAMLs) and appended a 7-line entry to perf-changelog.yaml with config-keys: dsv4-fp4-b200-dynamo-vllm to trigger the post-merge sweep. That entry is visible at the tail of the current perf-changelog.yaml. The current PR is the same pattern minus the changelog entry, so it falls outside the established workflow.

Step-by-step proof that the sweep will not fire after merge:
1. The merge commit lands on main. Modified paths: .github/configs/nvidia-master.yaml and four files under benchmarks/multi_node/srt-slurm-recipes/.../8k1k/.
2. GitHub evaluates run-sweep.yml's on.push block. The paths filter requires at least one changed path to match perf-changelog.yaml.
3. None of the five changed paths match. The workflow run is not created (no Run Sweep entry will appear in the Actions tab for this SHA).
4. As a result, the TTFT/TPOT comparison against run #25642986639 that motivates the PR is never produced by the standard pipeline, and no changelog row links this PR for future bisection.

Impact. This is a process/workflow omission, not a code defect; the YAML edits themselves are internally consistent. The consequence is loss of automated verification and changelog provenance for a perf-targeted change.

How to fix. Append a new entry to the END of perf-changelog.yaml (per AGENTS.md's chronological-append rule), mirroring PR #1305's shape, e.g.:

```yaml
- config-keys:
    - dsv4-fp4-gb300-dynamo-sglang-mtp
  description:
    - "Tune mid-curve MTP disagg concurrencies (1p1d/2p1d/4p1d) to match measured system capacity"
    - "Reduce conc 256→160 (dep4-dep8, dep4-dep16), 512→320 (2p1d), 1024→640 (4p1d); keep num_prompts_mult=16 to preserve total prompt count"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1316
```

Make sure the file still ends with a trailing newline (the check-newline job in run-sweep.yml enforces this on PRs).
|
/sweep |
|
@yhyang201 Kicking off a sweep. Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25745147292 |
Required to trigger the sweep workflow on merge.
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25745868416 |
Enable DP-rank-aware routing to improve prefill queue distribution across DP ranks. Matches non-MTP disagg configs.
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25753311297 |
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25775064759 |
Replace default follow_bootstrap_room with round_robin to improve queue distribution across prefill DP ranks.
Force-pushed from a1ed80c to 2b265fa
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25775153188 |
Explore the throughput-latency tradeoff with 3 concurrency points each:
- 1P 8D/16D: 64, 128, 160
- 2P 8D: 128, 256, 320
- 4P 8D: 256, 512, 640
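The three-point curves above would land in the recipes roughly as follows (an illustrative sketch; only the conc-list values come from the commit message, and the stanza grouping is assumed):

```yaml
# 1P 8D and 1P 16D mid-curve stanzas (illustrative)
conc-list: [64, 128, 160]

# 2P 8D
conc-list: [128, 256, 320]

# 4P 8D
conc-list: [256, 512, 640]
```

Each list keeps the previously tuned ceiling (160/320/640) as its top point and adds two lower points to trace the latency side of the curve.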
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25775190340 |
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25775454558 |
|
/sweep |
|
@yhyang201 Kicking off a sweep. Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25793093112 |
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25793153134 |
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25793192375 |
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25807984297 |
- 1P8D: 160, 1P16D: 192, 2P8D: 384, 4P8D: 640
- Resolve perf-changelog.yaml merge conflict
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25816160589 |
Summary
Set num_prompts_mult: 16 to keep the total prompt count unchanged.