Update dsr1-fp8-b300-sglang-mtp SGLang image to v0.5.11-cu130#1325
Update dsr1-fp8-b300-sglang-mtp SGLang image to v0.5.11-cu130#1325Klaud-Cold wants to merge 1 commit into
Conversation
…1154\n\nCo-authored-by: Klaud Cold <Klaud-Cold@users.noreply.github.com>
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack. |
| # B200 SGLang MTP recipe as-is until B300-specific tuning is available. Image bumped | ||
| # to v0.5.10.post1-cu130 to match the standard B300 SGLang image used by other B300 configs. |
There was a problem hiding this comment.
🟡 The inline comment on lines 2568-2569 still reads Image bumped to v0.5.10.post1-cu130 to match the standard B300 SGLang image used by other B300 configs. but this PR bumps the image to v0.5.11-cu130. The comment is now doubly stale: it pins a version that no longer matches the image directly below it, and the claim that this image matches "the standard B300 SGLang image used by other B300 configs" is also no longer true since every other B300 SGLang config in this file is still on v0.5.10.post1-cu130. Either bump the comment to v0.5.11-cu130 or rephrase it to not pin a specific version.
Extended reasoning...
What the bug is
The inline comment immediately above the dsr1-fp8-b300-sglang-mtp block contains a literal version pin that has fallen out of sync with the image it describes. After this PR, lines 2566-2571 of .github/configs/nvidia-master.yaml read:
# B200 SGLang MTP recipe as-is until B300-specific tuning is available. Image bumped
# to v0.5.10.post1-cu130 to match the standard B300 SGLang image used by other B300 configs.
dsr1-fp8-b300-sglang-mtp:
image: lmsysorg/sglang:v0.5.11-cu130The comment says the image was bumped to v0.5.10.post1-cu130, but the actual image one line below is v0.5.11-cu130. The two are now in direct contradiction.
Why the comment is doubly stale
The comment is wrong in two ways, not just one:
- Version drift — the literal string
v0.5.10.post1-cu130in the comment no longer matches the actualimage:value (v0.5.11-cu130). - Claim drift — the comment also asserts that this image "match[es] the standard B300 SGLang image used by other B300 configs." A grep over
.github/configs/nvidia-master.yamlshows every other B300 SGLang config (dsr1-fp4-b300-sglang,dsr1-fp8-b300-sglang,glm5-fp8-b300-sglang,glm5-fp4-b300-sglang, and the qwen3.5 B300 SGLang configs) is still pinned tolmsysorg/sglang:v0.5.10.post1-cu130. So after this PR,dsr1-fp8-b300-sglang-mtpis the only B300 SGLang config on v0.5.11-cu130, which is the opposite of what the comment claims.
Step-by-step proof
- Open
.github/configs/nvidia-master.yamlpost-merge. - Read lines 2566-2571: the comment ends
Image bumped to v0.5.10.post1-cu130 to match the standard B300 SGLang image used by other B300 configs.and the immediately followingimage:line islmsysorg/sglang:v0.5.11-cu130. → comment contradicts the code it documents. - Grep the same file for
v0.5.10.post1-cu130— every other B300 SGLang config still uses that tag. → the "matches the standard B300 SGLang image" claim is now also false.
Impact
Documentation hygiene only — there is no runtime effect, the YAML key/value the runner reads is unchanged in form, and CI does not parse comments. The harm is purely to future readers of this file: anyone trying to understand why dsr1-fp8-b300-sglang-mtp is pinned to a different version than its siblings will read a comment that asserts the exact opposite of the truth.
How to fix
Two reasonable options:
- Bump the literal version in the comment to
v0.5.11-cu130and drop or rewrite the "matches the standard B300 SGLang image used by other B300 configs" clause, since after this PR it no longer matches them — e.g., reframe as a one-off upgrade ahead of the rest of the B300 SGLang configs. - Rephrase the comment so it does not pin any specific tag, which avoids this kind of drift on the next bump.
Severity is nit — the comment is misleading but does not change runtime behavior, and the PR diff happens to be the one place where bumping the comment costs nothing.
| - dsr1-fp8-b300-sglang-mtp | ||
| description: | ||
| - "Update SGLang image from v0.5.10.post1-cu130 to v0.5.11-cu130" | ||
| pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX |
There was a problem hiding this comment.
🟡 The new perf-changelog entry's pr-link contains a literal XXX placeholder instead of the actual PR number (1325), producing a broken/404 link. Replace XXX with 1325 so the entry matches the rest of the file (e.g. /pull/1310).
Extended reasoning...
What the bug is
The newly added perf-changelog entry on line 2351 of perf-changelog.yaml declares its pr-link as:
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXThe XXX is a literal placeholder that was never substituted with the actual PR number. Every other entry in this file uses a real PR number — e.g. /pull/1304, /pull/1305, /pull/1308, /pull/1310 immediately above this new entry — so the pr-link field is clearly meant to be an actual, resolvable GitHub URL pointing back to the PR that introduced the change.
Code path that triggers it
The diff in perf-changelog.yaml directly adds:
- config-keys:
- dsr1-fp8-b300-sglang-mtp
description:
- "Update SGLang image from v0.5.10.post1-cu130 to v0.5.11-cu130"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXThis PR is #1325 (per the PR metadata), so the correct value is https://github.com/SemiAnalysisAI/InferenceX/pull/1325.
Why existing code doesn't prevent it
There is no schema validation or CI check that enforces the pr-link field is a numeric, resolvable URL — the value is treated as a free-form string. As a result, the placeholder slipped through.
Impact
When a reader (or downstream tooling that surfaces the changelog) follows the link from this entry, they'll land on a 404 page instead of the PR that introduced the change. This defeats the entire purpose of the pr-link traceability field, which exists to connect performance-changelog entries back to their originating PRs for review and historical context.
How to fix
Replace XXX with 1325 on line 2351:
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1325Step-by-step proof
- Open
perf-changelog.yamlat line 2351 — the value ofpr-linkis literallyhttps://github.com/SemiAnalysisAI/InferenceX/pull/XXX. - Look at the preceding entry (line ending around 2345):
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1310— a real PR number. - Look at the PR metadata: this is PR Update dsr1-fp8-b300-sglang-mtp SGLang image to v0.5.11-cu130 #1325.
- Visiting
https://github.com/SemiAnalysisAI/InferenceX/pull/XXXin a browser returns a 404 becauseXXXis not a valid PR identifier. - The intended URL —
https://github.com/SemiAnalysisAI/InferenceX/pull/1325— resolves to this PR.
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25763413528 |
2 similar comments
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25763413528 |
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25763413528 |
Summary
dsr1-fp8-b300-sglang-mtpimage fromlmsysorg/sglang:v0.5.10.post1-cu130tolmsysorg/sglang:v0.5.11-cu130Ref #1154
Generated with Claude Code