(Replaced by #1354) Update MiniMax M2.5 FP8 H200 vLLM agg recipes by anish-shanbhag · Pull Request #1298 · SemiAnalysisAI/InferenceX

anish-shanbhag · 2026-05-07T23:02:20Z

Update MiniMax-M2.5 FP8 H200 vLLM to vllm/vllm-openai:v0.20.1-ubuntu2404

Set vLLM serving knobs in benchmarks/single_node/minimaxm2.5_fp8_h200.sh: generated benchmark max-model-len, previous eval max-model-len handling, fp8 KV cache, FlashInfer attention/autotune, Triton MoE, and MiniMax QK norm fusion.
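
For readers who have not seen the script, a minimal sketch of how a few of these knobs map onto a `vllm serve` command line may help. This is not the contents of `benchmarks/single_node/minimaxm2.5_fp8_h200.sh`: the model id, the default values, and the `build_serve_cmd` helper are assumptions made for illustration. Only flags known to exist in vLLM are shown (`--tensor-parallel-size`, `--max-model-len`, `--kv-cache-dtype`, `--port`). The FlashInfer, Triton MoE, and QK norm fusion switches are left out because the PR text does not show how they are set.

```shell
#!/usr/bin/env bash
# Hypothetical sketch only; the real recipe script may differ.
set -eu

# Assumed defaults, overridable from the environment.
MODEL="${MODEL:-MiniMaxAI/MiniMax-M2.5}"   # assumed model id
MAX_MODEL_LEN="${MAX_MODEL_LEN:-8192}"     # assumed context length
TP="${TP:-8}"                              # assumed tensor-parallel degree
PORT="${PORT:-8000}"                       # assumed serving port

# Hypothetical helper: print the serve command instead of running it,
# so the flags can be inspected without a GPU node.
build_serve_cmd() {
  printf 'vllm serve %s --tensor-parallel-size %s --max-model-len %s --kv-cache-dtype fp8 --port %s\n' \
    "$MODEL" "$TP" "$MAX_MODEL_LEN" "$PORT"
}

# Dry run: show the command that would be launched.
build_serve_cmd
```

Running it with `MAX_MODEL_LEN=16384` set in the environment prints the same command with the larger context length, which is one way a benchmark harness could pass a generated max-model-len into the script.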

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

functionstackx

revert github.token change

functionstackx · 2026-05-12T18:11:46Z


      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
        with:
-          token: ${{ secrets.REPO_PAT }}


@anish-shanbhag can u revert this plz

Sorry @functionstackx, CI was failing here since the source branch is from a fork; I've opened #1354 to supersede this PR

anish-shanbhag · 2026-05-12T22:56:27Z

Closing in favor of #1354 which no longer uses a fork branch

github-project-automation Bot added this to InferenceMAX Board May 7, 2026

anish-shanbhag marked this pull request as ready for review May 8, 2026 23:48

anish-shanbhag requested a review from a team May 8, 2026 23:48

anish-shanbhag requested review from jgangani and kedarpotdar-nv as code owners May 8, 2026 23:48

claude Bot reviewed May 8, 2026

anish-shanbhag force-pushed the ashan/port-inferencemax-53-minimax-h200-no-slurm-shared branch 3 times, most recently from e5688c8 to 8b72d09 Compare May 12, 2026 01:06

Tune MiniMax M2.5 FP8 H200 vLLM agg

3dea91d

anish-shanbhag force-pushed the ashan/port-inferencemax-53-minimax-h200-no-slurm-shared branch from 8b72d09 to 3dea91d Compare May 12, 2026 01:08

kedarpotdar-nv approved these changes May 12, 2026

kedarpotdar-nv added NVIDIA full-sweep-enabled labels May 12, 2026

anish-shanbhag force-pushed the ashan/port-inferencemax-53-minimax-h200-no-slurm-shared branch from b66321f to 3dea91d Compare May 12, 2026 17:56

Pass github.token so that CI can be triggered from fork branches

72a0790

functionstackx requested changes May 12, 2026

anish-shanbhag mentioned this pull request May 12, 2026

Update MiniMax M2.5 FP8 H200 vLLM agg recipes #1354

Open

anish-shanbhag closed this May 12, 2026

github-project-automation Bot moved this to Done in InferenceMAX Board May 12, 2026

anish-shanbhag changed the title ~~Update MiniMax M2.5 FP8 H200 vLLM agg recipes~~ (Replaced by #1354) Update MiniMax M2.5 FP8 H200 vLLM agg recipes May 12, 2026

anish-shanbhag wants to merge 2 commits into SemiAnalysisAI:main from anish-shanbhag:ashan/port-inferencemax-53-minimax-h200-no-slurm-shared


3 participants
