Fix VLLMServeLM ignoring a hyphenated tensor-parallel-size in model_kwargs #289
Merged
VLLMServeLM auto-injects tensor_parallel_size = torch.cuda.device_count() when the key is absent, but the guard checked only the underscore spelling. Because the command builder maps "_" to "-", "tensor_parallel_size" and "tensor-parallel-size" both become the same --tensor-parallel-size flag, so a caller passing the hyphenated key slipped past the guard: the default was injected anyway, producing a duplicate --tensor-parallel-size flag in which device_count() silently won (and breaking models without TP support on multi-GPU nodes). Skip the default injection when either spelling is present.
Co-authored-by: Claude (Managed) <noreply@anthropic.com>
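A minimal reproduction of the failure mode, assuming the command builder simply turns each model_kwargs key into a `--flag` with "_" mapped to "-". The helpers `build_serve_args` and `inject_default_tp_old` are hypothetical stand-ins, not the project's actual code, and `device_count` is passed explicitly in place of `torch.cuda.device_count()` so the sketch runs without a GPU. A plain `argparse` parser stands in for the `vllm serve` CLI to show which duplicate wins.

```python
import argparse


def build_serve_args(model_kwargs: dict) -> list[str]:
    """Hypothetical command builder: each key becomes --key with "_" -> "-"."""
    args: list[str] = []
    for key, value in model_kwargs.items():
        args += [f"--{key.replace('_', '-')}", str(value)]
    return args


def inject_default_tp_old(model_kwargs: dict, device_count: int) -> dict:
    """Pre-fix guard (sketch): only the underscore spelling counts as present."""
    kwargs = dict(model_kwargs)
    if "tensor_parallel_size" not in kwargs:
        kwargs["tensor_parallel_size"] = device_count
    return kwargs


# The caller asks for TP=1 using the hyphenated key on a 4-GPU node.
user_kwargs = {"tensor-parallel-size": 1}
argv = build_serve_args(inject_default_tp_old(user_kwargs, device_count=4))
print(argv)  # ['--tensor-parallel-size', '1', '--tensor-parallel-size', '4']

# With argparse's default "store" action, a repeated option keeps the last value.
parser = argparse.ArgumentParser()
parser.add_argument("--tensor-parallel-size", type=int)
print(parser.parse_args(argv).tensor_parallel_size)  # 4, not the requested 1
```

Both spellings collapse to the same flag only after the builder runs, which is why a guard that inspects the raw dict keys has to account for both.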
yuma-hirakawa approved these changes on Jun 19, 2026
Related Issues / PRs
None
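Purpose
Fixes a bug where passing tensor_parallel_size to VLLMServeLM under the hyphenated key, e.g. model_kwargs={"tensor-parallel-size": 1}, caused the specified value to be silently overwritten by torch.cuda.device_count(). When model_kwargs contains no tensor_parallel_size, model_kwargs["tensor_parallel_size"] = device_count() is injected automatically. If the user instead passes the hyphenated tensor-parallel-size (for example, 1), vllm serve receives the flag twice, as --tensor-parallel-size 1 --tensor-parallel-size <device_count()>, and because argparse keeps the last occurrence, the user-specified tensor-parallel-size is silently ignored.
Implementation details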
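A minimal sketch of the revised injection guard, assuming model_kwargs is a plain dict and the command builder maps "_" to "-". The names `TP_KEYS`, `inject_default_tp`, and `build_serve_args` are hypothetical, and `device_count` is passed explicitly in place of `torch.cuda.device_count()`. A plain `argparse` parser stands in for the `vllm serve` CLI to confirm that the user's value survives.

```python
import argparse

# Both spellings end up as the same --tensor-parallel-size flag.
TP_KEYS = ("tensor_parallel_size", "tensor-parallel-size")


def inject_default_tp(model_kwargs: dict, device_count: int) -> dict:
    """Sketch of the fixed guard: either spelling counts as already specified."""
    kwargs = dict(model_kwargs)
    if not any(key in kwargs for key in TP_KEYS):
        kwargs["tensor_parallel_size"] = device_count
    return kwargs


def build_serve_args(model_kwargs: dict) -> list[str]:
    """Hypothetical command builder: each key becomes --key with "_" -> "-"."""
    args: list[str] = []
    for key, value in model_kwargs.items():
        args += [f"--{key.replace('_', '-')}", str(value)]
    return args


parser = argparse.ArgumentParser()
parser.add_argument("--tensor-parallel-size", type=int)

# Hyphenated key, underscore key, and no key at all on a 4-GPU node.
for user_kwargs in ({"tensor-parallel-size": 1}, {"tensor_parallel_size": 1}, {}):
    argv = build_serve_args(inject_default_tp(user_kwargs, device_count=4))
    print(argv, parser.parse_args(argv).tensor_parallel_size)
# ['--tensor-parallel-size', '1'] 1
# ['--tensor-parallel-size', '1'] 1
# ['--tensor-parallel-size', '4'] 4
```

Checking the raw keys against a tuple of known spellings keeps the guard independent of how the builder normalizes names; normalizing the keys before the check would be an equally valid design.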
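Change the auto-injection guard in VLLMServeLM.__init__ so that either spelling, tensor_parallel_size or tensor-parallel-size, counts as "already specified". torch.cuda.device_count() is injected into tensor_parallel_size only when neither spelling is present.
Verification
ruff check and ruff format --check pass.
Additional notes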
None