fix(provider): apply MergeSystemMessages for vLLM provider #3196
tmchow wants to merge 1 commit into tailcallhq:main
Conversation
force-pushed from 1b5ec06 to 6bb8c21
vLLM is a built-in provider.
force-pushed from 6bb8c21 to f91cfa8
Predicate now matches NVIDIA + vLLM (not OPENAI_COMPATIBLE). Updated PR description.
I guess I should use VLLM, but for that I do not have a PORT :) Let me open a new ticket for that. For reference, I have fixed this in my middleware as well:

```js
// Qwen3 chat template requires the system message at index 0 only.
// Merge any mid-conversation system messages into a single leading block.
const systems = req.body.messages.filter(m => m.role === 'system');
if (systems.length > 1 || (systems.length === 1 && req.body.messages[0].role !== 'system')) {
  const others = req.body.messages.filter(m => m.role !== 'system');
  const mergedContent = systems
    .map(m => (typeof m.content === 'string' ? m.content : JSON.stringify(m.content)))
    .filter(Boolean)
    .join('\n\n');
  req.body.messages = [{ role: 'system', content: mergedContent }, ...others];
}
```

It just moves all system messages into one place.
force-pushed from f91cfa8 to cc6be5b
vLLM is a built-in OpenAI-response provider in provider.json, but the `MergeSystemMessages` predicate in the openai pipeline only matched NVIDIA. vLLM rejects requests where the system message is not first (per tailcallhq#3128: 'System message must be at the beginning.'), so add vLLM to the same predicate using the existing `from_str` pattern that handles providers without a `ProviderId` constant. Closes tailcallhq#3128
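The predicate change described above could look roughly like the following. This is a standalone sketch, not the actual forge_app code: the `ProviderId` enum shape and variant names here are assumptions for illustration; only the `ProviderId::from_str("vllm")` pattern is taken from the PR description.

```rust
use std::str::FromStr;

// Hypothetical stand-in for forge's ProviderId type; the real one lives in
// the forge codebase. Only the from_str("vllm") pattern comes from the PR.
#[derive(Debug, Clone, PartialEq, Eq)]
enum ProviderId {
    Nvidia,
    Vllm,
    Other(String),
}

impl FromStr for ProviderId {
    type Err = ();
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        Ok(match s {
            "nvidia" => ProviderId::Nvidia,
            "vllm" => ProviderId::Vllm,
            other => ProviderId::Other(other.to_string()),
        })
    }
}

// Gate MergeSystemMessages on providers known to require a leading system
// message: NVIDIA (the original case) plus vLLM (this PR).
fn needs_system_merge(provider: &ProviderId) -> bool {
    let vllm = ProviderId::from_str("vllm").expect("known provider id");
    *provider == ProviderId::Nvidia || *provider == vllm
}

fn main() {
    assert!(needs_system_merge(&ProviderId::from_str("vllm").unwrap()));
    assert!(!needs_system_merge(&ProviderId::from_str("openai").unwrap()));
}
```

Resolving the id via `from_str` (rather than adding a new constant) mirrors the existing `kimi_coding` binding the PR description mentions, so providers without a dedicated `ProviderId` constant can still be matched.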
force-pushed from cc6be5b to 34e0e3d
Thanks @YoyoSailer for the context (and the middleware snippet). Sounds like the vLLM path works for you with that workaround. Rebased onto main in 34e0e3d to clear the merge-behind state. @amitksingh1490 @laststylebender14, anything else on this one?
Summary

Apply `MergeSystemMessages` to the vLLM built-in provider in addition to NVIDIA. vLLM rejects requests where the system message is not first, which is exactly what this transformer was built to handle.

Why this matters

Issue #3128 reports that vLLM serving `mlx-community/Qwen3.6-27B-bf16` rejects every request with "System message must be at the beginning." `MergeSystemMessages` already exists for this case. The transformer doc comment even says "Some providers (e.g. NVIDIA) reject requests with multiple system messages or system messages that are not positioned at the start". The "e.g." anticipated more cases, but the gating predicate at `crates/forge_app/src/dto/openai/transformers/pipeline.rs:86` only matched NVIDIA.

vLLM is a built-in provider (registered in `crates/forge_repo/src/provider/provider.json` as id `"vllm"`), so it has its own `ProviderId` and goes through the openai pipeline. The fix adds it to the same predicate.

Changes

- `crates/forge_app/src/dto/openai/transformers/pipeline.rs`: the predicate now matches `ProviderId::NVIDIA` or the vLLM provider id (resolved via `ProviderId::from_str("vllm")`, in the same style as the existing `kimi_coding` binding a few lines below).
- `crates/forge_app/src/dto/openai/transformers/ensure_system_first.rs`: the doc comment now lists vLLM alongside NVIDIA.

The transformer is a no-op for already-compliant requests (a single system message at the start), so there are no behavior changes for users whose vLLM server already receives correct input.
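The merge/no-op behavior described above can be sketched as a standalone function. This is an illustration of what a `MergeSystemMessages`-style transform does, not the actual forge_app implementation; the `Message` and `Role` types here are hypothetical simplifications.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
enum Role { System, User, Assistant }

#[derive(Debug, Clone, PartialEq, Eq)]
struct Message { role: Role, content: String }

// Sketch of a MergeSystemMessages-style transform: gather every system
// message, join their contents with blank lines, and emit one system
// message at the front followed by the remaining messages in order.
// For an already-compliant request (a single system message at index 0)
// the output equals the input, i.e. the transform is a no-op.
fn merge_system_messages(messages: Vec<Message>) -> Vec<Message> {
    // partition preserves the relative order within each half
    let (systems, others): (Vec<Message>, Vec<Message>) =
        messages.into_iter().partition(|m| m.role == Role::System);
    if systems.is_empty() {
        return others;
    }
    let merged = systems
        .iter()
        .map(|m| m.content.as_str())
        .filter(|c| !c.is_empty())
        .collect::<Vec<_>>()
        .join("\n\n");
    let mut out = vec![Message { role: Role::System, content: merged }];
    out.extend(others);
    out
}

fn main() {
    let msgs = vec![
        Message { role: Role::User, content: "hi".into() },
        Message { role: Role::System, content: "rule A".into() },
        Message { role: Role::System, content: "rule B".into() },
    ];
    let merged = merge_system_messages(msgs);
    assert_eq!(merged.len(), 2);
    assert_eq!(merged[0].role, Role::System);
    assert_eq!(merged[0].content, "rule A\n\nrule B");
}
```

Note how the same single pass handles both failure modes from #3128: a system message that is not first gets hoisted to index 0, and multiple system messages collapse into one.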
Testing

- `test_vllm_provider_merges_system_messages` asserts that a vLLM provider with mixed system messages produces a single merged system message at the front.
- `cargo test -p forge_app`: 696 passed, 0 failed.
- `cargo clippy -p forge_app --all-targets --all-features -- -D warnings`: clean.

Closes #3128