
fix(provider): apply MergeSystemMessages for vLLM provider #3196

Open
tmchow wants to merge 1 commit into tailcallhq:main from tmchow:fix/3128-vllm-system-message-ordering

Conversation

@tmchow
Contributor

@tmchow tmchow commented Apr 29, 2026

Summary

Apply MergeSystemMessages to the vLLM built-in provider in addition to NVIDIA. vLLM rejects requests where the system message is not first, which is exactly what this transformer was built to handle.

Why this matters

Issue #3128 reports that vLLM serving mlx-community/Qwen3.6-27B-bf16 rejects every request with:

ERROR: POST http://1.2.3.4:8000/v1/chat/completions
Caused by: 404 Not Found Reason: {"error": "System message must be at the beginning."}

MergeSystemMessages already exists for this case. The transformer doc comment even says "Some providers (e.g. NVIDIA) reject requests with multiple system messages or system messages that are not positioned at the start". The "e.g." anticipated more cases, but the gating predicate at crates/forge_app/src/dto/openai/transformers/pipeline.rs:86 only matched NVIDIA.
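
For context, here is a self-contained sketch of the behavior this transformer implements. The Role/Message types and the function name are illustrative stand-ins, not the actual forge_app definitions:

// Conceptual sketch of the MergeSystemMessages behavior: collect every system
// message, join their contents, and emit a single system message at the front.
#[derive(Clone, Debug, PartialEq)]
enum Role { System, User, Assistant }

#[derive(Clone, Debug)]
struct Message { role: Role, content: String }

fn merge_system_messages(messages: Vec<Message>) -> Vec<Message> {
    let (systems, others): (Vec<Message>, Vec<Message>) =
        messages.into_iter().partition(|m| m.role == Role::System);
    if systems.is_empty() {
        return others;
    }
    let merged = Message {
        role: Role::System,
        content: systems
            .iter()
            .map(|m| m.content.as_str())
            .collect::<Vec<_>>()
            .join("\n\n"),
    };
    // Example: [system "A", user "hi", system "B"] becomes [system "A\n\nB", user "hi"].
    std::iter::once(merged).chain(others).collect()
}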

vLLM is a built-in provider (registered in crates/forge_repo/src/provider/provider.json as id "vllm"), so it has its own ProviderId and goes through the openai pipeline. The fix adds it to the same predicate.

Changes

  • crates/forge_app/src/dto/openai/transformers/pipeline.rs: predicate now matches ProviderId::NVIDIA or the vLLM provider id (resolved via ProviderId::from_str("vllm") in the same style as the existing kimi_coding binding a few lines below).
  • crates/forge_app/src/dto/openai/transformers/ensure_system_first.rs: doc comment now lists vLLM alongside NVIDIA.

The transformer is a no-op for already-compliant requests (single system message at the start), so no behavior changes for users whose vLLM server already gets correct input.
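
To make the gating change concrete, here is a rough sketch of the predicate shape described above. ProviderId here is a stand-in for the real forge_app type; the constant and FromStr details are assumptions, not the actual API:

use std::str::FromStr;

// Stand-in for the real ProviderId, which is resolved from the ids registered
// in provider.json and exposes constants such as NVIDIA.
#[derive(Clone, Debug, PartialEq)]
struct ProviderId(String);

impl ProviderId {
    // Approximation of the ProviderId::NVIDIA constant.
    fn nvidia() -> Self { ProviderId("nvidia".to_string()) }
}

impl FromStr for ProviderId {
    type Err = std::convert::Infallible;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        Ok(ProviderId(s.to_string()))
    }
}

// The predicate that gates MergeSystemMessages: previously NVIDIA only,
// now NVIDIA or the built-in vLLM provider id.
fn needs_system_message_merge(provider: &ProviderId) -> bool {
    let vllm = ProviderId::from_str("vllm").expect("infallible");
    *provider == ProviderId::nvidia() || *provider == vllm
}

In the real pipeline this check only decides whether the MergeSystemMessages transformer runs; the merging behavior itself is unchanged.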

Testing

  • New pipeline-level test test_vllm_provider_merges_system_messages asserts that a vLLM provider with mixed system messages produces a single merged system message at the front (a stand-alone sketch of this assertion follows the list).
  • cargo test -p forge_app: 696 passed, 0 failed.
  • cargo clippy -p forge_app --all-targets --all-features -- -D warnings: clean.
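
As a rough, stand-alone approximation of what that test asserts (the real test drives the openai pipeline with forge_app's request types; here the merge is inlined over plain tuples):

#[test]
fn vllm_mixed_system_messages_are_merged_to_the_front() {
    let input = vec![
        ("system", "rule A"),
        ("user", "hi"),
        ("system", "rule B"),
        ("assistant", "hello"),
    ];

    // Inline stand-in for the MergeSystemMessages transform.
    let (systems, others): (Vec<(&str, &str)>, Vec<(&str, &str)>) =
        input.into_iter().partition(|(role, _)| *role == "system");
    let merged = systems
        .iter()
        .map(|(_, content)| *content)
        .collect::<Vec<_>>()
        .join("\n\n");
    let mut output = vec![("system".to_string(), merged)];
    output.extend(others.into_iter().map(|(r, c)| (r.to_string(), c.to_string())));

    // Exactly one system message, positioned first, with both contents preserved.
    assert_eq!(output.iter().filter(|(r, _)| r == "system").count(), 1);
    assert_eq!(output[0], ("system".to_string(), "rule A\n\nrule B".to_string()));
}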

Closes #3128

Compound Engineering

@CLAassistant

CLAassistant commented Apr 29, 2026

CLA assistant check
All committers have signed the CLA.

@github-actions github-actions bot added the type: fix (Iterations on existing features or infrastructure.) label on Apr 29, 2026
@tmchow tmchow force-pushed the fix/3128-vllm-system-message-ordering branch from 1b5ec06 to 6bb8c21 on April 29, 2026 02:49
Comment thread on crates/forge_app/src/dto/openai/transformers/pipeline.rs (Outdated)
@amitksingh1490
Contributor

vLLM is a built-in provider

@tmchow tmchow force-pushed the fix/3128-vllm-system-message-ordering branch from 6bb8c21 to f91cfa8 on April 29, 2026 15:29
@tmchow tmchow changed the title from "fix(provider): apply MergeSystemMessages for OPENAI_COMPATIBLE custom providers" to "fix(provider): apply MergeSystemMessages for vLLM provider" on Apr 29, 2026
@tmchow
Contributor Author

tmchow commented Apr 29, 2026

Predicate now matches NVIDIA + vLLM (not OPENAI_COMPATIBLE). Updated the PR description.

@YoyoSailer

YoyoSailer commented May 2, 2026

Just jumping in - I have a problem with this approach, as it is hardcoded to vLLM. I have my instance hosted behind an nginx proxy with a little middleware, and I access it via a URL and auth, so I used the OpenAI Compatible provider.

My vLLM setup worked fine with other tools.

I guess I should use the VLLM provider, but for that I do not have a PORT :) Let me open a new ticket for that.

Just to note it here, I have fixed this in my middleware as well:

// Qwen3 chat template requires system message at index 0 only.
// Merge any mid-conversation system messages into a single leading block.
const systems = req.body.messages.filter(m => m.role === 'system');
if (systems.length > 1 || (systems.length === 1 && req.body.messages[0].role !== 'system')) {
  const others = req.body.messages.filter(m => m.role !== 'system');
  const mergedContent = systems
    .map(m => (typeof m.content === 'string' ? m.content : JSON.stringify(m.content)))
    .filter(Boolean)
    .join('\n\n');
  req.body.messages = [{ role: 'system', content: mergedContent }, ...others];
}

It just moves all the system messages into one place.

@tmchow tmchow force-pushed the fix/3128-vllm-system-message-ordering branch from f91cfa8 to cc6be5b on May 5, 2026 03:17
vLLM is a built-in OpenAI-response provider in provider.json, but the
`MergeSystemMessages` predicate in the openai pipeline only matched NVIDIA.
vLLM rejects requests where the system message is not first
(per tailcallhq#3128: 'System message must be at the beginning.'), so add vLLM to
the same predicate using the existing `from_str` pattern that handles
providers without a `ProviderId` constant.

Closes tailcallhq#3128
@tmchow tmchow force-pushed the fix/3128-vllm-system-message-ordering branch from cc6be5b to 34e0e3d on May 11, 2026 00:33
@tmchow
Contributor Author

tmchow commented May 11, 2026

Thanks @YoyoSailer for the context (and the middleware snippet). Sounds like the vLLM path works for you with VLLM_PORT now that #3258 landed.

Rebased onto main in 34e0e3d to clear the merge-behind state. @amitksingh1490 @laststylebender14 anything else on this one?


Labels

type: fix (Iterations on existing features or infrastructure.)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: System message must be at the beginning. when using vllm

4 participants