feat(aliases): multi-target routing alias with per-target rules#113

Open
Menci wants to merge 174 commits into main from floway/model-aliases

Conversation

Menci commented Jun 25, 2026

Owner

Summary

  • New model_aliases D1 table + seed (codex-auto-review → gpt-5.4 with reasoning.effort=low, real-only). Aliases declare a target model id, an optional upstream filter, an operator-locked rule set, a conflict mode (alias-only / real-only / both-real-first / both-alias-first), and whether the entry appears in /v1/models.
  • Floway-extension fields on every inbound IR (thinking_budget, adaptive_thinking, reasoning_summary, anthropic_speed, anthropic_beta, verbosity, serviceTier, etc.) so the same closed set of mode knobs is sayable in any inbound shape. The translate layer reads them and emits the upstream wire's natural slot for each pair; whichever slots don't apply get dropped and traced by the per-upstream sanitizer just before the upstream HTTP call.
  • applyAliasRulesTo<InboundProtocol> runs at the serve layer once the matched candidate is picked, writing alias rule values onto the inbound IR (native slot when the protocol supports the concept, extension slot otherwise). Alias values override user-supplied values — operator-locked semantics per the design's Goal 3.
  • /v1/models (and the Anthropic /models and Gemini /v1beta/models surfaces) emits one entry per (upstream, addressable form) pair that can resolve the alias's target, with an aliasedFrom Floway extension carrying { targetModelId, upstreamIds, rules, onConflict }. Aliases marked visibleInModelsList: false are omitted; aliases whose target is unreachable from any upstream produce zero entries.
  • x-floway-alias response header carries the matched alias name on every call served via an alias. Stamped via a shared stageGatewayResponseHeader helper that writes both Hono's c.header (for streamSSE-built responses) and the per-ctx responseHeaders bag (for non-streaming Response.json-built responses), so the header rides out uniformly across streaming, non-streaming, chat, and passthrough paths.
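The dual-write idea behind the header helper can be sketched as below. This is a minimal illustration, not the PR's code: `HeaderSink`, `GatewayCtxSketch`, `BuiltResponseSketch`, and the `*Sketch` function names are hypothetical stand-ins, and a plain object stands in for both Hono's context and the Web `Response`.

```typescript
// Hypothetical stand-ins: the real GatewayCtx and Hono context carry far more.
interface HeaderSink {
  header(name: string, value: string): void;
}

interface GatewayCtxSketch {
  c: HeaderSink; // plays the role of Hono's `c`
  responseHeaders: Record<string, string>; // per-request bag flushed at finalize time
}

interface BuiltResponseSketch {
  status: number;
  headers: Record<string, string>;
}

// Write the header to both places so streaming and non-streaming paths agree.
function stageGatewayResponseHeaderSketch(
  ctx: GatewayCtxSketch,
  name: string,
  value: string,
): void {
  ctx.c.header(name, value);
  ctx.responseHeaders[name] = value;
}

// Merge the staged bag onto a response that was built outside Hono's helpers.
function finalizeGatewayResponseSketch(
  ctx: GatewayCtxSketch,
  response: BuiltResponseSketch,
): BuiltResponseSketch {
  return {
    status: response.status,
    headers: { ...response.headers, ...ctx.responseHeaders },
  };
}
```

Staging once and flushing at a single finalize seam means no call site has to know whether the response it returns was built by a streaming helper or by hand.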

Decisions for reviewer attention

  • Summary-only synthesis when targeting Anthropic Messages. When a Chat-Completions inbound request carries only the reasoning_summary Floway extension (no effort, budget, or adaptive) and lands on a Messages upstream, the translate layer synthesizes thinking.{type:"enabled", display:<mapped>} rather than leaving thinking unset. Rationale: Anthropic's wire discards display without type, so a strict mapping would silently drop the operator's summary intent. Native fields on Responses (reasoning.summary) and Gemini (thinkingConfig.includeThoughts) keep their pre-extension *-via-messages translation contracts — the extension synthesis applies only on the new extension paths.
  • verbosity / text.verbosity added as native fields. The spec described them as native but they were absent from the protocol IR types. Task 3 added them as proper native fields (not Floway extensions), matching the OpenAI GPT-5-family spec.
  • Pass-through widening for service_tier / reasoning.summary. MessagesPayload.service_tier and ResponsesPayload.reasoning.summary were widened to admit (string & {}) so operator-typed freeform values pass through without TypeScript narrowing. Matches Goal 2 (no enum gating in Floway).
  • model_aliases.display_name column + per-upstream prefixed /v1/models enumeration. Migration 0047 adds an optional operator-set display_name. Each alias listing entry is composed as ${upstream.displayName}: ${alias.displayName} when set, or ${upstream.displayName}: ${target.displayName}${rules summary} otherwise. The rules summary joins per-field tokens (high effort, 4096tk reasoning, adaptive reasoning, concise summary, low verbosity, priority tier, fast speed, sorted anthropicBeta/...) in parentheses. The seed codex-auto-review row gets display_name='Codex Auto Review'. Helpers live in control-plane/model-aliases/display.ts and the shared per-upstream emission iterator in data-plane/models/alias-listing.ts.

Follow-up fixes (this wave)

  • Restore pre-extension native field translation. responses-via-messages and gemini-via-messages no longer translate the Responses-native reasoning.summary, Responses-native service_tier, or Gemini-native thinkingConfig.includeThoughts onto Anthropic Messages — those pairs keep their pre-Task-3 contracts. Extension-field paths (thinking_budget, adaptive_thinking, anthropic_speed, etc.) are unchanged.
  • Drop the /v1/responses entry-level codex-auto-review shim. rewriteResponsesEntryModelAlias is gone. The seed alias in the model_aliases table now handles this case through the normal alias-resolution path on every inbound surface. On a Codex upstream that exposes a real codex-auto-review model, on_conflict='real-only' lets the real id win and the effort=low rule will not apply. Codex CLI callers that need the shim's behavior must set effort=low themselves, or operators can re-seed the row with both-alias-first.
  • display_name + per-upstream prefixed enumeration. Described in the decisions section above; the listing change applies to both /v1/models and the Gemini /v1beta/models surface.

Earlier fix wave (final review address)

Test plan

  • applyAliasRulesTo<Protocol> writes effort / budget / adaptive / summary / verbosity / serviceTier / anthropicSpeed / anthropicBeta to the right slot for each of the four chat protocols (30 cases in apply_test.ts).
  • Each onConflict mode emits the expected interpretations (alias-only, real-only, both-real-first, both-alias-first) and the real-only post-resolution prune drops the alias-rewrite half when the real-name also resolves (covered in registry_test.ts).
  • /v1/models emits an alias entry on each (upstream, listed form) pair that can resolve its target (new test asserts both codex-auto-review and azure/codex-auto-review on a dual-listed Azure).
  • /v1/models uses ${upstream}: ${alias.displayName} when display_name is set (suppressing the rules summary) and ${upstream}: ${target.displayName} (...rules) when it's not.
  • /v1/models honours alias upstreamIds — restricting to one upstream emits one row, not one per upstream that happens to expose the target.
  • /v1/models omits aliases marked visibleInModelsList: false.
  • /v1/models omits an alias whose target is not in any reachable upstream catalog (per-upstream rule — there is no surface form to attach the row to).
  • /v1beta/models appends visible aliases as synthetic Gemini model entries with the composed ${upstream}: ... displayName.
  • x-floway-alias response header is set on an alias-matched request and absent on a real-routed request (Messages http test).
  • x-floway-alias rides out on /v1/embeddings, /v1/images/generations, and /v1/completions when the request hits an aliased model (passthrough tests).
  • Alias reasoning.effort lands on output_config.effort of the upstream Messages body (Messages http test).
  • Per-upstream sanitizer strips every Floway-extension field and emits one trace line per drop (sanitize tests + extension-translate coverage across the nine translate pairs).
  • Storage seed: a freshly migrated DB returns the codex-auto-review row with real-only, reasoning.effort=low, and displayName='Codex Auto Review', plus a wall-clock createdAt (repo tests).
  • loadAllAliases reads display_name, returning it as displayName when set and omitting the field when SQL stored NULL.
  • formatAliasRulesSummary / composeAliasDisplayName unit tests cover each rule field, empty rules, sorted anthropicBeta, alias-displayName-suppresses-rules, and the missing-displayName fallback.
  • Full workspace verification: pnpm run typecheck && pnpm run lint && pnpm run test — 3485 tests pass.

Migration

Run the migration and the deploy in one go (idempotent on re-run):

pnpm run db:migrate:remote && pnpm run deploy

The model_aliases table is empty after migration except for the seed codex-auto-review row. Migration 0047 adds the optional display_name column and backfills the seed row's label. Operators populate the table via wrangler d1 execute direct inserts until the CRUD UI lands.

Out of scope

  • Dashboard UI for alias management.
  • REST CRUD endpoints for aliases (POST /api/aliases, etc.). They will share schema and validation with the future UI; building them speculatively wastes work.
  • Regex / wildcard aliases. The current matcher is literal-equality.
  • Per-model suggestion data flowing into a combobox. Depends on the in-flight catalog reasoning fields and will land with the UI.

Menci added 30 commits June 25, 2026 20:40
Introduces the storage layer for the model-aliases data-plane feature.
The table is global, primary-keyed by alias name. Conflict resolution
is encoded as a CHECK-constrained TEXT column, freeform rule values
are stored as JSON, and the codex-auto-review seed entry lands with
the table.

loadAllAliases reads the full table per request (the table is
operator-managed and small; a cache layer is unnecessary for v0).
Each inbound protocol IR gains the closed set of mode-knob fields it
cannot natively express (thinking_budget, adaptive_thinking,
reasoning_summary on chat-completions; thinking_budget, adaptive_thinking
on responses; verbosity on messages; verbosity, serviceTier inside
generationConfig on gemini; anthropic_speed/anthropicSpeed and
anthropic_beta/anthropicBeta everywhere they apply).

The extensions are public — a client can set them directly and they
behave identically to alias-injected rules. The per-upstream sanitizer
strips any extension residue before the upstream call and emits one
log line per drop when given a trace context, so cross-protocol drops
are observable without leaking the field to upstream.
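A minimal sketch of the strip-and-trace behavior described above, assuming a flat request body. The field list is a small illustrative subset, and `sanitizeForUpstreamSketch` plus the trace line format are hypothetical; the real sanitizer is per-upstream and decides which slots apply to each wire.

```typescript
// Illustrative subset of extension fields; the real list lives elsewhere.
const EXTENSION_FIELDS_SKETCH: readonly string[] = [
  "thinking_budget",
  "adaptive_thinking",
  "reasoning_summary",
];

// Return a copy of the body without extension residue, emitting one trace
// line per dropped field when a trace callback is supplied.
function sanitizeForUpstreamSketch(
  body: Record<string, unknown>,
  trace?: (line: string) => void,
): Record<string, unknown> {
  const clean: Record<string, unknown> = {};
  for (const key of Object.keys(body)) {
    if (EXTENSION_FIELDS_SKETCH.indexOf(key) >= 0) {
      if (trace) trace("dropped extension field: " + key); // hypothetical format
      continue;
    }
    clean[key] = body[key];
  }
  return clean;
}
```

Returning a copy rather than mutating keeps the inbound payload intact for any later dump or retry against a different upstream.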
Each translate pair now reads the inbound IR's native and Floway-extension
mode-knob fields and writes them to the upstream protocol's natural slot
per the model-aliases design table. Routing is purely by upstream wire
protocol; translate never branches on model version.

Coverage per rule:
- reasoning.effort: emitted onto OpenAI Chat reasoning_effort, Responses
  reasoning.effort, Anthropic output_config.effort, Gemini
  thinkingConfig.thinkingLevel (the inverse mappers stay where they were).
- reasoning.budgetTokens / reasoning.adaptive: emitted onto Anthropic
  thinking.{type:'enabled', budget_tokens} and thinking.{type:'adaptive'}
  via a shared via-messages helper; Gemini path keeps its native
  thinkingBudget handling.
- reasoning.summary: bidirectional Responses reasoning.summary ↔ Anthropic
  thinking.display mapping with concise|detailed → summarized, omitted →
  omitted, auto → upstream default; reverse picks concise as the
  Responses-side canonical form.
- verbosity: native fields on Chat and Responses (added now — the IR
  did not carry them yet), Floway extension on Messages and Gemini.
- serviceTier: passes through verbatim onto each protocol's service_tier
  slot; Messages' service_tier type relaxed to admit operator-typed
  values per the alias design's freeform contract.
- anthropicSpeed: emitted onto Anthropic Messages speed; dropped on
  non-Messages targets.
- anthropicBeta: translate cannot move it to the request header (the
  translate signature has no headers), so it is left as body residue
  and the gateway-side rule-apply pass owns header materialization in
  the next task; a mergeAnthropicBetaTokens helper lives in
  via-messages/ for that consumer.

Drop-side emission stays the per-upstream sanitizer's job; translate
emits only the non-drop cells of the table.

The shared reasoning_effort union (gemini-via/gemini.ts) extends to the
seven values the alias suggestion list publishes (none|minimal|low|
medium|high|xhigh|max) and stops collapsing minimal onto low.
One assertion per non-drop cell of the model-aliases translate-emission
table: each test sets a single inbound rule (native or extension) and
checks the upstream-natural slot is present with the value forwarded
verbatim. Each pair also gets a drop-side assertion that the residue
field does not leak into the translated body — the per-upstream
sanitizer is the actual stripper, but translate must not invent a
target field where the mapping table says drop.

Pre-existing responses-via-messages tests that paired effort with
reasoning.summary keep their summary input (so the disabled-precedence
behavior is still verified) but no longer assume summary is silently
discarded; the new contract surfaces it as thinking.display where the
upstream has a slot, and the disabled case continues to win.
enumerateModelInterpretations now matches each (provider, lookupId) pair
against the global alias table (post-prefix-strip, semantic P). Per the
matched alias's onConflict, the fan-out pushes either the alias-rewrite
interpretation, the real-name interpretation, or both (in either order).
A post-resolution prune drops the alias-rewrite when the real-name
resolved under onConflict=real-only — the alias remains when the real
lookup misses, so an empty upstream catalog falls back to the alias's
target id.
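The fan-out and prune can be sketched as follows. This is one reading of the paragraph above, not the PR's code: it assumes `real-only` pushes both interpretations and relies on the post-resolution prune, and every name here (`AliasSketch`, `interpretationsForSketch`, `pruneRealOnlySketch`, the `resolves` callback) is hypothetical.

```typescript
type OnConflict = "alias-only" | "real-only" | "both-real-first" | "both-alias-first";

interface AliasSketch {
  alias: string;
  targetModelId: string;
  onConflict: OnConflict;
}

interface InterpretationSketch {
  modelId: string;
  viaAlias?: string; // set only on the alias-rewrite interpretation
}

function assertNever(value: never): never {
  throw new Error("unhandled onConflict mode: " + String(value));
}

// Fan a lookup id out into candidate interpretations, in priority order.
function interpretationsForSketch(
  lookupId: string,
  alias: AliasSketch | undefined,
): InterpretationSketch[] {
  const real: InterpretationSketch = { modelId: lookupId };
  if (alias === undefined) return [real];
  const rewritten: InterpretationSketch = { modelId: alias.targetModelId, viaAlias: alias.alias };
  switch (alias.onConflict) {
    case "alias-only":
      return [rewritten];
    case "real-only": // assumption: both are pushed, the prune below decides
    case "both-real-first":
      return [real, rewritten];
    case "both-alias-first":
      return [rewritten, real];
    default:
      return assertNever(alias.onConflict);
  }
}

// Under real-only, drop the alias rewrite once the real name has resolved.
function pruneRealOnlySketch(
  items: InterpretationSketch[],
  alias: AliasSketch | undefined,
  resolves: (modelId: string) => boolean,
): InterpretationSketch[] {
  if (alias === undefined || alias.onConflict !== "real-only") return items;
  const realResolved = items.some((i) => i.viaAlias === undefined && resolves(i.modelId));
  return realResolved ? items.filter((i) => i.viaAlias === undefined) : items;
}
```

With the seed row (`codex-auto-review` targeting `gpt-5.4`, `real-only`), an upstream that lists a real `codex-auto-review` keeps only the real interpretation, while an upstream that lists only `gpt-5.4` still reaches it through the alias.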

The aliasRules and aliasName ride through into a new ChatCandidate
wrapper type so downstream attempt logic can apply the rules and set
the x-floway-alias response header without polluting the
@floway-dev/provider package. RoutingDecision and classifyResponsesItemAffinity
become generic over the candidate type to carry alias metadata across
the affinity walk without re-deriving it.

modelAliases is added to the central Repo interface so each chat
serve.ts call site reaches it through getRepo() — the same pattern
the other operator-managed config tables follow.
…x-floway-alias

applyAliasRulesTo<InboundProtocol> writes rule values into each inbound
IR's native slot when the protocol supports the concept and the Floway
extension slot otherwise. Alias values override user-supplied values per
the operator-locked semantics in Goal 3 of the design.
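As an illustration of the override semantics for one inbound shape, here is a minimal sketch for a Chat Completions payload. The payload and rule interfaces are trimmed-down assumptions, and `applyAliasRulesToChatCompletionsSketch` is a hypothetical name; effort goes to the native `reasoning_effort` slot and the token budget goes to the `thinking_budget` extension slot, as the commit messages describe for this protocol.

```typescript
interface AliasRulesSketch {
  reasoning?: { effort?: string; budgetTokens?: number };
  verbosity?: string;
}

interface ChatCompletionsSketch {
  model: string;
  temperature?: number;
  reasoning_effort?: string; // native slot
  verbosity?: string; // native slot
  thinking_budget?: number; // Floway extension slot
}

// Alias values win over whatever the caller sent; untouched fields survive.
function applyAliasRulesToChatCompletionsSketch(
  payload: ChatCompletionsSketch,
  rules: AliasRulesSketch,
): ChatCompletionsSketch {
  const next: ChatCompletionsSketch = { ...payload };
  if (rules.reasoning?.effort !== undefined) next.reasoning_effort = rules.reasoning.effort;
  if (rules.reasoning?.budgetTokens !== undefined) next.thinking_budget = rules.reasoning.budgetTokens;
  if (rules.verbosity !== undefined) next.verbosity = rules.verbosity;
  return next;
}
```

A rule that is absent from the alias leaves the caller's value alone, so only the knobs the operator actually declared are locked.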

/v1/models appends alias entries with aliasedFrom carrying the target,
upstream filter, rules, and conflict mode. Aliases with
visibleInModelsList=false are omitted; aliases whose targets are
unreachable are still listed — operator-declared, no silent hide. The
Gemini /v1beta/models surface mirrors the same alias-listing policy.

The x-floway-alias response header carries the matched alias name on
every call served via an alias, giving callers a no-mode-required debug
hook for understanding routing.

Per-upstream sanitizers run just before each upstream HTTP call,
emitting one drop-trace line per stripped extension field with the
matched alias name attached. The same sanitize emission point fires for
client-sent extension residue regardless of alias provenance.

Embeddings, images, and /v1/completions thread aliases through
resolveModelForRequest so alias-name resolution still rewrites the
target id; rules don't apply to these passthrough endpoints (no protocol
slots) but the matched alias name still rides out on the response
header, and one drop trace line per declared rule lands so an operator
can confirm the rewrite ran.

Side touches:
- ChatCandidate replaces ProviderCandidate on every chat attempt arg
  type, restoring the alias-metadata propagation the routing layer
  already preserves.
- GatewayCtx grows a per-request responseHeaders bag; the http wrappers
  flush it onto the outgoing Response through a new
  finalizeGatewayResponse helper that also routes through the dump
  accumulator.
- ProviderModelResolution gains an optional aliasName; passthrough
  callers read it directly off the resolved match.
- pushInterpretation's onConflict switch grows an assertNever default.
…safe, idempotent seed, ordered listing)

Final-review fix wave on top of the model-aliases data-plane series. Each
finding from the whole-branch review is addressed; one shim is kept and
documented per the reviewer's option-B recommendation.

- Critical #1: `/v1/embeddings`, `/v1/images/*`, and `/v1/completions`
  returned the response through the legacy `ctx.dump?.finalize` pattern
  instead of `finalizeGatewayResponse`, so the `x-floway-alias` header
  the passthrough scaffold stamped on the per-ctx bag was silently
  dropped. Route all three call sites through `finalizeGatewayResponse`
  for a uniform finalize seam.

- Important #4: Make the `x-floway-alias` stamp streaming-safe by
  introducing `stageGatewayResponseHeader(ctx, name, value)` that writes
  the header to BOTH Hono's `c.header` (the documented knob that
  survives `streamSSE`'s internal `c.newResponse`) and the per-ctx
  `responseHeaders` bag `finalizeGatewayResponse` merges onto Web-
  `Response.json`-built non-streaming responses. The chat serve.ts
  layers (messages, gemini, responses, chat-completions) and
  passthrough-serve all go through this helper, eliminating the
  reliance on post-construction `response.headers.set` for streaming.

- Important #3: Add coverage in `gemini_test.ts` that a visible alias
  appears in `/v1beta/models` as a synthetic Gemini model entry with
  the expected name, displayName, and supportedGenerationMethods. The
  prior code path was untested; a refactor of `loadGeminiModels` would
  not have been caught.

- Important #2: Keep the pre-alias-table `rewriteResponsesEntryModelAlias`
  shim that swaps `codex-auto-review` -> `gpt-5.4` before the matcher
  runs (option B from the review). Add a code comment above it
  explaining the carveout: the seeded alias is `on_conflict='real-only'`
  and on a Codex upstream that exposes a real `codex-auto-review` model
  the alias would otherwise lose, breaking parity with Codex CLI's
  native behavior. The shim is temporary pending a deliberate Codex
  behavior change.

- Minor #6: Switch the `0046_model_aliases.sql` seed `INSERT` to
  `INSERT OR IGNORE` so a fresh local-dev replay doesn't trip the
  PRIMARY KEY uniqueness check.

- Minor #8: Add `ORDER BY alias` to `loadAllAliases` so the `/v1/models`
  listing emits alias entries deterministically across runtimes.

The unit-test fan-out reflects adding `c: AuthedContext` to `GatewayCtx`
so the serve layer can call Hono's `c.header` directly. Test stubs go
through the shared `stubAuthedContext` helper.
…*-via-messages

Task 3 (`e1891e1d`) added synthesis of `thinking.display` from Responses-native
`reasoning.summary` and Gemini-native `thinkingConfig.includeThoughts`, plus a
new native-to-native `service_tier` carry on Responses → Messages. These are
NATIVE fields with translation behavior the prior pairs had already decided;
the alias work should not have reshaped that contract.

Revert the native-field paths in:

- responses-via-messages: drop `reasoning.summary` → `thinking.display` and
  `service_tier` → `service_tier` propagation. Keep the new extension-field
  carries (`thinking_budget`, `adaptive_thinking`, `anthropic_speed`).
- gemini-via-messages: drop `thinkingConfig.includeThoughts` →
  `thinking.display` propagation. Keep `generationConfig.serviceTier`,
  `verbosity`, and top-level `anthropicSpeed` extension carries.

Tests that asserted the new native-field synthesis are removed; the existing
extension-field tests stay untouched.
…eam + form

Two follow-up changes to the alias data-plane:

1. Remove the `/v1/responses` entry-level `codex-auto-review → gpt-5.4`
   rewrite shim. The seed alias in `0046_model_aliases.sql` now routes
   `codex-auto-review` everywhere through the normal matcher. On a Codex
   upstream that exposes a real `codex-auto-review`, `on_conflict=real-only`
   lets the real id win — Codex CLI callers wanting the previous shim
   behaviour must set `effort=low` themselves or pick a different
   `onConflict`. All other inbound surfaces are unchanged.

2. List aliases per-upstream and per-addressable-form in `/v1/models` and the
   Gemini `/v1beta/models` listing, instead of one synthetic entry per alias.
   Each visible alias now emits one entry per (provider, listed form) pair
   whose raw catalog can resolve the target, so dual-listed upstreams emit
   both `codex-auto-review` and `<prefix>/codex-auto-review`. Aliases whose
   target is not reachable from any upstream produce zero entries; the
   previous "no silent hide" rule no longer fits a per-upstream model.

   A new `display_name` column on `model_aliases` (migration `0047`) carries
   an operator-set label; the listing composes it as `${upstream}: ${alias
   displayName}` when set, or `${upstream}: ${target displayName}${rules
   summary}` otherwise. The rules-summary formatter and display-name
   composer live in `control-plane/model-aliases/display.ts` and are
   covered by unit tests.

   The shared per-upstream alias emission helper sits in
   `data-plane/models/alias-listing.ts` and is reused by both the OpenAI and
   Gemini listings. `getModelsForListing` exposes the per-upstream raw
   catalog alongside the merged public model list so we collect catalogs
   once per request even when many aliases need them.
…remaining pairs

Task 3 (`e1891e1d`) also reshaped NATIVE-field translation on the
remaining three pairs the first revert wave (`17a7877c`) did not cover.
The alias work should only have added emission of the new Floway
extension fields; native-to-native handling on these pairs had been
decided in the prior contract and is restored here.

Revert the native-field paths in:

- gemini-via-responses: restore the pre-Task-3 `reasoning` block shape
  where `includeThoughts: true` paired with a non-`none` effort produces
  `summary: 'detailed'`; drop the `false → 'omitted'` synthesis Task 3
  added. Keep `verbosity` and `serviceTier` extension carries
  (Floway-only fields on Gemini IR).
- messages-via-responses: drop `thinking.display` → `reasoning.summary`
  synthesis and the `service_tier` → `service_tier` native-to-native
  propagation. Keep the `verbosity` extension carry under `text`. The
  unused `mapAnthropicDisplayToSummary` helper is deleted.
- messages-via-chat-completions: drop the `service_tier` →
  `service_tier` native-to-native propagation. Keep the `verbosity`
  extension carry.

Tests that asserted the new native-field behavior are removed; the
extension-field tests stay untouched.
…ayName

The alias-local display name (operator-set displayName, or synthesized
target + rules summary) is independent of which addressable form the
entry surfaces under. The upstream-label prefix (`${upstream.name}: `)
belongs at the caller, mirroring the real-model path in
`registry.ts` where the synthesized prefix is added only on the
`prefixed` listing form.

Result: a bare alias listing (`codex-auto-review` on a no-prefix or
unprefixed-listed upstream) reads as `"Codex Auto Review"` or
`"GPT-5.4 (low effort)"` without an upstream label, matching how a
bare real model renders. The prefixed form (`azure/codex-auto-review`)
keeps the `"Azure: Codex Auto Review"` shape unchanged.
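The split between the alias-local name and the caller-side prefix might look like the sketch below. The token wording follows the examples quoted in this PR (`low effort`, `4096tk reasoning`, and so on), but the token order, the `", "` separator, and all identifiers are assumptions made for illustration.

```typescript
interface AliasRulesForDisplaySketch {
  reasoning?: { effort?: string; budgetTokens?: number; adaptive?: boolean; summary?: string };
  verbosity?: string;
  serviceTier?: string;
}

// One short token per declared rule field.
function rulesSummaryTokensSketch(rules: AliasRulesForDisplaySketch): string[] {
  const tokens: string[] = [];
  const r = rules.reasoning;
  if (r?.effort !== undefined) tokens.push(r.effort + " effort");
  if (r?.budgetTokens !== undefined) tokens.push(r.budgetTokens + "tk reasoning");
  if (r?.adaptive) tokens.push("adaptive reasoning");
  if (r?.summary !== undefined) tokens.push(r.summary + " summary");
  if (rules.verbosity !== undefined) tokens.push(rules.verbosity + " verbosity");
  if (rules.serviceTier !== undefined) tokens.push(rules.serviceTier + " tier");
  return tokens;
}

// Alias-local name: the operator label wins, else target name plus rules suffix.
function composeAliasDisplayNameSketch(
  aliasDisplayName: string | undefined,
  targetDisplayName: string,
  rules: AliasRulesForDisplaySketch,
): string {
  if (aliasDisplayName !== undefined) return aliasDisplayName;
  const tokens = rulesSummaryTokensSketch(rules);
  return tokens.length === 0
    ? targetDisplayName
    : targetDisplayName + " (" + tokens.join(", ") + ")";
}

// The caller adds the upstream label only on the prefixed listing form.
function listedDisplayNameSketch(
  form: "bare" | "prefixed",
  upstreamName: string,
  aliasLocalName: string,
): string {
  return form === "prefixed" ? upstreamName + ": " + aliasLocalName : aliasLocalName;
}
```

Keeping the prefix out of the composer means the same alias-local string can back both the bare and the prefixed entry without re-deriving the rules summary.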
…orListing

The three listing endpoints (/v1/models data plane, /api/models control
plane, /v1beta/models Gemini) each independently looped over aliases and
re-built the per-emission entry. Move the fan-out to a single
synthesizeListedAliases() called once inside getModelsForListing(); the
function returns ListedModel[] (ResolvedModel + optional aliasedFrom)
that every surface mapper consumes uniformly.

Side effect: the control-plane /api/models was previously alias-blind,
because the dashboard hit getModels() instead of the listing function.
Now it goes through the shared path and the dashboard Models page
surfaces alias rows with their aliasedFrom provenance.
Two no-prefix upstreams both serving the alias target produced two
identical `codex-auto-review` rows in /v1/models and /api/models —
visible in the dashboard Models list as duplicate cards.

mergeIntoCatalog dedupes real models the same way; alias entries now go
through the equivalent union (endpoints OR-ed, kind re-derived, provider
bindings concatenated) so a single alias surfaces as one row whose
`upstreams` field carries every backing binding.
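A minimal sketch of that union, assuming a simplified row shape. `ListedAliasSketch` and `mergeAliasRowsSketch` are hypothetical, only two endpoint flags are modeled, and the `kind` re-derivation mentioned above is left out because its inputs are not described here.

```typescript
interface ListedAliasSketch {
  id: string;
  endpoints: { chat: boolean; embeddings: boolean };
  upstreams: string[]; // provider bindings backing this row
}

// Collapse rows that share an id: OR the endpoint flags, concatenate bindings.
function mergeAliasRowsSketch(rows: ListedAliasSketch[]): ListedAliasSketch[] {
  const merged: ListedAliasSketch[] = [];
  for (const row of rows) {
    const existing = merged.find((m) => m.id === row.id);
    if (existing === undefined) {
      // Copy so later merges never mutate the caller's input rows.
      merged.push({ id: row.id, endpoints: { ...row.endpoints }, upstreams: row.upstreams.slice() });
      continue;
    }
    existing.endpoints.chat = existing.endpoints.chat || row.endpoints.chat;
    existing.endpoints.embeddings = existing.endpoints.embeddings || row.endpoints.embeddings;
    existing.upstreams = existing.upstreams.concat(row.upstreams);
  }
  return merged;
}
```

First-seen order is preserved, so the listing stays stable as long as the input order is stable.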
Each rule field on an alias entry's aliasedFrom now appears as its own
badge appended after the existing context/prompt/output badges, so the
seed codex-auto-review shows "low effort" next to its upstream pills.

Per-field labels move into a shared formatAliasRuleBadges helper in
@floway-dev/protocols/common; the gateway's formatAliasRulesSummary
derives from it (same wording, joined with commas, wrapped in parens
when used as the synthesized display-name suffix). Dashboard and
gateway therefore stay in lockstep on rule labels without parallel
formatters drifting.
Each alias entry's row now leads with `alias of: <target>` followed by
one per-rule badge in `label: value` form (or label-only for boolean
toggles like `adaptive reasoning`). Outline border, no fill, low
contrast — distinct from the highlighted upstream pills, lighter than
the filled context/prompt/output limits.

The shared helper returns rich items so each surface can format as it
likes. The gateway's parenthesized display-name suffix keeps its
compact `value label` form independently.
Comments must not reference in-progress design docs that live under
docs/superpowers/ (gitignored). Strip the "See docs/..." tails from the
JSDocs on the protocol-extension fields and the apply.ts header; the
preceding sentences already document the translation contract.
Both Responses and Messages carry native service_tier; the translator
silently dropped it, so an alias serviceTier rule landing on a Responses
inbound that routed to a Messages upstream vanished. Spread it onto the
target alongside the other native fields.
# Conflicts:
#	packages/translate/src/chat-completions-via-messages/request.ts
#	packages/translate/src/messages-via-chat-completions/request.ts
#	packages/translate/src/messages-via-chat-completions/request_test.ts
#	packages/translate/src/messages-via-responses/request_test.ts
#	packages/translate/src/responses-via-messages/request.ts
#	packages/translate/src/responses-via-messages/request_test.ts
matchAlias returns the alias directly; the sole caller (pushInterpretation
in registry.ts) was already destructuring the wrapper away. Both the
review and cleanup passes converged on this — remove the indirection.
formatAliasRulesSummary was only consumed by composeAliasDisplayName in
the same file; the standalone export existed so the test could import it
directly (anti-test-bending). aliasPublicId was a 2-line ternary used
exactly once inside aliasEmissionToListedModel. Both now live at their
call site; tests target the surviving public entry.
…nitize helper

The passthrough serve was re-emitting the floway.alias.drop log shape
that chat/shared/sanitize.ts already owns, and re-finding the matched
alias by name to walk its rules. ModelAliasRules now rides through
resolveModelForRequest alongside aliasName, so passthrough has the
rules in hand; the rules walker moves into sanitize.ts as
traceAllRulesDropped and reuses createSanitizeTraceCtx so both
surfaces emit identical trace lines.
…arget

mapSummaryToAnthropicDisplay('auto') returns undefined, so the apply
step has always left a user-supplied thinking.display untouched in
that case. The comment now spells out that this is intentional —
'auto' means "defer to upstream default", and operator-locked
overwrite applies to every other summary value.
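The mapping and the "leave it alone on auto" behavior can be sketched as below. The value table follows the mapping stated earlier in this PR (concise or detailed to summarized, omitted to omitted, auto to the upstream default); the input is narrowed to those four values for the sketch, and the `*Sketch` names and the `ThinkingSketch` shape are assumptions.

```typescript
type ResponsesSummarySketch = "auto" | "concise" | "detailed" | "omitted";
type AnthropicDisplaySketch = "summarized" | "omitted";

// undefined means "defer to the upstream default".
function mapSummaryToAnthropicDisplaySketch(
  summary: ResponsesSummarySketch,
): AnthropicDisplaySketch | undefined {
  switch (summary) {
    case "concise":
    case "detailed":
      return "summarized";
    case "omitted":
      return "omitted";
    case "auto":
      return undefined;
  }
}

interface ThinkingSketch {
  type: "enabled" | "adaptive";
  display?: AnthropicDisplaySketch;
}

// Operator-locked overwrite for every value except auto, which keeps
// whatever display the caller supplied.
function applySummaryRuleSketch(
  thinking: ThinkingSketch,
  summary: ResponsesSummarySketch,
): ThinkingSketch {
  const display = mapSummaryToAnthropicDisplaySketch(summary);
  return display === undefined ? thinking : { ...thinking, display };
}
```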
The cross-protocol service_tier↔speed:'fast' bridge that #114 added to the
*-via-messages and messages-via-* translators makes the alias-extension
knob anthropicSpeed redundant — operators who want speed: 'fast' on a
Messages upstream can set serviceTier: 'fast' on the alias and the bridge
handles the wire-level conversion in both directions.

Removed before any external client relies on it (the alias schema is not
yet public — PR is still open):

- ModelAliasRules.anthropicSpeed plus the matching PublicModelAliasedFrom
  field on /v1/models.
- The anthropic_speed Chat / Responses extension fields, the top-level
  anthropicSpeed Gemini field, and their entries in
  FLOWAY_EXTENSION_FIELDS.
- The four applyAliasRules* branches that wrote the knob into each
  inbound IR's natural slot, plus the matching emit branches in
  chat-completions-via-messages, responses-via-messages, and
  gemini-via-messages translators.
- The trace-helper and display/badge formatters that surfaced the field.
- All tests asserting either side of the now-removed contract.

anthropicBeta is unrelated (Anthropic beta header tokens) and is kept
intact. The native Messages `speed` field is also untouched — callers
hitting the Messages inbound directly still control it.
# Conflicts:
#	packages/gateway/src/data-plane/models/load.ts
#	packages/protocols/src/common/models.ts
Mirror the proxies repo's CRUD shape: `create` rejects PK collisions with
a typed `{ reason: 'duplicate' }` so the route layer can map to 409 without
driver-specific error parsing, `save` upserts in place (preserving the
existing row's createdAt on conflict), `delete` returns whether the row
was removed. `getByAlias` is the targeted lookup the PATCH handler uses to
merge a partial body against the persisted row.

In-memory impl now sorts loadAll by alias to match the SQL `ORDER BY alias`
contract; the Map keyed by alias keeps PK semantics 1:1 with SQLite.
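A compact in-memory sketch of that CRUD shape, under stated assumptions: the row is reduced to three fields, the success variant `{ ok: true }` and the injected `now` timestamp are illustrative choices, and `InMemoryModelAliasRepoSketch` is a hypothetical name.

```typescript
interface ModelAliasRowSketch {
  alias: string;
  targetModelId: string;
  createdAt: number;
}

type CreateResultSketch = { ok: true } | { ok: false; reason: "duplicate" };

class InMemoryModelAliasRepoSketch {
  // Keyed by alias, mirroring primary-key semantics.
  private rows = new Map<string, ModelAliasRowSketch>();

  // Reject collisions with a typed reason instead of throwing.
  create(alias: string, targetModelId: string, now: number): CreateResultSketch {
    if (this.rows.has(alias)) return { ok: false, reason: "duplicate" };
    this.rows.set(alias, { alias, targetModelId, createdAt: now });
    return { ok: true };
  }

  // Upsert in place, keeping the original createdAt when the row exists.
  save(alias: string, targetModelId: string, now: number): void {
    const existing = this.rows.get(alias);
    const createdAt = existing ? existing.createdAt : now;
    this.rows.set(alias, { alias, targetModelId, createdAt });
  }

  // True when a row was actually removed.
  delete(alias: string): boolean {
    return this.rows.delete(alias);
  }

  getByAlias(alias: string): ModelAliasRowSketch | undefined {
    return this.rows.get(alias);
  }

  // Sorted by alias to match an ORDER BY alias contract.
  loadAll(): ModelAliasRowSketch[] {
    const all: ModelAliasRowSketch[] = [];
    this.rows.forEach((row) => { all.push(row); });
    return all.sort((a, b) => (a.alias < b.alias ? -1 : a.alias > b.alias ? 1 : 0));
  }
}
```

Returning a typed duplicate result lets a route layer map the collision to 409 with a plain branch rather than by parsing a driver error.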
Operator-managed alias rows previously had no admin surface — the only
write path was a hand-edited migration. Wire admin-only CRUD next to the
existing model-aliases code:

  GET    /api/aliases             list, sorted by alias
  POST   /api/aliases             create; 409 on PK conflict
  PATCH  /api/aliases/:alias      partial update; 404 when missing
  DELETE /api/aliases/:alias      idempotent-shaped 204/404

The Zod schemas mirror the closed rule knob set (reasoning effort /
budgetTokens / adaptive / summary, verbosity, serviceTier,
anthropicBeta[]) under `.strict()` so an unknown rule key is a 400 — but
each value stays freeform: a newly-introduced upstream-side enum ships
through without a gateway code change (Goal 2). Alias names are bounded
by the same `[A-Za-z0-9_.:-/]+` grammar the real model ids already use.

PATCH propagates the absent/null distinction for `displayName` so the
operator can clear an operator-set label back to the synthesized
fallback without dropping into a separate "reset" route.
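The absent/null distinction can be sketched as a merge over the persisted row. The shapes here are reduced assumptions and `mergeAliasPatchSketch` is a hypothetical name: an omitted `displayName` keeps the stored label, while an explicit `null` clears it so the synthesized fallback applies again.

```typescript
interface StoredAliasSketch {
  alias: string;
  targetModelId: string;
  displayName?: string;
}

interface AliasPatchSketch {
  targetModelId?: string;
  displayName?: string | null; // null = clear, absent = keep
}

function mergeAliasPatchSketch(
  stored: StoredAliasSketch,
  patch: AliasPatchSketch,
): StoredAliasSketch {
  const next: StoredAliasSketch = { ...stored };
  if (patch.targetModelId !== undefined) next.targetModelId = patch.targetModelId;
  if (patch.displayName === null) {
    delete next.displayName; // back to the synthesized fallback
  } else if (patch.displayName !== undefined) {
    next.displayName = patch.displayName;
  }
  return next;
}
```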
Adds the dashboard surface for the new /api/aliases CRUD endpoints:

  - useModelAliases composable mirrors the proxies-store pattern (module-
    scoped cache, error / loading refs, load()).
  - AliasesSettingsCard slots into the Settings page directly under
    ProxiesSettingsCard, sharing the glass-card styling and animate-in
    delay ordering.
  - AliasRow surfaces the alias id, optional display name, target model,
    rule badges (sourced from formatAliasRuleBadges so the badge order
    matches the rest of the dashboard), and an `on_conflict` chip.
  - AliasEditDialog is a single modal for both create and edit. Reasoning
    is rendered as a None / Effort / Budget / Adaptive radio + a separate
    summary input so the mutually-exclusive wire shape is visible at a
    glance. Suggestion hints come from the target model's chat.reasoning
    metadata when it matches a real catalog entry, but every value field
    stays freeform — Goal 2.

Co-located component-level smoke tests use @vue/test-utils (newly added
as a devDep) plus happy-dom. The dialog tests stub the api client, the
two stores, and reka-ui's portaling Dialog so the form mounts inline
where assertions can reach the inputs and read the posted JSON.

The gateway package's exports map gains two new type-only subpaths
(`./control-plane/model-aliases/serialize`, `./control-plane/model-aliases/types`)
so apps/web can pull `SerializedModelAlias` and `ModelAliasRules` as the
source-of-truth types without crossing the existing deep-import ban.

routes.ts had a re-export of ModelAliasRules that no file imported; the
frontend pulls the type from packages/gateway/src/control-plane/model-aliases/types.ts
directly. The PATCH/DELETE param fallbacks (?? '') were dead: Hono only
dispatches the :alias routes when the segment is present, matching how
api-keys/users routes use param('id')!. The verbose explanatory prose in
repo.ts is trimmed down to the one load-bearing fact. The repo/types.ts
comment 'save used by import/restore flows' was stale: only PATCH calls it.

Guard the UPDATE with `AND display_name IS NULL` so a re-run against an
environment where the operator already renamed the seed doesn't wipe
their value. Migrations are tracked one-shot but defense in depth keeps
the local-dev replay path safe.

applyAliasRulesToGemini was writing payload.anthropicBeta but nothing
read it: gemini-via-messages doesn't reference the field, and the
Messages attempt reads candidate.aliasRules.anthropicBeta directly for
the outbound anthropic-beta header. The sanitizer would strip the body
field on its way to upstream regardless. Removed the write and the
matching test; the header path is unchanged.

Also corrected the Messages apply doc that claimed "the write-side
validator forbids" adaptive + budgetTokens — the schema accepts both
today; the dashboard's tagged radio is what enforces exclusivity, and
the apply step picks adaptive when both arrive raw.
Menci added 30 commits June 28, 2026 01:47
`shared/candidates.ts` had been reduced to a 12-line wrapper that re-exported
`ProviderCandidate` from `@floway-dev/provider` and a `ChatCandidate =
ProviderCandidate` alias. The rename added no semantic distinction:
production chat code already wrote `ChatCandidate` while tests and
`@floway-dev/test-utils` already wrote `ProviderCandidate`, leaving the
data plane referring to the same shape by two names.

Replace every `ChatCandidate` in type position with `ProviderCandidate`,
fold the import into each file's existing `@floway-dev/provider` import
line (so `import/no-duplicates` stays satisfied), and delete the wrapper.
One canonical name across chat code, tests, and shared helpers.
`ChatMetadataEditor.vue` and `ModelEditor.vue` carried the exact same
`parseOptionalNumber` helper — blank/null/negative collapse to
`undefined`, every other value passes through `Number(raw)`. Same
contract on both sides because both editors feed nonnegative integer
counts the backend validates identically. Lift into
`apps/web/src/utils/parse-optional-number.ts` so the rule has one
source of truth.
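A minimal sketch of the shared helper, reconstructed only from the contract stated above (blank, null, and negative collapse to `undefined`; everything else goes through `Number(raw)`). The accepted input type and the treatment of `undefined` are assumptions; the actual file may differ.

```typescript
// Assumed input type: form fields typically hand over strings or numbers.
type RawNumberInput = string | number | null | undefined;

function parseOptionalNumber(raw: RawNumberInput): number | undefined {
  // null (and, by assumption, undefined) means "no value supplied".
  if (raw === null || raw === undefined) return undefined;
  // Blank or whitespace-only text also means "no value supplied".
  if (typeof raw === "string" && raw.trim() === "") return undefined;
  const value = Number(raw);
  // Negative counts are never valid for these fields, so they collapse too.
  if (value < 0) return undefined;
  // Everything else passes through, including NaN for non-numeric text,
  // which is left for the backend validator to reject.
  return value;
}
```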
…s share the providers list

`enumerateAddressableModelIds` called `getModels()` and then immediately
called `listModelProviders(upstreamFilter)` again. `getModels` already
listed providers internally, so the upstreams.list() round-trip and
provider-instantiation cost were paid twice. Lift the catalog assembly into
`getModelsFromProviders(providers, ...)` and let the addressable engine
thread the same list into both halves of its walk. `getModels` keeps
its old signature as a thin wrapper.
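The refactor can be illustrated with a simplified, synchronous sketch. The `Provider` shape, the `ListProviders` callback, and the upstream-qualified id form are assumptions for the example; the real functions presumably are async and take more arguments.

```typescript
// Hypothetical, simplified shapes; the real providers are richer.
interface Provider {
  upstreamId: string;
  modelIds: string[];
}

type ListProviders = (upstreamFilter: string[] | null) => Provider[];

// Catalog assembly takes an already-fetched provider list.
function getModelsFromProviders(providers: Provider[]): string[] {
  const ids: string[] = [];
  for (const provider of providers) {
    for (const id of provider.modelIds) {
      if (ids.indexOf(id) === -1) ids.push(id);
    }
  }
  return ids;
}

// Thin wrapper that keeps the old "list, then assemble" signature.
function getModels(list: ListProviders, upstreamFilter: string[] | null): string[] {
  return getModelsFromProviders(list(upstreamFilter));
}

// The addressable walk lists providers once and threads them into both halves.
function enumerateAddressableModelIds(list: ListProviders, upstreamFilter: string[] | null): string[] {
  const providers = list(upstreamFilter); // the only listing call
  const addressable = getModelsFromProviders(providers);
  for (const provider of providers) {
    for (const id of provider.modelIds) {
      // Assumed upstream-qualified form, for illustration only.
      addressable.push(`${provider.upstreamId}/${id}`);
    }
  }
  return addressable;
}
```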
The same five-step ritual ran at the head of every chat serve: catch
AliasNoTargetAvailableError → render protocol failure; pull
candidates/sawModel/failedUpstreams/aliasResolution off the result;
when an alias matched, mutate the payload (or local model var), apply
the protocol's chat-rule overlay, and stage the response header. Six
serve sites (chat-completions, messages × 2, gemini × 2, responses)
each carried the same prose.

Extract into chat/shared/alias-prelude.ts. The protocol's mutation
plus the rule overlay live in an `applyAlias` callback; the no-target
failure renderer is supplied per protocol. The header staging and the
404 conversion stay in the helper so all surfaces agree on the
contract — `x-floway-alias` is set for every protocol the moment an
alias matches, and `AliasNoTargetAvailableError` always converts to
the protocol's `alias-no-target-available` failure shape.
…ocols/common

`apps/web/src/api/types.ts` hand-rolled `PublicModel`, `ChatModelInfo`,
`ModelLimits`, and `ModelEndpointInfo` alongside the canonical
definitions it already imported from `@floway-dev/protocols/common`.
The local `PublicModel` made required fields optional and embedded a
different `endpoints` shape (`Record<string, { url; doc? }>` vs the
canonical `ModelEndpoints` presence map) — silent drift waiting to
happen. Drop the local copies and re-export the canonical types.

`announced-metadata.ts` switches to `PublicModelLimits`. Four test
fixtures move to a shared `api/test-fixtures.ts` that supplies the
required-field defaults `ControlPlaneModel` now demands.
…ged>

The wire-input shape was hand-rolled field-by-field, so the next column
added to `ModelAlias` required two edits. Derive it directly from
`ModelAlias` and strip the server-managed columns (`sort_order` is
defaulted by `nextSortOrder`; `created_at` / `updated_at` are stamped
by the repo).
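One way to derive the wire-input shape is `Omit` over the row type. In this sketch only `sort_order`, `created_at`, and `updated_at` come from the description above; the remaining `ModelAlias` columns and the `toRow` helper are hypothetical.

```typescript
// Hypothetical row shape; only the three server-managed columns are
// taken from the commit message.
interface ModelAlias {
  alias: string;
  target_model_id: string;
  display_name: string | null;
  sort_order: number;
  created_at: string;
  updated_at: string;
}

type ServerManagedColumn = "sort_order" | "created_at" | "updated_at";

// Adding a column to ModelAlias now flows into the input type automatically.
type ModelAliasInput = Omit<ModelAlias, ServerManagedColumn>;

// Illustrative repo-side step that fills in the server-managed columns.
function toRow(input: ModelAliasInput, nextSortOrder: number, now: string): ModelAlias {
  return { ...input, sort_order: nextSortOrder, created_at: now, updated_at: now };
}
```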
`type AliasRules = ChatAliasRules | Record<string, never>` was a union
without a runtime discriminator — every consumer narrowed via an
unsafe `as ChatAliasRules` cast (a dozen sites across the gateway and
the SPA). The empty-object arm is already satisfied by
`ChatAliasRules` because every field is optional, so the union added
no real safety. Collapse to `type AliasRules = ChatAliasRules` and
read fields directly.

Also clears the Messages service-tier sibling field (`speed` vs
`service_tier`) on every overlay branch so the upstream never sees
both with conflicting values.
The constant is transport-level — consumed by the chat alias-prelude
and the passthrough seam — not rule-overlay-level. Splitting it off
means transport call sites no longer drag in the per-protocol overlay
helpers from `apply.ts` to read one string.
The Switch could only emit true/undefined, so an existing record with
`adaptive: false` (gateway forwards it to upstream verbatim, the badge
formatter renders "non-adaptive") would silently round-trip to
`undefined` on first edit. Replace with a Select offering auto /
on / off so every state the schema admits stays expressible.
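A sketch of the three-state mapping the Select needs. The choice labels follow the commit message; mapping `auto` to `undefined` is an assumption, as are the function names.

```typescript
type AdaptiveChoice = "auto" | "on" | "off";

// Stored value -> Select option. `false` now has its own option, so an
// existing `adaptive: false` record no longer degrades on first edit.
function choiceFromAdaptive(adaptive: boolean | undefined): AdaptiveChoice {
  if (adaptive === undefined) return "auto";
  return adaptive ? "on" : "off";
}

// Select option -> stored value.
function adaptiveFromChoice(choice: AdaptiveChoice): boolean | undefined {
  if (choice === "auto") return undefined;
  return choice === "on";
}
```

Because the two functions are inverses, opening and saving the dialog without touching the control leaves the stored value unchanged.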
Two divergences in the catalog's alias-window computation:

1. The codex code re-derived `routableIds` via a plain `addressableSet.has(...)`
   check, which is laxer than `synthesizeListedAliases`'s
   `kind === alias.kind` predicate. A multi-kind alias misconfigured
   with off-kind targets would have those targets contribute to the
   min-window even though the resolver could never pick them.

2. Plain (non-alias) slug lookups went through `slugContextWindow`, a
   parallel map of the same data already in `addressableById`. Drop
   the duplicate and read off the same map the alias branch uses.

Also realigns the top-of-file doc with the round-1 fix that moved the
fallback from `firstRoutable` to `min over routable targets`.
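The two fixes together amount to "take the minimum window over targets that exist in the addressable map and match the alias's kind". A minimal sketch, with hypothetical type and field names:

```typescript
// Hypothetical shapes for illustration.
interface AddressableEntry {
  kind: string;
  contextWindow: number;
}

interface AliasWindowInput {
  kind: string;
  targets: string[];
}

function aliasContextWindow(
  alias: AliasWindowInput,
  addressableById: Record<string, AddressableEntry>,
): number | undefined {
  const windows: number[] = [];
  for (const id of alias.targets) {
    const entry: AddressableEntry | undefined = addressableById[id];
    // Same predicate the resolver applies: present AND same kind.
    // Off-kind targets must not influence the published window.
    if (entry !== undefined && entry.kind === alias.kind) {
      windows.push(entry.contextWindow);
    }
  }
  // Min over routable targets; undefined when nothing is routable.
  return windows.length > 0 ? Math.min(...windows) : undefined;
}
```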
…relude with passthrough

`resolveCandidatesAndApplyAlias` used to return the rendered failure
before staging the response header, so the alias-no-target 404 lost
the `x-floway-alias` header that an observability tool would use to tie
the client request to the failure. Stage the header inside the catch
before rendering the failure; `finalizeGatewayResponse` copies
ctx.responseHeaders onto every outbound response, including rendered
failures.

Same refactor lifts the prelude out of `chat/shared/` into
`model-aliases/prelude.ts` and lets `passthrough-serve.ts` import it.
The prelude is now generic over the per-protocol target descriptor
(chat returns `ChatTargetApi`, passthrough returns `ModelEndpointKey`)
and accepts an optional `applyAlias` callback (passthrough leaves it
undefined because its body rewrite happens at the provider boundary,
not on the inbound payload). The header-staging + 404-conversion dance
now lives in one place for both protocol families.

`aliasFailureFromError` falls away — the prelude constructs the
failure inline because every consumer that ever called it has moved
to the new helper.
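The shape of such a prelude can be sketched as below. This is an illustration under assumptions, not the code in `model-aliases/prelude.ts`: the function name `aliasPreludeSketch`, the argument and result shapes, and the synchronous `resolve` callback are all invented for the example. What it does take from the description is the ordering (stage the header, then render the failure) and the optional `applyAlias` callback.

```typescript
class AliasNoTargetAvailableError extends Error {
  alias: string;
  constructor(alias: string) {
    super(`alias "${alias}" has no available target`);
    this.name = "AliasNoTargetAvailableError";
    this.alias = alias;
  }
}

function isAliasNoTarget(err: unknown): err is AliasNoTargetAvailableError {
  return err instanceof Error && err.name === "AliasNoTargetAvailableError";
}

interface AliasResolution { alias: string; targetModelId: string }
interface Resolved<T> { candidates: T[]; aliasResolution: AliasResolution | undefined }
interface GatewayCtx { responseHeaders: Record<string, string> }

type PreludeResult<T, F> =
  | { ok: true; candidates: T[]; aliasResolution: AliasResolution | undefined }
  | { ok: false; failure: F };

function aliasPreludeSketch<T, F>(args: {
  ctx: GatewayCtx;
  resolve: () => Resolved<T>;
  renderNoTarget: (alias: string) => F;
  applyAlias?: (resolution: AliasResolution) => void; // passthrough omits this
}): PreludeResult<T, F> {
  let resolved: Resolved<T>;
  try {
    resolved = args.resolve();
  } catch (err) {
    if (!isAliasNoTarget(err)) throw err;
    // Stage the header BEFORE rendering, so the 404 still carries it.
    args.ctx.responseHeaders["x-floway-alias"] = err.alias;
    return { ok: false, failure: args.renderNoTarget(err.alias) };
  }
  if (resolved.aliasResolution !== undefined) {
    if (args.applyAlias) args.applyAlias(resolved.aliasResolution);
    args.ctx.responseHeaders["x-floway-alias"] = resolved.aliasResolution.alias;
  }
  return { ok: true, candidates: resolved.candidates, aliasResolution: resolved.aliasResolution };
}
```

Keeping the failure renderer as a callback is what lets chat and passthrough share the helper while still producing their own 404 envelopes.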
…atch

The resolver dropped targets for two distinct reasons — no enabled
upstream binding at all, OR a binding exists but none satisfies the
inbound endpoint predicate (kind/endpoint mismatch). The error
message collapsed both into the same "no enabled upstream binding"
wording, so a chat client hitting an embedding-only alias saw a hint
pointing at "no binding" when the real cause was the endpoint.

`candidateRoutability` now reports the rejection reason and
`AliasNoTargetAvailableError` flips to "none currently serves the
inbound endpoint" when every dropped target lost on endpoint-match.
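A sketch of how the rejection reason could drive the message. The two quoted wordings come from the commit message; the `UpstreamBinding` shape, the reason labels, and `noTargetHint` are assumptions.

```typescript
type RejectionReason = "no-binding" | "endpoint-mismatch";

// Hypothetical binding shape for illustration.
interface UpstreamBinding {
  enabled: boolean;
  endpoints: string[];
}

type Routability =
  | { routable: true }
  | { routable: false; reason: RejectionReason };

function candidateRoutability(bindings: UpstreamBinding[], endpoint: string): Routability {
  const enabled = bindings.filter((b) => b.enabled);
  if (enabled.length === 0) return { routable: false, reason: "no-binding" };
  const serves = enabled.some((b) => b.endpoints.indexOf(endpoint) !== -1);
  return serves ? { routable: true } : { routable: false, reason: "endpoint-mismatch" };
}

// The endpoint wording is used only when EVERY dropped target lost on
// endpoint match; otherwise the original binding wording stays.
function noTargetHint(reasons: RejectionReason[]): string {
  const allEndpoint = reasons.length > 0 && reasons.every((r) => r === "endpoint-mismatch");
  return allEndpoint
    ? "none currently serves the inbound endpoint"
    : "no enabled upstream binding";
}
```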
The frontend's hand-written `computeAnnouncedMetadata` mirrors the
gateway's `intersectChat` / `intersectLimits`. Round 4 already caught
one drift (the `||→&&` modalities fix landed gateway-side first); a
test covering the same matrix gives the dashboard's read-only preview
+ edit-dialog seed buffer a CI gate against the next silent drift.
The four chat protocols (chat-completions, messages, gemini,
responses) each catch `AliasNoTargetAvailableError` and render the
protocol-specific 404 envelope; the `x-floway-alias` response header
is staged on that path so observability can tie "client asked for X"
to the rendered 404. None of the four serve_test files exercised the
path end to end — the `aliasResolutionQueue` type carried `| Error`
but no test ever pushed one. Add one test per protocol so the
404-envelope wiring and the header staging stay locked in.
The component lives in `@floway-dev/ui` and serves any consumer that
wants free-form input + a suggestion list. The JSDoc preamble called
out alias rule fields specifically and the `borderless` doc pinned
"the alias-target row" as the use case; trim both to generic
language so the primitive reads as one a future caller can adopt.
apply.ts's `fast`/non-fast branches each delete the sibling field
(`body.speed` vs `body.service_tier`) so the upstream never sees both
with conflicting values. The existing tests start from a payload that
had neither field set, so a future regression deleting the `delete`
call would not be caught. Seed each branch's prior-state sibling and
assert it's cleared after the overlay.
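The sibling-clearing behavior and the seeded-state test idea can be sketched like this. The `MessagesBody` shape and the rule that a `fast` value lands in `speed` while other values land in `service_tier` are assumptions for illustration; only the "each branch deletes the other field" behavior is from the commit message.

```typescript
// Hypothetical request body; only the two sibling fields matter here.
interface MessagesBody {
  model: string;
  speed?: string;
  service_tier?: string;
}

function applyServiceTierRule(body: MessagesBody, serviceTier: string | undefined): void {
  if (serviceTier === undefined) return; // no rule: leave the body alone
  if (serviceTier === "fast") {
    body.speed = "fast";
    delete body.service_tier; // clear the sibling so upstream never sees both
  } else {
    body.service_tier = serviceTier;
    delete body.speed; // same guarantee on the non-fast branch
  }
}
```

A regression test then starts from a body that already carries the sibling field, for example `{ model: "m", service_tier: "priority" }` before applying `fast`, and asserts the sibling is gone afterwards.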
…header

The shared prelude lifts the chat protocols' alias-no-target handling
into a generic helper that passthrough now reuses. Every chat protocol's
serve_test covers the 404 + header path end-to-end; passthrough went
through the same code path with no test of its own. Add one that
seeds an alias whose target id is not in any upstream catalog,
exercises /v1/embeddings, asserts the 404 envelope + `x-floway-alias`
header staging, and verifies the upstream is never called on the
failure path.
…eriving

The shared prelude already knows both the inbound `modelName` and the
optional `aliasResolution`, so the "id to use downstream" is a closed
computation it can perform once. Body-based protocols
(chat-completions/messages/responses) read it implicitly through their
own `payload.model = aliasResolution.targetModelId` mutation inside
`applyAlias`; path-based Gemini had to re-derive
`aliasResolution?.targetModelId ?? args.model` outside the prelude.

Lift the derivation into the prelude as `effectiveModelId`. Gemini
reads it directly — the `?.` chain at the call site disappears, the
asymmetry the auditor flagged between protocols closes, and the next
path-based caller gets the field for free.
…min-only)

Admin's editor surfaces (alias edit, upstream edit) configure
gateway-wide state, not the admin's per-account data-plane view. The
default scoped behavior — mirroring the data plane's effective
upstream cap — is correct for the Models page and Playground (admin
can self-restrict and watch the playground respect it), but wrong for
the editor dialogs, which need to see "what exists on the entire
gateway" so admin can wire an alias to a target on an upstream the
admin's own account is currently restricted out of.

Add `gateway_wide=true` to /api/models. Server requires admin and
passes `null` to `enumerateAddressableModelIds` (= all upstreams).
Non-admin sessions get 403 — the bypass would leak models from
upstreams they have no data-plane access to.
The alias editor surfaces (target combobox, shadow detection,
kind-mismatch warning, no-target-available warning) configure gateway
state, not the admin's per-account data-plane view. A self-restricted
admin opening AliasEditDialog used to see a combobox missing every
model on upstreams the admin had restricted out — they could not wire
an alias to a target the gateway can actually serve. Pass
`gateway_wide=true` so the editor sees "what exists" rather than
"what this account can reach". The default `useModelsStore` (Models
page + Keys page) stays scoped because those surfaces are meant to
mirror data-plane visibility.
…mantics

The store feeding this check now reads `gateway_wide=true`, so the
addressable surface it sees is the entire gateway. "No target
currently resolves under your upstream access" was misleading — the
check is no longer scoped to the admin's account. Rewrite to "No
target resolves to any model on this gateway." which describes the
actual cause: a configured target id that no upstream serves.
…teway-wide implicitly

Drop the `gateway_wide=true` query param patch. The server now decides
gateway-wide vs scoped by the caller's role: admin sessions always
receive the full catalog (Models playground + alias edit + settings
all share one fetch), non-admin sessions keep their effective-upstream
cap. The dashboard then filters client-side per surface:

- Alias edit / Settings card / AliasesSettingsCard: no filter, the
  gateway-wide catalog IS the editor's view of the world.
- Models playground: filter by the effective cap of (selectedKey's
  upstream_ids, admin's own user.upstreamIds). Switching the key in
  the playground re-narrows the visible models live. Aliases collapse
  out of the list when no configured target is reachable under the
  cap.

`ModelInfoBar` now takes optional `catalog` + `cap` props and renders
"X / N targets reachable" on alias rows in the playground — showing
exactly how the resolver would narrow this alias's pool under the
chosen key.

A new `apps/web/src/utils/reachability.ts` carries the pure helper
(`isReachableUnderCap`, `reachableTargets`, `effectiveUpstreamCap`)
with a dedicated unit test pinning the cap semantics, alias
reachability through targets, and the addressable-but-not-listed case.
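A rough sketch of what the three helpers might look like. The function names come from the commit message, but the signatures and semantics here are assumptions: a `null` cap is taken to mean "unrestricted", two caps combine by intersection, and catalog entries carry `upstreams` plus an optional `aliasedFrom.targets`.

```typescript
type UpstreamCap = string[] | null; // null: unrestricted (assumption)

// Hypothetical catalog entry; alias rows carry an empty `upstreams` list.
interface CatalogEntry {
  id: string;
  upstreams: string[];
  aliasedFrom?: { targets: string[] };
}

// Assumed semantics: the effective cap is the intersection of both caps.
function effectiveUpstreamCap(keyCap: UpstreamCap, userCap: UpstreamCap): UpstreamCap {
  if (keyCap === null) return userCap;
  if (userCap === null) return keyCap;
  const allowed = userCap;
  return keyCap.filter((id) => allowed.indexOf(id) !== -1);
}

function upstreamInCap(upstreamId: string, cap: UpstreamCap): boolean {
  return cap === null || cap.indexOf(upstreamId) !== -1;
}

// Configured targets that resolve to a real model on an in-cap upstream.
function reachableTargets(alias: CatalogEntry, catalog: CatalogEntry[], cap: UpstreamCap): string[] {
  const targets = alias.aliasedFrom ? alias.aliasedFrom.targets : [];
  return targets.filter((targetId) =>
    catalog.some(
      (entry) =>
        entry.id === targetId &&
        !entry.aliasedFrom &&
        entry.upstreams.some((u) => upstreamInCap(u, cap)),
    ),
  );
}

// Real models are reachable via their upstreams; aliases via their targets.
function isReachableUnderCap(entry: CatalogEntry, catalog: CatalogEntry[], cap: UpstreamCap): boolean {
  if (entry.aliasedFrom) return reachableTargets(entry, catalog, cap).length > 0;
  return entry.upstreams.some((u) => upstreamInCap(u, cap));
}
```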
…rget check

The store query was simplified — `gateway_wide=true` was removed in
favor of admin-implicit gateway-wide behavior — but this comment
still cited the old URL shape. Update the comment to describe the
actual server contract.
The optional alias field on TraceLine was never set by any call site —
SanitizeTraceCtx.emit only ever produces { field, targetProtocol }, so
the slot was unreachable.
The remaining assertEquals lines speak for themselves; the inline note
about `endpoints` being surfaced elsewhere was a leftover from an
earlier patching pass.
…etadata

PublicModel.limits is a required field; the gateway's mirror at
data-plane/models/alias-listing.ts passes real.limits directly. The
?? {} on the frontend masked what would otherwise be a type-impossible
state, papering over a backend protocol violation instead of surfacing it.
…ty (caller-scoped)

The synthesizer used to project both the alias's metadata (limits,
chat, endpoints, cost) AND its `aliasedFrom.targets` against the
caller's addressable surface, so a non-admin / data-plane caller saw:
- (a) a different limit/endpoint set than the admin saw, computed from
  whichever subset of targets sat inside their cap — the same alias
  looked different depending on who asked, and a tighter cap could
  paradoxically widen the published window
- (b) the operator's full configured target list — including target
  ids on upstreams they had no data-plane access to AND typo'd /
  removed model ids — which is operator state, not their business

Split the two axes:

- Metadata (limits, chat, endpoints, cost) is now computed against
  the GATEWAY-WIDE addressable surface. Every caller — admin session,
  non-admin session, data-plane api key — reads the same numbers for
  the same alias. Safe-lower-bound holds across the entire gateway,
  not the caller's subset.
- `aliasedFrom.targets` is per-caller. When `narrowTargets=true`
  (every data-plane call + non-admin control-plane call) only targets
  the caller can actually reach appear. When `narrowTargets=false`
  (admin control-plane only) the raw configured list survives so the
  alias-edit dialog can render typos and out-of-cap targets for
  fixing.
- Alias visibility (whether the row appears in the response at all)
  stays caller-scoped: at least one target must be reachable under
  the caller's cap.

Threaded through `/v1/models`, `/v1beta/models`, codex 1p catalog,
and `/api/models`. Non-admin paths now fetch BOTH the caller-scoped
and the gateway-wide addressable surface; admin paths skip the
second call because they already are gateway-wide. The SWR cache
shares the per-upstream catalog fetches.

Tests cover the split (synthesizer-level + control-plane integration:
admin sees raw config with typos, non-admin sees narrowed projection,
both read identical limits for an alias whose targets disagree on
limits across upstreams).
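The split between the three axes can be condensed into a small sketch. Everything here is a simplification under assumptions: metadata is reduced to a single `contextWindow`, the "safe lower bound" is taken to be the minimum, the caller-scoped surface is assumed to be a subset of the gateway-wide one, and all names are hypothetical.

```typescript
// Hypothetical, simplified shapes.
interface RealModel { id: string; contextWindow: number }
interface AliasConfig { alias: string; targets: string[] }
interface ListedAlias {
  id: string;
  contextWindow: number;
  aliasedFrom: { targets: string[] };
}

function synthesizeListedAlias(
  alias: AliasConfig,
  gatewayWide: RealModel[],
  callerScoped: RealModel[],
  narrowTargets: boolean,
): ListedAlias | undefined {
  // Visibility stays caller-scoped: at least one target must be reachable.
  const reachable = alias.targets.filter((t) => callerScoped.some((m) => m.id === t));
  if (reachable.length === 0) return undefined;

  // Metadata is computed gateway-wide, so every caller reads the same number.
  const windows: number[] = [];
  for (const model of gatewayWide) {
    if (alias.targets.indexOf(model.id) !== -1) windows.push(model.contextWindow);
  }

  return {
    id: alias.alias,
    contextWindow: Math.min(...windows),
    // Targets are per-caller unless the admin control plane asks for raw config.
    aliasedFrom: { targets: narrowTargets ? reachable : alias.targets },
  };
}
```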
The badge reads `alias of: 3 / 4 models` when some configured targets
are out of the current cap, `alias of: 5 models` when every target is
reachable, and `alias of: <id-or-display-name>` when only one target is
reachable (so the operator can see exactly what the resolver will pick).
The `selection: <mode>` badge is hidden in that last case because there
is no selection to make.

Replaces the previous "alias of: gpt-5.4, gemini-3-flash-preview,
deepseek-v4-pro +1 more" list, which contradicted the parallel
"3 / 4 reachable" badge when some of those listed ids were actually
out of cap. The list shape also became unwieldy on aliases that fan
out across several targets — the count is the information operators
actually act on.
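The three label forms reduce to a small pure function. This is a sketch with hypothetical names; it uses the target id in the single-target case, and the behavior when zero targets are reachable is an assumption, since such aliases are expected to drop out of the list before a badge is rendered.

```typescript
interface AliasBadge {
  label: string;
  showSelectionBadge: boolean;
}

function aliasOfBadge(configuredTargets: string[], reachableTargetIds: string[]): AliasBadge {
  const reachable = reachableTargetIds.length;
  const configured = configuredTargets.length;
  if (reachable === 1) {
    // Exactly one pick: name it, and hide the selection mode badge.
    return { label: `alias of: ${reachableTargetIds[0]}`, showSelectionBadge: false };
  }
  const label =
    reachable === configured
      ? `alias of: ${configured} models`
      : `alias of: ${reachable} / ${configured} models`;
  return { label, showSelectionBadge: true };
}
```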
… display name

The badge mirrors the value the operator typed into the alias's target
field and the value a client would put on the wire. Display name is
already on the picked target's own row in the sidebar — repeating it
here is noise; the id is the actionable identifier.
Alias rows used to show no provider badges (the wire's `upstreams: []`
on alias entries leaves nothing to render). Compute the de-duped
union across the caller-reachable targets' bindings so the alias info
bar surfaces the same provider-badge shape every real-model row does.

Each binding is further filtered against the caller's effective cap:
a target may sit on three upstreams of which only one is in cap, and
only the in-cap one is the provider the resolver would actually route
to. This keeps the badges aligned with the parallel "alias of: K / N
models" badge — both tell the operator what the resolver will see
under the current key, not what's configured gateway-wide.
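The badge computation described above is a de-duplicated union with a cap filter. A minimal sketch, assuming each reachable target exposes its upstream ids and that a `null` cap means "unrestricted" (both assumptions, as are the names):

```typescript
// Hypothetical shape: a reachable target and the upstreams it is bound to.
interface ReachableTarget {
  id: string;
  upstreams: string[];
}

function aliasProviderBadges(targets: ReachableTarget[], cap: string[] | null): string[] {
  const badges: string[] = [];
  for (const target of targets) {
    for (const upstreamId of target.upstreams) {
      // Only in-cap bindings count: those are the ones the resolver can use.
      const inCap = cap === null || cap.indexOf(upstreamId) !== -1;
      // De-dupe while preserving first-seen order.
      if (inCap && badges.indexOf(upstreamId) === -1) badges.push(upstreamId);
    }
  }
  return badges;
}
```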