feat(aliases): multi-target routing alias with per-target rules#113
Open
Menci wants to merge 174 commits into
Introduces the storage layer for the model-aliases data-plane feature. The table is global, primary-keyed by alias name. Conflict resolution is encoded as a CHECK-constrained TEXT column, freeform rule values are stored as JSON, and the codex-auto-review seed entry lands with the table. loadAllAliases reads the full table per request (the table is operator-managed and small; a cache layer is unnecessary for v0).
Each inbound protocol IR gains the closed set of mode-knob fields it cannot natively express (thinking_budget, adaptive_thinking, reasoning_summary on chat-completions; thinking_budget, adaptive_thinking on responses; verbosity on messages; verbosity, serviceTier inside generationConfig on gemini; anthropic_speed/anthropicSpeed and anthropic_beta/anthropicBeta everywhere they apply). The extensions are public — a client can set them directly and they behave identically to alias-injected rules. The per-upstream sanitizer strips any extension residue before the upstream call and emits one log line per drop when given a trace context, so cross-protocol drops are observable without leaking the field to upstream.
Each translate pair now reads the inbound IR's native and Floway-extension
mode-knob fields and writes them to the upstream protocol's natural slot
per the model-aliases design table. Routing is purely by upstream wire
protocol; translate never branches on model version.
Coverage per rule:
- reasoning.effort: emitted onto OpenAI Chat reasoning_effort, Responses
reasoning.effort, Anthropic output_config.effort, Gemini
thinkingConfig.thinkingLevel (the inverse mappers stay where they were).
- reasoning.budgetTokens / reasoning.adaptive: emitted onto Anthropic
thinking.{type:'enabled', budget_tokens} and thinking.{type:'adaptive'}
via a shared via-messages helper; Gemini path keeps its native
thinkingBudget handling.
- reasoning.summary: bidirectional Responses reasoning.summary ↔ Anthropic
thinking.display mapping with concise|detailed → summarized, omitted →
omitted, auto → upstream default; reverse picks concise as the
Responses-side canonical form.
- verbosity: native fields on Chat and Responses (added now — the IR
did not carry them yet), Floway extension on Messages and Gemini.
- serviceTier: passes through verbatim onto each protocol's service_tier
slot; Messages' service_tier type relaxed to admit operator-typed
values per the alias design's freeform contract.
- anthropicSpeed: emitted onto Anthropic Messages speed; dropped on
non-Messages targets.
- anthropicBeta: translate cannot move it to the request header (the
translate signature has no headers), so it is left as body residue
and the gateway-side rule-apply pass owns header materialization in
the next task; a mergeAnthropicBetaTokens helper lives in
via-messages/ for that consumer.
Drop-side emission stays the per-upstream sanitizer's job; translate
emits only the non-drop cells of the table.
The shared reasoning_effort union (gemini-via/gemini.ts) extends to the
seven values the alias suggestion list publishes (none|minimal|low|
medium|high|xhigh|max) and stops collapsing minimal onto low.
One assertion per non-drop cell of the model-aliases translate-emission table: each test sets a single inbound rule (native or extension) and checks the upstream-natural slot is present with the value forwarded verbatim. Each pair also gets a drop-side assertion that the residue field does not leak into the translated body — the per-upstream sanitizer is the actual stripper, but translate must not invent a target field where the mapping table says drop. Pre-existing responses-via-messages tests that paired effort with reasoning.summary keep their summary input (so the disabled-precedence behavior is still verified) but no longer assume summary is silently discarded; the new contract surfaces it as thinking.display where the upstream has a slot, and the disabled case continues to win.
enumerateModelInterpretations now matches each (provider, lookupId) pair against the global alias table (post-prefix-strip, semantic P). Per the matched alias's onConflict, the fan-out pushes either the alias-rewrite interpretation, the real-name interpretation, or both (in either order). A post-resolution prune drops the alias-rewrite when the real-name resolved under onConflict=real-only — the alias remains when the real lookup misses, so an empty upstream catalog falls back to the alias's target id. The aliasRules and aliasName ride through into a new ChatCandidate wrapper type so downstream attempt logic can apply the rules and set the x-floway-alias response header without polluting the @floway-dev/provider package. RoutingDecision and classifyResponsesItemAffinity become generic over the candidate type to carry alias metadata across the affinity walk without re-deriving it. modelAliases is added to the central Repo interface so each chat serve.ts call site reaches it through getRepo() — the same pattern the other operator-managed config tables follow.
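The post-resolution prune described above can be sketched roughly as follows. This is a minimal illustration for a single (provider, lookupId) pair, not the branch's code: the `InterpretationSketch` shape, the `resolved` flag, and the `pruneAliasRewrites` name are hypothetical; only the `real-only` mode and the "alias survives when the real lookup misses" rule come from the commit message.

```typescript
// Hypothetical shape: one interpretation of an inbound model id.
interface InterpretationSketch {
  kind: "alias-rewrite" | "real-name";
  modelId: string;
  onConflict?: string; // only "real-only" is taken from the commit text
  resolved: boolean;   // assumption: true when the upstream catalog lookup hit
}

// Drop the alias-rewrite only when the real name resolved under real-only.
function pruneAliasRewrites(items: InterpretationSketch[]): InterpretationSketch[] {
  const realResolved = items.some((i) => i.kind === "real-name" && i.resolved);
  return items.filter(
    (i) => !(i.kind === "alias-rewrite" && i.onConflict === "real-only" && realResolved),
  );
}
```

When the real lookup misses, `realResolved` is false and the alias-rewrite stays in the list, which matches the stated fallback to the alias's target id on an empty upstream catalog.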
…x-floway-alias
applyAliasRulesTo<InboundProtocol> writes rule values into each inbound IR's native slot when the protocol supports the concept and the Floway extension slot otherwise. Alias values override user-supplied values per the operator-locked semantics in Goal 3 of the design.
/v1/models appends alias entries with aliasedFrom carrying the target, upstream filter, rules, and conflict mode. Aliases with visibleInModelsList=false are omitted; aliases whose targets are unreachable are still listed (operator-declared, no silent hide). The Gemini /v1beta/models surface mirrors the same alias-listing policy.
The x-floway-alias response header carries the matched alias name on every call served via an alias, giving callers a no-mode-required debug hook for understanding routing.
Per-upstream sanitizers run just before each upstream HTTP call, emitting one drop-trace line per stripped extension field with the matched alias name attached. The same sanitize emission point fires for client-sent extension residue regardless of alias provenance.
Embeddings, images, and /v1/completions thread aliases through resolveModelForRequest so alias-name resolution still rewrites the target id; rules don't apply to these passthrough endpoints (no protocol slots) but the matched alias name still rides out on the response header, and one drop trace line per declared rule lands so an operator can confirm the rewrite ran.
Side touches:
- ChatCandidate replaces ProviderCandidate on every chat attempt arg type, restoring the alias-metadata propagation the routing layer already preserves.
- GatewayCtx grows a per-request responseHeaders bag; the http wrappers flush it onto the outgoing Response through a new finalizeGatewayResponse helper that also routes through the dump accumulator.
- ProviderModelResolution gains an optional aliasName; passthrough callers read it directly off the resolved match.
- pushInterpretation's onConflict switch grows an assertNever default.
…safe, idempotent seed, ordered listing)
Final-review fix wave on top of the model-aliases data-plane series. Each finding from the whole-branch review is addressed; one shim is kept and documented per the reviewer's option-B recommendation.
- Critical #1: `/v1/embeddings`, `/v1/images/*`, and `/v1/completions` returned the response through the legacy `ctx.dump?.finalize` pattern instead of `finalizeGatewayResponse`, so the `x-floway-alias` header the passthrough scaffold stamped on the per-ctx bag was silently dropped. Route all three call sites through `finalizeGatewayResponse` for a uniform finalize seam.
- Important #4: Make the `x-floway-alias` stamp streaming-safe by introducing `stageGatewayResponseHeader(ctx, name, value)` that writes the header to BOTH Hono's `c.header` (the documented knob that survives `streamSSE`'s internal `c.newResponse`) and the per-ctx `responseHeaders` bag `finalizeGatewayResponse` merges onto Web `Response.json`-built non-streaming responses. The chat serve.ts layers (messages, gemini, responses, chat-completions) and passthrough-serve all go through this helper, eliminating the reliance on post-construction `response.headers.set` for streaming.
- Important #3: Add coverage in `gemini_test.ts` that a visible alias appears in `/v1beta/models` as a synthetic Gemini model entry with the expected name, displayName, and supportedGenerationMethods. The prior code path was untested; a refactor of `loadGeminiModels` would not have been caught.
- Important #2: Keep the pre-alias-table `rewriteResponsesEntryModelAlias` shim that swaps `codex-auto-review` -> `gpt-5.4` before the matcher runs (option B from the review). Add a code comment above it explaining the carveout: the seeded alias is `on_conflict='real-only'` and on a Codex upstream that exposes a real `codex-auto-review` model the alias would otherwise lose, breaking parity with Codex CLI's native behavior. The shim is temporary pending a deliberate Codex behavior change.
- Minor #6: Switch the `0046_model_aliases.sql` seed `INSERT` to `INSERT OR IGNORE` so a fresh local-dev replay doesn't trip the PRIMARY KEY uniqueness check.
- Minor #8: Add `ORDER BY alias` to `loadAllAliases` so the `/v1/models` listing emits alias entries deterministically across runtimes.
The unit-test fan-out reflects adding `c: AuthedContext` to `GatewayCtx` so the serve layer can call Hono's `c.header` directly. Test stubs go through the shared `stubAuthedContext` helper.
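The dual-write idea behind `stageGatewayResponseHeader` might look roughly like the sketch below. It is a simplified illustration under assumptions: `HeaderSink` stands in for the one Hono context method the commit mentions (`c.header`), the bag is modeled as a plain record, and `mergeStagedHeaders` is a hypothetical stand-in for the header-merging part of `finalizeGatewayResponse` (the dump-accumulator routing is omitted).

```typescript
// Stand-in for the single Hono context method used here (c.header).
interface HeaderSink {
  header(name: string, value: string): void;
}

// Hypothetical reduced GatewayCtx: the framework context plus the per-request bag.
interface GatewayCtxSketch {
  c: HeaderSink;
  responseHeaders: Record<string, string>;
}

// Write to BOTH sinks so streaming and non-streaming responses see the header.
function stageGatewayResponseHeader(ctx: GatewayCtxSketch, name: string, value: string): void {
  ctx.c.header(name, value);         // path that survives framework-built streaming responses
  ctx.responseHeaders[name] = value; // path merged onto hand-built responses at finalize time
}

// Assumed finalize step: staged headers win over whatever the response already had.
function mergeStagedHeaders(
  ctx: GatewayCtxSketch,
  existing: Record<string, string>,
): Record<string, string> {
  return { ...existing, ...ctx.responseHeaders };
}
```

Staging through one helper keeps the two response-construction paths from drifting, which is the failure the Critical #1 finding describes.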
…*-via-messages
Task 3 (`e1891e1d`) added synthesis of `thinking.display` from Responses-native `reasoning.summary` and Gemini-native `thinkingConfig.includeThoughts`, plus a new native-to-native `service_tier` carry on Responses → Messages. These are NATIVE fields with translation behavior the prior pairs had already decided; the alias work should not have reshaped that contract.
Revert the native-field paths in:
- responses-via-messages: drop `reasoning.summary` → `thinking.display` and `service_tier` → `service_tier` propagation. Keep the new extension-field carries (`thinking_budget`, `adaptive_thinking`, `anthropic_speed`).
- gemini-via-messages: drop `thinkingConfig.includeThoughts` → `thinking.display` propagation. Keep `generationConfig.serviceTier`, `verbosity`, and top-level `anthropicSpeed` extension carries.
Tests that asserted the new native-field synthesis are removed; the existing extension-field tests stay untouched.
…eam + form
Two follow-up changes to the alias data-plane:
1. Remove the `/v1/responses` entry-level `codex-auto-review → gpt-5.4`
rewrite shim. The seed alias in `0046_model_aliases.sql` now routes
`codex-auto-review` everywhere through the normal matcher. On a Codex
upstream that exposes a real `codex-auto-review`, `on_conflict=real-only`
lets the real id win — Codex CLI callers wanting the previous shim
behaviour must set `effort=low` themselves or pick a different
`onConflict`. All other inbound surfaces are unchanged.
2. List aliases per-upstream and per-addressable-form in `/v1/models` and the
Gemini `/v1beta/models` listing, instead of one synthetic entry per alias.
Each visible alias now emits one entry per (provider, listed form) pair
whose raw catalog can resolve the target, so dual-listed upstreams emit
both `codex-auto-review` and `<prefix>/codex-auto-review`. Aliases whose
target is not reachable from any upstream produce zero entries; the
previous "no silent hide" rule no longer fits a per-upstream model.
A new `display_name` column on `model_aliases` (migration `0047`) carries
an operator-set label; the listing composes it as `${upstream}: ${alias
displayName}` when set, or `${upstream}: ${target displayName}${rules
summary}` otherwise. The rules-summary formatter and display-name
composer live in `control-plane/model-aliases/display.ts` and are
covered by unit tests.
The shared per-upstream alias emission helper sits in
`data-plane/models/alias-listing.ts` and is reused by both the OpenAI and
Gemini listings. `getModelsForListing` exposes the per-upstream raw
catalog alongside the merged public model list so we collect catalogs
once per request even when many aliases need them.
…remaining pairs
Task 3 (`e1891e1d`) also reshaped NATIVE-field translation on the remaining three pairs the first revert wave (`17a7877c`) did not cover. The alias work should only have added emission of the new Floway extension fields; native-to-native handling on these pairs had been decided in the prior contract and is restored here.
Revert the native-field paths in:
- gemini-via-responses: restore the pre-Task-3 `reasoning` block shape where `includeThoughts: true` paired with a non-`none` effort produces `summary: 'detailed'`; drop the `false → 'omitted'` synthesis Task 3 added. Keep `verbosity` and `serviceTier` extension carries (Floway-only fields on Gemini IR).
- messages-via-responses: drop `thinking.display` → `reasoning.summary` synthesis and the `service_tier` → `service_tier` native-to-native propagation. Keep the `verbosity` extension carry under `text`. The unused `mapAnthropicDisplayToSummary` helper is deleted.
- messages-via-chat-completions: drop the `service_tier` → `service_tier` native-to-native propagation. Keep the `verbosity` extension carry.
Tests that asserted the new native-field behavior are removed; the extension-field tests stay untouched.
…ayName
The alias-local display name (operator-set displayName, or synthesized
target + rules summary) is independent of which addressable form the
entry surfaces under. The upstream-label prefix (`${upstream.name}: `)
belongs at the caller, mirroring the real-model path in
`registry.ts` where the synthesized prefix is added only on the
`prefixed` listing form.
Result: a bare alias listing (`codex-auto-review` on a no-prefix or
unprefixed-listed upstream) reads as `"Codex Auto Review"` or
`"GPT-5.4 (low effort)"` without an upstream label, matching how a
bare real model renders. The prefixed form (`azure/codex-auto-review`)
keeps the `"Azure: Codex Auto Review"` shape unchanged.
…orListing The three listing endpoints (/v1/models data plane, /api/models control plane, /v1beta/models Gemini) each independently looped over aliases and re-built the per-emission entry. Move the fan-out to a single synthesizeListedAliases() called once inside getModelsForListing(); the function returns ListedModel[] (ResolvedModel + optional aliasedFrom) that every surface mapper consumes uniformly. Side effect: the control-plane /api/models was previously alias-blind, because the dashboard hit getModels() instead of the listing function. Now it goes through the shared path and the dashboard Models page surfaces alias rows with their aliasedFrom provenance.
Two no-prefix upstreams both serving the alias target produced two identical `codex-auto-review` rows in /v1/models and /api/models — visible in the dashboard Models list as duplicate cards. mergeIntoCatalog dedupes real models the same way; alias entries now go through the equivalent union (endpoints OR-ed, kind re-derived, provider bindings concatenated) so a single alias surfaces as one row whose `upstreams` field carries every backing binding.
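The union applied to duplicate alias rows can be illustrated with a small sketch. The `ListedAliasSketch` shape and the `mergeAliasEntries` name are assumptions made for this example; only the merge rules (endpoints OR-ed, provider bindings concatenated) come from the commit message, and the kind re-derivation step is left out.

```typescript
// Hypothetical reduced listing row: public id, endpoint presence map, backing upstreams.
interface ListedAliasSketch {
  id: string;
  endpoints: Record<string, boolean>;
  upstreams: string[];
}

// Collapse rows that share a public id into one row carrying every binding.
function mergeAliasEntries(entries: ListedAliasSketch[]): ListedAliasSketch[] {
  const byId = new Map<string, ListedAliasSketch>();
  for (const entry of entries) {
    const prev = byId.get(entry.id);
    if (!prev) {
      // Copy so later merges never mutate the caller's objects.
      byId.set(entry.id, {
        id: entry.id,
        endpoints: { ...entry.endpoints },
        upstreams: entry.upstreams.slice(),
      });
      continue;
    }
    for (const key of Object.keys(entry.endpoints)) {
      prev.endpoints[key] = Boolean(prev.endpoints[key]) || entry.endpoints[key]; // OR
    }
    prev.upstreams = prev.upstreams.concat(entry.upstreams); // concatenate bindings
  }
  return Array.from(byId.values());
}
```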
Each rule field on an alias entry's aliasedFrom now appears as its own badge appended after the existing context/prompt/output badges, so the seed codex-auto-review shows "low effort" next to its upstream pills. Per-field labels move into a shared formatAliasRuleBadges helper in @floway-dev/protocols/common; the gateway's formatAliasRulesSummary derives from it (same wording, joined with commas, wrapped in parens when used as the synthesized display-name suffix). Dashboard and gateway therefore stay in lockstep on rule labels without parallel formatters drifting.
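One way to keep the summary derived from the badge helper is sketched below. The flattened `AliasRulesSketch` shape and the verbosity and service-tier wordings are assumptions for illustration; the "low effort" label, the comma join, and the parenthesized suffix are the parts taken from the text above.

```typescript
// Hypothetical flattened rule shape; the real ModelAliasRules may nest these differently.
interface AliasRulesSketch {
  effort?: string;
  verbosity?: string;
  serviceTier?: string;
}

// One label per set rule, in a fixed order shared by every surface.
function formatAliasRuleBadges(rules: AliasRulesSketch): string[] {
  const badges: string[] = [];
  if (rules.effort !== undefined) badges.push(`${rules.effort} effort`);
  if (rules.verbosity !== undefined) badges.push(`${rules.verbosity} verbosity`); // assumed wording
  if (rules.serviceTier !== undefined) badges.push(`${rules.serviceTier} tier`);  // assumed wording
  return badges;
}

// Display-name suffix: same labels, comma-joined, wrapped in parentheses.
function formatAliasRulesSummary(rules: AliasRulesSketch): string {
  const badges = formatAliasRuleBadges(rules);
  return badges.length > 0 ? ` (${badges.join(", ")})` : "";
}
```

Under these assumptions, `"GPT-5.4" + formatAliasRulesSummary({ effort: "low" })` yields the `GPT-5.4 (low effort)` shape quoted earlier in the series.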
Each alias entry's row now leads with `alias of: <target>` followed by one per-rule badge in `label: value` form (or label-only for boolean toggles like `adaptive reasoning`). Outline border, no fill, low contrast — distinct from the highlighted upstream pills, lighter than the filled context/prompt/output limits. The shared helper returns rich items so each surface can format as it likes. The gateway's parenthesized display-name suffix keeps its compact `value label` form independently.
Comments must not reference in-progress design docs that live under docs/superpowers/ (gitignored). Strip the "See docs/..." tails from JSDocs on the protocol-extension fields and the apply.ts header; the preceding sentences already document the translation contract.
Both Responses and Messages carry native service_tier; the translator silently dropped it, so an alias serviceTier rule landing on a Responses inbound that routed to a Messages upstream vanished. Spread it onto the target alongside the other native fields.
# Conflicts:
#	packages/translate/src/chat-completions-via-messages/request.ts
#	packages/translate/src/messages-via-chat-completions/request.ts
#	packages/translate/src/messages-via-chat-completions/request_test.ts
#	packages/translate/src/messages-via-responses/request_test.ts
#	packages/translate/src/responses-via-messages/request.ts
#	packages/translate/src/responses-via-messages/request_test.ts
matchAlias returns the alias directly; the sole caller (pushInterpretation in registry.ts) was already destructuring the wrapper away. Both the review and cleanup passes converged on this — remove the indirection.
formatAliasRulesSummary was only consumed by composeAliasDisplayName in the same file; the standalone export existed so the test could import it directly (anti-test-bending). aliasPublicId was a 2-line ternary used exactly once inside aliasEmissionToListedModel. Both now live at their call site; tests target the surviving public entry.
…nitize helper The passthrough serve was re-emitting the floway.alias.drop log shape that chat/shared/sanitize.ts already owns, and re-finding the matched alias by name to walk its rules. ModelAliasRules now rides through resolveModelForRequest alongside aliasName, so passthrough has the rules in hand; the rules walker moves into sanitize.ts as traceAllRulesDropped and reuses createSanitizeTraceCtx so both surfaces emit identical trace lines.
…arget
mapSummaryToAnthropicDisplay('auto') returns undefined, so the apply
step has always left a user-supplied thinking.display untouched in
that case. The comment now spells out that this is intentional —
'auto' means "defer to upstream default", and operator-locked
overwrite applies to every other summary value.
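The "auto defers to upstream" behavior can be sketched as below. The mapping table follows the earlier commit text (concise|detailed → summarized, omitted → omitted, auto → undefined); treating unknown freeform values as undefined, and the `applySummaryOverlay` helper itself, are assumptions made for this illustration.

```typescript
type AnthropicDisplay = "summarized" | "omitted";

// Map a Responses-style summary value onto an Anthropic thinking.display value.
function mapSummaryToAnthropicDisplay(summary: string): AnthropicDisplay | undefined {
  switch (summary) {
    case "concise":
    case "detailed":
      return "summarized";
    case "omitted":
      return "omitted";
    default:
      // "auto" means "defer to upstream default"; unknown values are
      // also left alone here, which is an assumption of this sketch.
      return undefined;
  }
}

// Hypothetical apply step: overwrite only when the mapping produced a value.
function applySummaryOverlay(thinking: { display?: string }, summary: string): void {
  const display = mapSummaryToAnthropicDisplay(summary);
  if (display !== undefined) thinking.display = display;
}
```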
The cross-protocol service_tier↔speed:'fast' bridge that #114 added to the *-via-messages and messages-via-* translators makes the alias-extension knob anthropicSpeed redundant: operators who want speed: 'fast' on a Messages upstream can set serviceTier: 'fast' on the alias and the bridge handles the wire-level conversion in both directions.
Removed before any external client relies on it (the alias schema is not yet public; the PR is still open):
- ModelAliasRules.anthropicSpeed plus the matching PublicModelAliasedFrom field on /v1/models.
- The anthropic_speed Chat / Responses extension fields, the top-level anthropicSpeed Gemini field, and their entries in FLOWAY_EXTENSION_FIELDS.
- The four applyAliasRules* branches that wrote the knob into each inbound IR's natural slot, plus the matching emit branches in chat-completions-via-messages, responses-via-messages, and gemini-via-messages translators.
- The trace-helper and display/badge formatters that surfaced the field.
- All tests asserting either side of the now-removed contract.
anthropicBeta is unrelated (Anthropic beta header tokens) and is kept intact. The native Messages `speed` field is also untouched; callers hitting the Messages inbound directly still control it.
# Conflicts:
#	packages/gateway/src/data-plane/models/load.ts
#	packages/protocols/src/common/models.ts
Mirror the proxies repo's CRUD shape: `create` rejects PK collisions with
a typed `{ reason: 'duplicate' }` so the route layer can map to 409 without
driver-specific error parsing, `save` upserts in place (preserving the
existing row's createdAt on conflict), `delete` returns whether the row
was removed. `getByAlias` is the targeted lookup the PATCH handler uses to
merge a partial body against the persisted row.
In-memory impl now sorts loadAll by alias to match the SQL `ORDER BY alias`
contract; the Map keyed by alias keeps PK semantics 1:1 with SQLite.
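A minimal in-memory sketch of this CRUD shape is shown below. The row fields, the `ok` discriminator on the result, and the synchronous signatures are assumptions for illustration (the real repo is likely async and carries more columns); the `{ reason: 'duplicate' }` rejection, the createdAt-preserving upsert, the boolean delete, and the alias-sorted `loadAll` follow the description above.

```typescript
// Hypothetical reduced row; the real ModelAlias has more columns.
interface ModelAliasSketch {
  alias: string;
  targetModelId: string;
  createdAt: number;
}

type CreateResult = { ok: true } | { ok: false; reason: "duplicate" };

class InMemoryModelAliasRepo {
  // A Map keyed by alias mirrors primary-key semantics.
  private rows = new Map<string, ModelAliasSketch>();

  create(row: ModelAliasSketch): CreateResult {
    if (this.rows.has(row.alias)) return { ok: false, reason: "duplicate" };
    this.rows.set(row.alias, { ...row });
    return { ok: true };
  }

  // Upsert in place, keeping the existing row's createdAt on conflict.
  save(row: ModelAliasSketch): void {
    const prev = this.rows.get(row.alias);
    this.rows.set(row.alias, { ...row, createdAt: prev ? prev.createdAt : row.createdAt });
  }

  // Returns whether a row was actually removed.
  delete(alias: string): boolean {
    return this.rows.delete(alias);
  }

  getByAlias(alias: string): ModelAliasSketch | undefined {
    return this.rows.get(alias);
  }

  // Sorted by alias to match the SQL ORDER BY alias contract.
  loadAll(): ModelAliasSketch[] {
    return Array.from(this.rows.values()).sort((a, b) =>
      a.alias < b.alias ? -1 : a.alias > b.alias ? 1 : 0,
    );
  }
}
```

A route layer can then map `reason: "duplicate"` to 409 without inspecting driver errors.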
Operator-managed alias rows previously had no admin surface; the only write path was a hand-edited migration. Wire admin-only CRUD next to the existing model-aliases code:
  GET    /api/aliases          list, sorted by alias
  POST   /api/aliases          create; 409 on PK conflict
  PATCH  /api/aliases/:alias   partial update; 404 when missing
  DELETE /api/aliases/:alias   idempotent-shaped 204/404
The Zod schemas mirror the closed rule knob set (reasoning effort / budgetTokens / adaptive / summary, verbosity, serviceTier, anthropicBeta[]) under `.strict()` so an unknown rule key is a 400, but each value stays freeform: a newly-introduced upstream-side enum ships through without a gateway code change (Goal 2). Alias names are bounded by the same `[A-Za-z0-9_.:-/]+` grammar the real model ids already use.
PATCH propagates the absent/null distinction for `displayName` so the operator can clear an operator-set label back to the synthesized fallback without dropping into a separate "reset" route.
Adds the dashboard surface for the new /api/aliases CRUD endpoints:
- useModelAliases composable mirrors the proxies-store pattern (module-
scoped cache, error / loading refs, load()).
- AliasesSettingsCard slots into the Settings page directly under
ProxiesSettingsCard, sharing the glass-card styling and animate-in
delay ordering.
- AliasRow surfaces the alias id, optional display name, target model,
rule badges (sourced from formatAliasRuleBadges so the badge order
matches the rest of the dashboard), and an `on_conflict` chip.
- AliasEditDialog is a single modal for both create and edit. Reasoning
is rendered as a None / Effort / Budget / Adaptive radio + a separate
summary input so the mutually-exclusive wire shape is visible at a
glance. Suggestion hints come from the target model's chat.reasoning
metadata when it matches a real catalog entry, but every value field
stays freeform — Goal 2.
Co-located component-level smoke tests use @vue/test-utils (newly added
as a devDep) plus happy-dom. The dialog tests stub the api client, the
two stores, and reka-ui's portaling Dialog so the form mounts inline
where assertions can reach the inputs and read the posted JSON.
The gateway package's exports map gains two new type-only subpaths
(`./control-plane/model-aliases/serialize`, `./control-plane/model-aliases/types`)
so apps/web can pull `SerializedModelAlias` and `ModelAliasRules` as the
source-of-truth types without crossing the existing deep-import ban.
routes.ts had a re-export of ModelAliasRules that no file imported; the
frontend pulls the type from packages/gateway/src/control-plane/model-aliases/types.ts
directly. The PATCH/DELETE param fallbacks (?? '') were dead — Hono only
dispatches the :alias routes when the segment is present, matching how
api-keys/users routes use param('id')!. repo.ts trimmed verbose
explanatory prose down to the one load-bearing fact. repo/types.ts
'save used by import/restore flows' was stale — only PATCH calls it.
Guard the UPDATE with `AND display_name IS NULL` so a re-run against an environment where the operator already renamed the seed doesn't wipe their value. Migrations are tracked one-shot but defense in depth keeps the local-dev replay path safe.
applyAliasRulesToGemini was writing payload.anthropicBeta but nothing read it — gemini-via-messages doesn't reference the field, and the Messages attempt reads candidate.aliasRules.anthropicBeta directly for the outbound anthropic-beta header. The sanitizer would strip the body field on its way to upstream regardless. Removed the write and the matching test; the header path is unchanged. Also corrected the Messages apply doc that claimed "the write-side validator forbids" adaptive + budgetTokens — the schema accepts both today; the dashboard's tagged radio is what enforces exclusivity, and the apply step picks adaptive when both arrive raw.
`shared/candidates.ts` reduced to a 12-line wrapper that re-exported `ProviderCandidate` from `@floway-dev/provider` and a `ChatCandidate = ProviderCandidate` alias. The rename added no semantic distinction: production chat code already wrote `ChatCandidate` while tests and `@floway-dev/test-utils` already wrote `ProviderCandidate`, leaving the data plane referring to the same shape by two names. Replace every `ChatCandidate` in type position with `ProviderCandidate`, fold the import into each file's existing `@floway-dev/provider` import line (so `import/no-duplicates` stays satisfied), and delete the wrapper. One canonical name across chat code, tests, and shared helpers.
`ChatMetadataEditor.vue` and `ModelEditor.vue` carried the exact same `parseOptionalNumber` helper — blank/null/negative collapse to `undefined`, every other value passes through `Number(raw)`. Same contract on both sides because both editors feed nonnegative integer counts the backend validates identically. Lift into `apps/web/src/utils/parse-optional-number.ts` so the rule has one source of truth.
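A sketch of the shared helper, following the contract stated above (blank, null, and negative inputs collapse to `undefined`; everything else passes through `Number(raw)`). The accepted input types and the whitespace trimming are assumptions; the real helper in `apps/web/src/utils/parse-optional-number.ts` may differ in detail.

```typescript
// Parse an optional nonnegative count from a form field value.
function parseOptionalNumber(raw: string | number | null | undefined): number | undefined {
  if (raw === null || raw === undefined) return undefined;
  // Assumption: whitespace-only strings count as blank.
  if (typeof raw === "string" && raw.trim() === "") return undefined;
  const parsed = Number(raw);
  return parsed < 0 ? undefined : parsed;
}
```

For example, `parseOptionalNumber("42")` returns `42`, while `parseOptionalNumber("")` and `parseOptionalNumber(-1)` both return `undefined`.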
…s share the providers list
`enumerateAddressableModelIds` called `getModels()` and then immediately called `listModelProviders(upstreamFilter)` again. `getModels` already listed providers internally, so the upstreams.list() round-trip and provider-instantiation cost were paid twice. Lift the catalog assembly into `getModelsFromProviders(providers, ...)` and let the addressable engine thread the same list into both halves of its walk. `getModels` keeps its old signature as a thin wrapper.
The same five-step ritual ran at the head of every chat serve: catch AliasNoTargetAvailableError → render protocol failure; pull candidates/sawModel/failedUpstreams/aliasResolution off the result; when an alias matched, mutate the payload (or local model var), apply the protocol's chat-rule overlay, and stage the response header. Five serve sites (chat-completions, messages × 2, gemini × 2, responses) each carried the same prose. Extract into chat/shared/alias-prelude.ts. The protocol's mutation plus the rule overlay live in an `applyAlias` callback; the no-target failure renderer is supplied per protocol. The header staging and the 404 conversion stay in the helper so all surfaces agree on the contract — `x-floway-alias` is set for every protocol the moment an alias matches, and `AliasNoTargetAvailableError` always converts to the protocol's `alias-no-target-available` failure shape.
…ocols/common
`apps/web/src/api/types.ts` hand-rolled `PublicModel`, `ChatModelInfo`,
`ModelLimits`, and `ModelEndpointInfo` alongside the canonical
definitions it already imported from `@floway-dev/protocols/common`.
The local `PublicModel` made required fields optional and embedded a
different `endpoints` shape (`Record<string, { url; doc? }>` vs the
canonical `ModelEndpoints` presence map) — silent drift waiting to
happen. Drop the local copies and re-export the canonical types.
`announced-metadata.ts` switches to `PublicModelLimits`. Four test
fixtures move to a shared `api/test-fixtures.ts` that supplies the
required-field defaults `ControlPlaneModel` now demands.
…ged> The wire-input shape was hand-rolled field-by-field, so the next column added to `ModelAlias` required two edits. Derive it directly from `ModelAlias` and strip the server-managed columns (`sort_order` is defaulted by `nextSortOrder`; `created_at` / `updated_at` are stamped by the repo).
`type AliasRules = ChatAliasRules | Record<string, never>` was a union without a runtime discriminator — every consumer narrowed via an unsafe `as ChatAliasRules` cast (a dozen sites across the gateway and the SPA). The empty-object arm is already satisfied by `ChatAliasRules` because every field is optional, so the union added no real safety. Collapse to `type AliasRules = ChatAliasRules` and read fields directly. Also clears the Messages service-tier sibling field (`speed` vs `service_tier`) on every overlay branch so the upstream never sees both with conflicting values.
The constant is transport-level — consumed by the chat alias-prelude and the passthrough seam — not rule-overlay-level. Splitting it off means transport call sites no longer drag in the per-protocol overlay helpers from `apply.ts` to read one string.
The Switch could only emit true/undefined, so an existing record with `adaptive: false` (gateway forwards it to upstream verbatim, the badge formatter renders "non-adaptive") would silently round-trip to `undefined` on first edit. Replace with a Select offering auto / on / off so every state the schema admits stays expressible.
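The three-state mapping behind the Select might be sketched like this. Mapping `auto` to `undefined` is an inference from the text (the Switch could previously emit only `true`/`undefined`), and both function names are hypothetical.

```typescript
type AdaptiveChoice = "auto" | "on" | "off";

// Stored rule value -> Select option.
function adaptiveToChoice(value: boolean | undefined): AdaptiveChoice {
  if (value === undefined) return "auto";
  return value ? "on" : "off";
}

// Select option -> stored rule value; every schema state stays expressible.
function choiceToAdaptive(choice: AdaptiveChoice): boolean | undefined {
  switch (choice) {
    case "on":
      return true;
    case "off":
      return false;
    case "auto":
      return undefined;
  }
}
```

With this pair, a record holding `adaptive: false` round-trips through the editor as `off` instead of collapsing to `undefined`.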
Two divergences in the catalog's alias-window computation:
1. The codex code re-derived `routableIds` via a plain `addressableSet.has(...)` check, which is laxer than `synthesizeListedAliases`'s `kind === alias.kind` predicate. A multi-kind alias misconfigured with off-kind targets would have those targets contribute to the min-window even though the resolver could never pick them.
2. Plain (non-alias) slug lookups went through `slugContextWindow`, a parallel map of the same data already in `addressableById`. Drop the duplicate and read off the same map the alias branch uses.
Also realigns the top-of-file doc with the round-1 fix that moved the fallback from `firstRoutable` to `min over routable targets`.
…relude with passthrough
`resolveCandidatesAndApplyAlias` used to return the rendered failure before staging the response header, so the alias-no-target 404 lost the `x-floway-alias` header that an observability tool would use to correlate the client request with the failure. Stage the header inside the catch before rendering the failure; `finalizeGatewayResponse` copies ctx.responseHeaders onto every outbound response, including rendered failures.
The same refactor lifts the prelude out of `chat/shared/` into `model-aliases/prelude.ts` and lets `passthrough-serve.ts` import it. The prelude is now generic over the per-protocol target descriptor (chat returns `ChatTargetApi`, passthrough returns `ModelEndpointKey`) and accepts an optional `applyAlias` callback (passthrough leaves it undefined because its body rewrite happens at the provider boundary, not on the inbound payload). The header-staging + 404-conversion dance now lives in one place for both protocol families.
`aliasFailureFromError` falls away; the prelude constructs the failure inline because every consumer that ever called it has moved to the new helper.
…atch The resolver dropped targets for two distinct reasons — no enabled upstream binding at all, OR a binding exists but none satisfies the inbound endpoint predicate (kind/endpoint mismatch). The error message collapsed both into the same "no enabled upstream binding" wording, so a chat client hitting an embedding-only alias saw a hint pointing at "no binding" when the real cause was the endpoint. `candidateRoutability` now reports the rejection reason and `AliasNoTargetAvailableError` flips to "none currently serves the inbound endpoint" when every dropped target lost on endpoint-match.
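The message selection can be illustrated as follows. The `RejectionSketch` reason names and the `describeNoTarget` function are hypothetical; the two wordings and the "every dropped target lost on endpoint-match" condition come from the commit message.

```typescript
// Hypothetical reason codes reported per dropped target.
type RejectionSketch = "no-binding" | "endpoint-mismatch";

// Pick the hint that matches why the targets were dropped.
function describeNoTarget(reasons: RejectionSketch[]): string {
  const allEndpointMismatch =
    reasons.length > 0 && reasons.every((r) => r === "endpoint-mismatch");
  return allEndpointMismatch
    ? "none currently serves the inbound endpoint"
    : "no enabled upstream binding";
}
```

A mixed list keeps the original wording, since at least one target genuinely had no binding.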
The frontend's hand-written `computeAnnouncedMetadata` mirrors the gateway's `intersectChat` / `intersectLimits`. Round 4 already caught one drift (the `||→&&` modalities fix landed gateway-side first); a test covering the same matrix gives the dashboard's read-only preview + edit-dialog seed buffer a CI gate against the next silent drift.
The four chat protocols (chat-completions, messages, gemini, responses) each catch `AliasNoTargetAvailableError` and render the protocol-specific 404 envelope; the `x-floway-alias` response header is staged on that path so observability can tie "client asked for X" to the rendered 404. None of the four serve_test files exercised the path end to end — the `aliasResolutionQueue` type carried `| Error` but no test ever pushed one. Add one test per protocol so the 404-envelope wiring and the header staging stay locked in.
The component lives in `@floway-dev/ui` and serves any consumer that wants free-form input + a suggestion list. The JSDoc preamble called out alias rule fields specifically and the `borderless` doc pinned "the alias-target row" as the use case; trim both to generic language so the primitive reads as one a future caller can adopt.
apply.ts's `fast`/non-fast branches each delete the sibling field (`body.speed` vs `body.service_tier`) so the upstream never sees both with conflicting values. The existing tests start from a payload that had neither field set, so a future regression deleting the `delete` call would not be caught. Seed each branch's prior-state sibling and assert it's cleared after the overlay.
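A sketch of the sibling-clearing behaviour and the kind of seeded test described above. Which branch writes which field is an assumption here (the commit text only says each branch deletes the other field), and `applySpeedOverlay` is a hypothetical name, not the function in `apply.ts`.

```typescript
interface SpeedBody {
  speed?: string;
  service_tier?: string;
  [key: string]: unknown;
}

// Assumption: the `fast` branch writes `speed` and the non-fast branch writes
// `service_tier`. Either way the sibling is deleted so the upstream never
// receives both fields with conflicting values.
function applySpeedOverlay(body: SpeedBody, value: string): void {
  if (value === "fast") {
    body.speed = "fast";
    delete body.service_tier;
  } else {
    body.service_tier = value;
    delete body.speed;
  }
}

// Seed the prior-state sibling, then check it is gone after the overlay.
const seededFast: SpeedBody = { model: "m", service_tier: "priority" };
applySpeedOverlay(seededFast, "fast");

const seededTier: SpeedBody = { model: "m", speed: "fast" };
applySpeedOverlay(seededTier, "priority");
```

Starting from a payload that already carries the sibling is what makes the test sensitive to a removed `delete` call; a payload with neither field set would pass either way.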
…header
The shared prelude lifts the chat protocols' alias-no-target handling into a generic helper that passthrough now reuses. Every chat protocol's serve_test covers the 404 + header path end-to-end; passthrough went through the same code path with no test of its own. Add one that seeds an alias whose target id is not in any upstream catalog, exercises /v1/embeddings, asserts the 404 envelope + `x-floway-alias` header staging, and verifies the upstream is never called on the failure path.
…eriving
The shared prelude already knows both the inbound `modelName` and the optional `aliasResolution`, so the "id to use downstream" is a closed computation it can perform once. Body-based protocols (chat-completions/messages/responses) read it implicitly through their own `payload.model = aliasResolution.targetModelId` mutation inside `applyAlias`; path-based Gemini had to re-derive `aliasResolution?.targetModelId ?? args.model` outside the prelude. Lift the derivation into the prelude as `effectiveModelId`. Gemini reads it directly: the `?.` chain at the call site disappears, the asymmetry the auditor flagged between protocols closes, and the next path-based caller gets the field for free.
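The derivation itself is the one-liner quoted in the commit message. The sketch below only illustrates where it would live; the `AliasResolution` shape is reduced to the single field used, and the standalone function form is an assumption (the commit describes `effectiveModelId` as a field the prelude returns).

```typescript
interface AliasResolution {
  targetModelId: string;
}

// The alias target wins when an alias matched; otherwise the inbound name is used as-is.
function effectiveModelId(modelName: string, aliasResolution?: AliasResolution): string {
  return aliasResolution?.targetModelId ?? modelName;
}

// A path-based caller can then use the result directly instead of re-deriving it.
const viaAlias = effectiveModelId("codex-auto-review", { targetModelId: "gpt-5.4" });
const direct = effectiveModelId("gpt-5.4");
```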
…min-only)
Admin's editor surfaces (alias edit, upstream edit) configure gateway-wide state, not the admin's per-account data-plane view. The default scoped behavior, which mirrors the data plane's effective upstream cap, is correct for the Models page and Playground (admin can self-restrict and watch the playground respect it), but wrong for the editor dialogs, which need to see "what exists on the entire gateway" so admin can wire an alias to a target on an upstream the admin's own account is currently restricted out of.
Add `gateway_wide=true` to /api/models. Server requires admin and passes `null` to `enumerateAddressableModelIds` (= all upstreams). Non-admin sessions get 403, because the bypass would leak models from upstreams they have no data-plane access to.
The alias editor surfaces (target combobox, shadow detection, kind-mismatch warning, no-target-available warning) configure gateway state, not the admin's per-account data-plane view. A self-restricted admin opening AliasEditDialog used to see a combobox missing every model on upstreams the admin had restricted out — they could not wire an alias to a target the gateway can actually serve. Pass `gateway_wide=true` so the editor sees "what exists" rather than "what this account can reach". The default `useModelsStore` (Models page + Keys page) stays scoped because those surfaces are meant to mirror data-plane visibility.
…mantics
The store feeding this check now reads `gateway_wide=true`, so the addressable surface it sees is the entire gateway. "No target currently resolves under your upstream access" was misleading, because the check is no longer scoped to the admin's account. Rewrite to "No target resolves to any model on this gateway." which describes the actual cause: a configured target id that no upstream serves.
…teway-wide implicitly
Drop the `gateway_wide=true` query param patch. The server now decides gateway-wide vs scoped by the caller's role: admin sessions always receive the full catalog (Models playground + alias edit + settings all share one fetch), non-admin sessions keep their effective-upstream cap. The dashboard then filters client-side per surface:
- Alias edit / Settings card / AliasesSettingsCard: no filter, the gateway-wide catalog IS the editor's view of the world.
- Models playground: filter by the effective cap of (selectedKey's upstream_ids, admin's own user.upstreamIds). Switching the key in the playground re-narrows the visible models live. Aliases collapse out of the list when no configured target is reachable under the cap.
`ModelInfoBar` now takes optional `catalog` + `cap` props and renders "X / N targets reachable" on alias rows in the playground, showing exactly how the resolver would narrow this alias's pool under the chosen key.
A new `apps/web/src/utils/reachability.ts` carries the pure helpers (`isReachableUnderCap`, `reachableTargets`, `effectiveUpstreamCap`) with a dedicated unit test pinning the cap semantics, alias reachability through targets, and the addressable-but-not-listed case.
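The three helper names come from the commit message, but their signatures are not shown, so the following is only a sketch under stated assumptions: a cap is either `null` (unrestricted) or a list of upstream ids, the effective cap is the intersection of the key's cap and the user's cap, and the catalog maps each model id to the upstream ids that serve it.

```typescript
// Assumption: `null` means "no restriction"; an array lists the allowed upstream ids.
type UpstreamCap = readonly string[] | null;
// Assumption: model id -> ids of the upstreams that serve it.
type Catalog = ReadonlyMap<string, readonly string[]>;

// Intersection of the selected key's cap and the user's own cap.
function effectiveUpstreamCap(keyCap: UpstreamCap, userCap: UpstreamCap): UpstreamCap {
  if (keyCap === null) return userCap;
  if (userCap === null) return keyCap;
  const allowed = new Set(userCap);
  return keyCap.filter((id) => allowed.has(id));
}

// A model is reachable when at least one upstream serving it is inside the cap.
function isReachableUnderCap(modelId: string, catalog: Catalog, cap: UpstreamCap): boolean {
  const upstreams = catalog.get(modelId);
  if (!upstreams || upstreams.length === 0) return false;
  return cap === null ? true : upstreams.some((id) => cap.indexOf(id) !== -1);
}

// Narrows an alias's configured targets to the ones the resolver could pick.
function reachableTargets(
  targets: readonly string[],
  catalog: Catalog,
  cap: UpstreamCap,
): string[] {
  return targets.filter((t) => isReachableUnderCap(t, catalog, cap));
}

const catalog: Catalog = new Map<string, readonly string[]>([
  ["gpt-5.4", ["up-a", "up-b"]],
  ["gemini-3-flash-preview", ["up-c"]],
]);
const cap = effectiveUpstreamCap(["up-a", "up-c"], ["up-a"]);
const reachable = reachableTargets(["gpt-5.4", "gemini-3-flash-preview", "typo-model"], catalog, cap);
```

Under these assumptions an alias would collapse out of the playground list when `reachableTargets` returns an empty array, and the "X / N targets reachable" text would be the length of that array over the configured target count.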
…rget check
The store query was simplified (`gateway_wide=true` was removed in favor of admin-implicit gateway-wide behavior), but this comment still cited the old URL shape. Update the comment to describe the actual server contract.
The optional alias field on TraceLine was never set by any call site —
SanitizeTraceCtx.emit only ever produces { field, targetProtocol }, so
the slot was unreachable.
The remaining assertEquals lines speak for themselves; the inline note about `endpoints` being surfaced elsewhere was a leftover from an earlier patching pass.
…etadata
PublicModel.limits is a required field; the gateway's mirror at
data-plane/models/alias-listing.ts passes real.limits directly. The
?? {} on the frontend masked what would otherwise be a type-impossible
state, papering over a backend protocol violation instead of surfacing it.
…ty (caller-scoped)
The synthesizer used to project both the alias's metadata (limits, chat, endpoints, cost) AND its `aliasedFrom.targets` against the caller's addressable surface, so a non-admin / data-plane caller saw:
- (a) a different limit/endpoint set than the admin saw, computed from whichever subset of targets sat inside their cap; the same alias looked different depending on who asked, and a tighter cap could paradoxically widen the published window
- (b) the operator's full configured target list, including target ids on upstreams they had no data-plane access to AND typo'd / removed model ids, which is operator state, not their business
Split the two axes:
- Metadata (limits, chat, endpoints, cost) is now computed against the GATEWAY-WIDE addressable surface. Every caller (admin session, non-admin session, data-plane api key) reads the same numbers for the same alias. Safe-lower-bound holds across the entire gateway, not the caller's subset.
- `aliasedFrom.targets` is per-caller. When `narrowTargets=true` (every data-plane call + non-admin control-plane call) only targets the caller can actually reach appear. When `narrowTargets=false` (admin control-plane only) the raw configured list survives so the alias-edit dialog can render typos and out-of-cap targets for fixing.
- Alias visibility (whether the row appears in the response at all) stays caller-scoped: at least one target must be reachable under the caller's cap.
Threaded through `/v1/models`, `/v1beta/models`, codex 1p catalog, and `/api/models`. Non-admin paths now fetch BOTH the caller-scoped and the gateway-wide addressable surface; admin paths skip the second call because they already are gateway-wide. The SWR cache shares the per-upstream catalog fetches.
Tests cover the split (synthesizer-level + control-plane integration: admin sees raw config with typos, non-admin sees narrowed projection, both read identical limits for an alias whose targets disagree on limits across upstreams).
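A rough sketch of the two-axis split, assuming a single hypothetical `contextWindow` limit and a "take the minimum" rule as the safe lower bound. The real synthesizer also intersects chat, endpoint, and cost metadata, and its signature is not shown in this PR text; `synthesizeAliasEntry` and the shapes below are illustrative only.

```typescript
// Hypothetical, reduced shapes; the real PublicModel carries more fields.
interface Limits {
  contextWindow: number;
}
interface AliasConfig {
  name: string;
  targets: string[];
}
interface AliasEntry {
  id: string;
  limits: Limits;
  aliasedFrom: { targets: string[] };
}

function synthesizeAliasEntry(
  alias: AliasConfig,
  gatewayWide: ReadonlyMap<string, Limits>, // every addressable model on the gateway
  callerReachable: ReadonlySet<string>, // model ids inside the caller's cap
  narrowTargets: boolean,
): AliasEntry | null {
  // Visibility stays caller-scoped: at least one target must be reachable.
  const reachable = alias.targets.filter((id) => callerReachable.has(id));
  if (reachable.length === 0) return null;

  // Metadata is computed gateway-wide, so every caller reads the same numbers.
  const windows: number[] = [];
  for (const id of alias.targets) {
    const limits = gatewayWide.get(id);
    if (limits !== undefined) windows.push(limits.contextWindow);
  }
  if (windows.length === 0) return null;

  return {
    id: alias.name,
    limits: { contextWindow: Math.min.apply(null, windows) },
    // Targets are per-caller: narrowed for data-plane and non-admin callers,
    // raw (typos included) for the admin control plane.
    aliasedFrom: { targets: narrowTargets ? reachable : alias.targets.slice() },
  };
}
```

With targets that disagree on limits across upstreams, a narrowed caller and an admin would see different `aliasedFrom.targets` lists but the same `limits` value.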
`alias of: 3 / 4 models` when some configured targets are out of the current cap; `alias of: 5 models` when every target is reachable; `alias of: <id-or-display-name>` when only one target is reachable (operator can see exactly what the resolver will pick), and the `selection: <mode>` badge is hidden in that case because there is no selection to make. Replaces the previous "alias of: gpt-5.4, gemini-3-flash-preview, deepseek-v4-pro +1 more" list, which contradicted the parallel "3 / 4 reachable" badge when some of those listed ids were actually out of cap. The list shape also became unwieldy on aliases that fan out across several targets — the count is the information operators actually act on.
… display name
The badge mirrors the value the operator typed into the alias's target field and the value a client would put on the wire. Display name is already on the picked target's own row in the sidebar, so repeating it here is noise; the id is the actionable identifier.
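The badge rules from the two commits above can be sketched as a small pure function. `aliasBadgeLabel` and `showSelectionBadge` are hypothetical names; the sketch also assumes that the single-target form takes precedence whenever exactly one target is reachable, and that an alias with no reachable target renders no badge at all.

```typescript
// Returns the "alias of: ..." badge text, or null when nothing is reachable
// (assumption: such rows are not rendered at all).
function aliasBadgeLabel(reachableIds: readonly string[], configuredCount: number): string | null {
  if (reachableIds.length === 0) return null;
  // One reachable target: show its id, since that is what the resolver will pick.
  if (reachableIds.length === 1) return `alias of: ${reachableIds[0]}`;
  // Some configured targets are out of cap: show "K / N".
  if (reachableIds.length < configuredCount) {
    return `alias of: ${reachableIds.length} / ${configuredCount} models`;
  }
  // Every configured target is reachable.
  return `alias of: ${reachableIds.length} models`;
}

// The `selection: <mode>` badge only makes sense when there is a choice to make.
function showSelectionBadge(reachableIds: readonly string[]): boolean {
  return reachableIds.length > 1;
}
```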
Alias rows used to show no provider badges (the wire's `upstreams: []` on alias entries leaves nothing to render). Compute the de-duped union across the caller-reachable targets' bindings so the alias info bar surfaces the same provider-badge shape every real-model row does. Each binding is further filtered against the caller's effective cap: a target may sit on three upstreams of which only one is in cap, and only the in-cap one is the provider the resolver would actually route to. This keeps the badges aligned with the parallel "alias of: K / N models" badge — both tell the operator what the resolver will see under the current key, not what's configured gateway-wide.
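One way to compute that de-duplicated union is sketched below. The `Binding` shape, the `bindingsByModel` map, and the function name are assumptions for illustration; the cap is modelled as `null` (unrestricted) or a list of upstream ids.

```typescript
// Hypothetical binding shape: which upstream serves the model, and its provider label.
interface Binding {
  upstreamId: string;
  provider: string;
}

function aliasProviderBadges(
  reachableTargetIds: readonly string[],
  bindingsByModel: ReadonlyMap<string, readonly Binding[]>,
  cap: readonly string[] | null,
): string[] {
  const seen = new Set<string>();
  for (const targetId of reachableTargetIds) {
    const bindings = bindingsByModel.get(targetId) ?? [];
    for (const binding of bindings) {
      // Skip bindings on upstreams outside the caller's effective cap.
      if (cap !== null && cap.indexOf(binding.upstreamId) === -1) continue;
      seen.add(binding.provider);
    }
  }
  // Set preserves first-insertion order, so badges appear in target order.
  return Array.from(seen);
}
```

A target bound to three upstreams with only one in cap contributes only that one provider, which keeps the badges consistent with the "K / N models" count.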
Summary
- `model_aliases` D1 table + seed (`codex-auto-review` → `gpt-5.4` with `reasoning.effort=low`, `real-only`). Aliases declare a target model id, an optional upstream filter, an operator-locked rule set, a conflict mode (`alias-only` / `real-only` / `both-real-first` / `both-alias-first`), and whether the entry appears in `/v1/models`.
- …`thinking_budget`, `adaptive_thinking`, `reasoning_summary`, `anthropic_speed`, `anthropic_beta`, `verbosity`, `serviceTier`, etc.) so the same closed set of mode knobs is sayable in any inbound shape. The translate layer reads them and emits the upstream wire's natural slot for each pair; whichever slots don't apply get dropped and traced by the per-upstream sanitizer just before the upstream HTTP call.
- `applyAliasRulesTo<InboundProtocol>` runs at the serve layer once the matched candidate is picked, writing alias rule values onto the inbound IR (native slot when the protocol supports the concept, extension slot otherwise). Alias values override user-supplied values: operator-locked semantics per the design's Goal 3.
- `/v1/models` (and the Anthropic `/models` and Gemini `/v1beta/models` surfaces) emits one entry per (upstream, addressable form) pair that can resolve the alias's target, with an `aliasedFrom` Floway extension carrying `{ targetModelId, upstreamIds, rules, onConflict }`. Aliases marked `visibleInModelsList: false` are omitted; aliases whose target is unreachable from any upstream produce zero entries.
- `x-floway-alias` response header carries the matched alias name on every call served via an alias. Stamped via a shared `stageGatewayResponseHeader` helper that writes both Hono's `c.header` (for `streamSSE`-built responses) and the per-ctx `responseHeaders` bag (for non-streaming `Response.json`-built responses), so the header rides out uniformly across streaming, non-streaming, chat, and passthrough paths.

Decisions for reviewer attention
- …`reasoning_summary` Floway extension (no effort, budget, or adaptive) and lands on a Messages upstream, the translate layer synthesizes `thinking.{type:"enabled", display:<mapped>}` rather than leaving `thinking` unset. Rationale: Anthropic's wire discards `display` without `type`, so a strict mapping would silently drop the operator's summary intent. Native fields on Responses (`reasoning.summary`) and Gemini (`thinkingConfig.includeThoughts`) keep their pre-extension `*-via-messages` translation contracts; the extension synthesis applies only on the new extension paths.
- `verbosity` / `text.verbosity` added as native fields. The spec described them as native but they were absent from the protocol IR types. Task 3 added them as proper native fields (not Floway extensions), matching the OpenAI GPT-5-family spec.
- `service_tier` / `reasoning.summary`. `MessagesPayload.service_tier` and `ResponsesPayload.reasoning.summary` were widened to admit `(string & {})` so operator-typed freeform values pass through without TypeScript narrowing. Matches Goal 2 (no enum gating in Floway).
- `model_aliases.display_name` column + per-upstream prefixed `/v1/models` enumeration. Migration `0047` adds an optional operator-set `display_name`. Each alias listing entry is composed as `${upstream.displayName}: ${alias.displayName}` when set, or `${upstream.displayName}: ${target.displayName}${rules summary}` otherwise. The rules summary joins per-field tokens (`high effort`, `4096tk reasoning`, `adaptive reasoning`, `concise summary`, `low verbosity`, `priority tier`, `fast speed`, sorted `anthropicBeta/...`) in parentheses. The seed `codex-auto-review` row gets `display_name='Codex Auto Review'`. Helpers live in `control-plane/model-aliases/display.ts` and the shared per-upstream emission iterator in `data-plane/models/alias-listing.ts`.

Follow-up fixes (this wave)
- `responses-via-messages` and `gemini-via-messages` no longer translate the Responses-native `reasoning.summary`, Responses-native `service_tier`, or Gemini-native `thinkingConfig.includeThoughts` onto Anthropic Messages; those pairs keep their pre-Task-3 contracts. Extension-field paths (`thinking_budget`, `adaptive_thinking`, `anthropic_speed`, etc.) are unchanged.
- …`/v1/responses` entry-level `codex-auto-review` shim. `rewriteResponsesEntryModelAlias` is gone. The seed alias in the `model_aliases` table now handles this case through the normal alias-resolution path on every inbound surface. On a Codex upstream that exposes a real `codex-auto-review` model, `on_conflict='real-only'` lets the real id win and the `effort=low` rule will not apply; Codex CLI callers that needed the shim's behavior must set `effort=low` themselves, or operators can re-seed the row with `both-alias-first`.
- `display_name` + per-upstream prefixed enumeration. Described in the decisions section above; the listing change applies to both `/v1/models` and the Gemini `/v1beta/models` surface.

Earlier fix wave (final review address)
- …`/v1/embeddings`, `/v1/images/*`, and `/v1/completions` through `finalizeGatewayResponse` so the alias header bag is actually flushed.
- …`x-floway-alias` streaming-safe via `stageGatewayResponseHeader` (writes to both Hono's `c.header` and the per-ctx bag).
- …`gemini_test.ts` coverage for the synthetic alias entry in `/v1beta/models`.
- …`INSERT` to `INSERT OR IGNORE` for idempotent local dev replays.
- …`ORDER BY alias` to `loadAllAliases` so the listing is deterministic.

Test plan
- `applyAliasRulesTo<Protocol>` writes effort / budget / adaptive / summary / verbosity / serviceTier / anthropicSpeed / anthropicBeta to the right slot for each of the four chat protocols (30 cases in `apply_test.ts`).
- …`onConflict` mode emits the expected interpretations (`alias-only`, `real-only`, `both-real-first`, `both-alias-first`) and the `real-only` post-resolution prune drops the alias-rewrite half when the real-name also resolves (covered in `registry_test.ts`).
- `/v1/models` emits an alias entry on each (upstream, listed form) pair that can resolve its target (new test asserts both `codex-auto-review` and `azure/codex-auto-review` on a dual-listed Azure).
- `/v1/models` uses `${upstream}: ${alias.displayName}` when `display_name` is set (suppressing the rules summary) and `${upstream}: ${target.displayName} (...rules)` when it's not.
- `/v1/models` honours alias `upstreamIds`: restricting to one upstream emits one row, not one per upstream that happens to expose the target.
- `/v1/models` omits aliases marked `visibleInModelsList: false`.
- `/v1/models` omits an alias whose target is not in any reachable upstream catalog (per-upstream rule: there is no surface form to attach the row to).
- `/v1beta/models` appends visible aliases as synthetic Gemini model entries with the composed `${upstream}: ...` `displayName`.
- `x-floway-alias` response header is set on an alias-matched request and absent on a real-routed request (Messages http test).
- `x-floway-alias` rides out on `/v1/embeddings`, `/v1/images/generations`, and `/v1/completions` when the request hits an aliased model (passthrough tests).
- …`reasoning.effort` lands on `output_config.effort` of the upstream Messages body (Messages http test).
- …`codex-auto-review` row with `real-only`, `reasoning.effort=low`, and `displayName='Codex Auto Review'`, plus a wall-clock `createdAt` (repo tests).
- `loadAllAliases` reads `display_name`, returning it as `displayName` when set and omitting the field when SQL stored NULL.
- `formatAliasRulesSummary` / `composeAliasDisplayName` unit tests cover each rule field, empty rules, sorted `anthropicBeta`, alias-displayName-suppresses-rules, and the missing-displayName fallback.
- `pnpm run typecheck && pnpm run lint && pnpm run test`: 3485 tests pass.

Migration
Before deploying, run the migration and the deploy in one go (idempotent on re-run):
The `model_aliases` table is empty after migration except for the seed `codex-auto-review` row. Migration `0047` adds the optional `display_name` column and backfills the seed row's label. Operators populate the table via `wrangler d1 execute` direct inserts until the CRUD UI lands.

Out of scope
POST /api/aliases, etc.). They will share schema and validation with the future UI; building them speculatively wastes work.