feat(codex): /codex/models serves every chat model Floway routes by Menci · Pull Request #112 · Menci/Floway

Menci · 2026-06-25T09:41:57Z

Summary

Rewrites Floway's /codex/models endpoint (the catalog Codex CLI consumes for its /model picker) to surface every chat model the gateway can route, not just the slugs that overlap with the upstream bundled Codex catalog. Models whose public id segment-matches a bundled slug reuse the bundled entry verbatim, with slug, display_name, context_window, and service_tiers overridden; the rest get synthesized entries built from the registry's per-model metadata (the first-class chat?: field).

/codex/models rewrite

computeCatalog now iterates the registry. For each chat-kind model:

Public id lowercased and split on / and :; first segment matching a bundled slug → reuse the bundled entry, overriding slug, display_name, context_window, and service_tiers
No match → synthesize via synthesizeCatalogEntry using the registry's chat?.modalities / chat?.reasoning / limits / cost.tiers
codex-auto-review alias appears only when the literal gpt-5.4 is in the registry (matches existing data-plane resolver semantics)
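The segment match above can be sketched as a small pure function. `matchBundledSlug` is a hypothetical name, not the helper in this PR, and the sketch assumes any registry prefix has already been reduced to plain `/`-separated segments.

```typescript
// Hypothetical helper: returns the first segment of the lowercased public id
// that names a bundled slug, or undefined when the model must be synthesized.
function matchBundledSlug(
  publicId: string,
  bundledSlugs: ReadonlySet<string>,
): string | undefined {
  // Split on both namespace separators, so "openrouter/gpt-5.5:nitro"
  // becomes ["openrouter", "gpt-5.5", "nitro"].
  const segments = publicId.toLowerCase().split(/[:/]/);
  // Segments are scanned left to right; the first hit wins.
  return segments.find((segment) => bundledSlugs.has(segment));
}

const bundled = new Set(['gpt-5.5', 'gpt-5.4']);
console.log(matchBundledSlug('openrouter/gpt-5.5:nitro', bundled)); // gpt-5.5
console.log(matchBundledSlug('deepseek-v4-pro', bundled)); // undefined
```

Because any segment can match, a namespace such as `ollama/gpt-5.5` also lands on the bundled entry; that is the over-trust case listed under Limitations.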

Worker-side cache continues at 5 min per (client_version, upstream filter); a cold deploy plus an active session pays the slow path at most once per colo.

Synthesized entries

base_instructions vendored from openai/codex models.json (gpt-5.5 entry) with one line edited for provider neutrality: "based on GPT-5" → "running in the Codex CLI". Apache-2.0 attribution + commit-pinned permalink.
service_tiers derived from each model's cost.tiers keys (so e.g. claude-opus-4-8's tiers.fast pricing surface translates into a /fast toggle in the picker).
supported_reasoning_levels + default_reasoning_level projected from chat.reasoning.effort when present (Codex CLI's catalog wire is effort-only; budget_tokens / adaptive / mandatory sub-modes silently dropped at the wire boundary).
input_modalities derived from chat.modalities.input; downstream web_search_tool_type and supports_image_detail_original derived in turn.
Hardcoded safe defaults for everything else (shell_type, prefer_websockets, apply_patch_tool_type, etc.).
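The effort-only reasoning projection can be illustrated with a minimal sketch. The `ReasoningMeta` shape (modelled here as independent optional fields) and the `{ effort, description }` shape of each entry in `supported_reasoning_levels` are assumptions, not the real Floway type or a confirmed reading of Codex's `ModelInfo`.

```typescript
// Hypothetical shape of the registry's chat.reasoning metadata.
interface ReasoningMeta {
  effort?: { levels: string[]; default?: string };
  budget_tokens?: { min: number; max: number };
  adaptive?: boolean;
  mandatory?: boolean;
}

// Assumed wire shape for one supported_reasoning_levels entry.
interface ReasoningLevelPreset {
  effort: string;
  description: string;
}

interface ReasoningProjection {
  supported_reasoning_levels: ReasoningLevelPreset[];
  default_reasoning_level?: string;
}

function projectReasoning(reasoning: ReasoningMeta | undefined): ReasoningProjection {
  const effort = reasoning?.effort;
  // budget_tokens, adaptive and mandatory are never read: the catalog wire
  // has no field for them, so they are dropped by construction.
  if (effort === undefined || effort.levels.length === 0) {
    return { supported_reasoning_levels: [] };
  }
  return {
    supported_reasoning_levels: effort.levels.map((level) => ({ effort: level, description: '' })),
    default_reasoning_level: effort.default,
  };
}

// Effort still wins when combined with adaptive.
const projected = projectReasoning({
  effort: { levels: ['low', 'medium', 'high'], default: 'medium' },
  adaptive: true,
});
console.log(projected.default_reasoning_level); // medium
```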

Bundled-reuse details

Bundled entries pass through their original priority, description, support_verbosity, etc. Only the following fields are overridden from the registry:

slug → public model id (so the CLI sends the right slug back)
display_name → registry's display_name (operator-controlled label)
context_window / max_context_window → registry's limits.max_context_window_tokens
service_tiers → derived from registry cost.tiers (unconditionally replaces bundled; a tier without registry unit prices cannot be billed, so it must not be surfaced)

Operator chat?: overrides on a registry model do NOT apply to matched-bundled slugs (deliberate — bundled is the upstream catalog's authoritative shape).
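The override rules above can be condensed into one function for illustration. `reuseBundledEntry` and the `RegistryModel` subset are hypothetical; in the PR the context-window pair is written by a separate step (applyContextWindowFromRegistry), folded in here for brevity. Note that no chat?: override is consulted, matching the deliberate behaviour described above.

```typescript
type CatalogModel = { slug: string; [key: string]: unknown };

// Hypothetical subset of a registry model; field names follow the PR text.
interface RegistryModel {
  id: string;
  display_name?: string;
  limits?: { max_context_window_tokens?: number };
  cost?: { tiers?: Record<string, unknown> };
}

function reuseBundledEntry(bundled: CatalogModel, model: RegistryModel): CatalogModel {
  const entry: CatalogModel = {
    // priority, description, support_verbosity, etc. pass through untouched.
    ...bundled,
    // The picker keys on the public id, so the CLI must echo it back.
    slug: model.id,
    // Registry tiers replace bundled ones: only priced tiers can be billed.
    service_tiers: Object.keys(model.cost?.tiers ?? {}).map((id) => ({
      id,
      name: id,
      description: '',
    })),
  };
  // display_name falls back to the bundled label when the registry has none.
  if (model.display_name !== undefined) {
    entry.display_name = model.display_name;
  }
  // The context-window pair is only overwritten when the registry has a limit.
  const contextWindow = model.limits?.max_context_window_tokens;
  if (contextWindow !== undefined) {
    entry.context_window = contextWindow;
    entry.max_context_window = contextWindow;
  }
  return entry;
}
```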

Depends on

This PR depends on per-model chat?: metadata (PR #115). Base is set to floway/chat-metadata until that merges; will rebase to main after.

Limitations (deliberate)

Bundled-reuse over-trusts namespace-colliding slugs: a (hypothetical) ollama/gpt-5.5 would inherit chatgpt.com's gpt-5.5 metadata, and operator chat?: overrides are silently ignored for matched slugs.
5 min per-colo cache (pre-existing): operator additions take up to 5 min to surface in the picker.

Test plan

pnpm run test — 3478 tests passing
pnpm run lint — clean
pnpm run typecheck — clean across gateway / platform-cloudflare / platform-node / web
Headless smoke: /codex/models synthesis (deepseek-v4-pro), bundled match (openrouter/gpt-5.5:nitro), service_tiers derivation
Manual: Codex CLI /model picker against this deploy; every chat model is visible, effort-bearing models get a reasoning effort selector, and claude-opus-4-{6,7,8} get a /fast toggle (requires the translator from PR #114, feat(translate): bridge service_tier:fast ↔ speed:fast across OpenAI/Anthropic translators)




…ex catalog wire is effort-only) Adds a comment at the reasoning-mapping site in synthesize.ts explaining that Codex CLI's catalog wire only supports effort-tiered reasoning (supported_reasoning_levels + default_reasoning_level per openai_models.rs ModelInfo), so budget_tokens, adaptive, and mandatory are silently dropped. The omission is benign: Codex CLI sends reasoning.effort from the global default and the translation layer maps it to the appropriate upstream representation at request time. Extends synthesize_test.ts with four new tests verifying the silent drop of budget_tokens-only, adaptive-only, and mandatory-only reasoning configs, and that effort wins when combined with adaptive.


…default Synthesized Codex catalog entries used to ship with base_instructions: ''. That left non-bundled chat models (DeepSeek, Claude variants via Custom, etc.) without any system prompt, even though Codex CLI hands them tools like update_plan/apply_patch/shell that need contextual explanation. Adopt openai/codex's gpt-5.5 base_instructions verbatim, with one edit: 'based on GPT-5' becomes 'running in the Codex CLI' so the prompt reads correctly for any provider. The rest of the prompt is provider-neutral (covers Codex CLI personality, AGENTS.md spec, planning, harmony channels, preamble messages, apply_patch grammar).

…ng.tiers Previously the synthesized and bundled-reuse catalog paths both emitted service_tiers: []. The Codex CLI gates its /fast toggle and service_tier: "flex"|"priority"|"fast" on this field, so models that have billable tiers in the registry (e.g. claude-opus-4-8 with fast, gpt-5.4 with flex/priority) were silently dropping the toggle. Both paths now compute Object.keys(model.cost?.tiers ?? {}).map(id => ({ id, name: id, description: '' })) so the CLI surfaces the toggle exactly when the operator has pricing data for a tier. For bundled reuse, registry-derived tiers unconditionally replace the bundled service_tiers — bundled chatgpt.com entries today emit [] but if they ever advertised tiers we lack pricing for, we'd expose a knob we can't bill.


…x entry `codex-rs/protocol/src/openai_models.rs`'s `ModelInfo` struct requires `supports_reasoning_summaries: bool` and `apply_patch_tool_type: Option<ApplyPatchToolType>` to be present. Absence aborts deserialization of the whole `/models` body inside Codex CLI, which then silently falls back to its bundled catalog — wiping out every synthesized entry. The bundled vendored snapshot already carries both keys; the synthesizer was building a strict subset. Emit `supports_reasoning_summaries: false` and `apply_patch_tool_type: null` so the synthesized entries actually reach the picker.

…els pipeline Drop comments that restate the code or apologize for shape:
- `computeCatalog`'s "tests can call this directly" framing: the split stands on its own (request handler does I/O, computeCatalog does catalog shape); no need to justify it through the test lens.
- `// slug stays as alias literal` on a plain spread that obviously keeps the slug field.
- "— which is the right UX" editorial closer at the end of the reasoning-projection rationale; the preceding evidence already establishes that the omission is benign.
- "Bundled reuse: clone, override slug + display_name..." restated the next two lines; keep only the non-obvious "Codex picker keys on public id, not bundled segment" bit.

…nth payload `codex-rs/protocol/src/openai_models.rs`'s `ModelInfo` struct has no `reasoning_summary_format` field; the real field is `default_reasoning_summary: ReasoningSummary` (already emitted). serde without `deny_unknown_fields` silently drops the key, so the synth emission was harmless but misleading — bundled.json carries it as upstream residue, and mirroring residue from a hand-rolled synthesizer makes the contract less, not more, legible.

… covers it) The pipeline paragraph in the file header already explains slug override ("Bundled entries have their slug overridden to the registry public id ..."); the inline near-restatement adds no new constraint or evidence.

…emits tiers) The replaced comment claimed 'bundled chatgpt.com today emits service_tiers: [] anyway', but the vendored bundled.json carries a non-empty `priority` tier on gpt-5.5 and gpt-5.4. The policy stays the same — registry-derived tiers replace bundled — but the justification now states the real reason directly: a tier without registry unit prices cannot be billed and so must not be surfaced.

…ntry Bundled.json is a shipped data file; missing the codex-auto-review entry when the alias would otherwise be appended is a build-time data regression, not a runtime fallback case. Silently dropping the alias left the auto-review hook misconfigured in the catalog without surfacing the regression.


Both the synthesizer and the bundled-reuse path mirrored the same id->{id, name, description: ''} projection from cost.tiers. Lifting it into a single named helper makes the id->name mirror + blank-description contract visible in one place.


Merge the second pass over internalModels (which built the slug→context-window map) into the main chat-filter loop. Only chat-kind models contribute to the map now, which matches the only callers of the resolver. Drop the explanatory comment above computeCatalog that restated the code shape. One O(n) loop instead of two; non-chat entries previously added to the map were never looked up by applyContextWindowFromRegistry (it iterates catalog.models, which only contains chat models plus the alias).
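The merged loop can be sketched as a single pass that filters chat models and fills the slug-to-context-window map in the same iteration. `collectChatModels`, `ChatPass`, and the non-chat `kind` values are hypothetical placeholders, not the PR's types.

```typescript
// Hypothetical registry shape; the non-chat kinds are placeholders.
interface RegistryModel {
  id: string;
  kind: 'chat' | 'embedding' | 'image';
  limits?: { max_context_window_tokens?: number };
}

interface ChatPass {
  chatModels: RegistryModel[];
  // null means "registry has no limit"; the consumer decides the fallback.
  contextWindowBySlug: Map<string, number | null>;
}

function collectChatModels(registry: readonly RegistryModel[]): ChatPass {
  const chatModels: RegistryModel[] = [];
  const contextWindowBySlug = new Map<string, number | null>();
  for (const model of registry) {
    // Non-chat models never reach catalog.models, so skip them before they
    // can populate the map.
    if (model.kind !== 'chat') continue;
    chatModels.push(model);
    contextWindowBySlug.set(model.id, model.limits?.max_context_window_tokens ?? null);
  }
  return { chatModels, contextWindowBySlug };
}
```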


The PR-introduced flag reordering moved demote-developer-to-system into the slot the comment block was anchoring; the comment is about the billing attribution header, not the developer-role demotion. Move it back to sit directly above strip-billing-attribution.

synthesizeCatalogEntry stops writing context_window / max_context_window directly. applyContextWindowFromRegistry in context-window.ts is now the sole writer for the pair, with a CONSERVATIVE_DEFAULT_CONTEXT_WINDOW (128k) backing any slug whose registry entry omits max_context_window_tokens. Previously: synthesize.ts wrote the registry value (if any) and applyContextWindowFromRegistry overwrote it with the same map lookup — two writers tracking the same source of truth. Slugs whose registry omits the limit had no field at all, which crashes codex's `(context_window * 9) / 10` auto-compact trigger inside ModelInfo::auto_compact_token_limit. After: synthesize emits the entry with both fields absent; downstream applyContextWindowFromRegistry sets them from the resolver, falling back to 128k when the resolver returns null. The default is deliberately low — an operator who needs more configures the registry. This also changes bundled-hit behaviour for slugs whose registry has no limit: the bundled-vendored OpenAI value no longer leaks through. The project rule is already that the registry is the only source of truth for what the gateway can actually serve (see context-window.ts header comment); the new behaviour enforces it consistently.

The codex-auto-review alias entry was being appended verbatim from the bundled catalog ({...aliasEntry}). The bundled service_tiers reflect OpenAI 1p's tiering, which Floway may not be able to bill — the same reason the bundled-hit path rewrites service_tiers from the target's registry cost.tiers (a tier we can bill must have unit prices in the registry). Track the alias target's InternalModel through the chat-model loop and feed it to deriveServiceTiers when appending the alias. Two new tests pin (a) tiers projected from the target's cost.tiers and (b) [] when the target has no tiers.
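A minimal sketch of the alias step, assuming the caller passes the registry model whose id is exactly `gpt-5.4` (or undefined). `appendAutoReviewAlias` is a hypothetical name, and throwing is only one way to make a missing bundled alias fail loudly; the PR's actual mechanism is not shown here.

```typescript
type CatalogModel = { slug: string; [key: string]: unknown };

// Hypothetical subset of a registry model.
interface RegistryModel {
  id: string;
  cost?: { tiers?: Record<string, unknown> };
}

const AUTO_REVIEW_SLUG = 'codex-auto-review';

function appendAutoReviewAlias(
  models: CatalogModel[],
  bundled: readonly CatalogModel[],
  aliasTarget: RegistryModel | undefined,
): CatalogModel[] {
  // The alias is only advertised when its target is routable.
  if (aliasTarget === undefined) return models;
  const aliasEntry = bundled.find((entry) => entry.slug === AUTO_REVIEW_SLUG);
  // bundled.json ships with the gateway, so a missing alias is a data bug.
  if (aliasEntry === undefined) {
    throw new Error(`bundled catalog has no ${AUTO_REVIEW_SLUG} entry`);
  }
  // Bundled tiers reflect OpenAI's first-party tiering; use the target's
  // billable registry tiers instead.
  const serviceTiers = Object.keys(aliasTarget.cost?.tiers ?? {}).map((id) => ({
    id,
    name: id,
    description: '',
  }));
  return [...models, { ...aliasEntry, service_tiers: serviceTiers }];
}
```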

The previous commit moved the 128k context_window fallback into applyContextWindowFromRegistry, which made the default apply uniformly to bundled-hit slugs too — overwriting the bundled-vendored window whenever the registry happened not to advertise one. Bundled-hit entries have a real upstream-vendored window from the codex catalog; that value is a fine fallback when the registry has none, and silently rewriting it to 128k regresses operators who had not yet configured max_context_window_tokens. Keep applyContextWindowFromRegistry's original 'pass through unchanged when resolver returns null' behaviour, and instead apply the 128k default inside synthesizeCatalogEntry — synth entries are the only ones that would otherwise emit no context_window field at all. Move the CONSERVATIVE_DEFAULT_CONTEXT_WINDOW constant alongside its sole consumer.
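The final context-window policy splits by path. Both functions below are hypothetical stand-ins for synthesizeCatalogEntry and applyContextWindowFromRegistry, not their real signatures, and "128k" is read here as 128,000 tokens (an assumption). The last helper mirrors the `(context_window * 9) / 10` auto-compact trigger quoted earlier, showing why a synthesized entry must always carry a number.

```typescript
type CatalogModel = { slug: string; [key: string]: unknown };

// Assumed value for "128k"; the real constant may differ.
const CONSERVATIVE_DEFAULT_CONTEXT_WINDOW = 128000;

// Synthesized entries: always emit the pair, falling back to the default.
function withSynthesizedContextWindow(entry: CatalogModel, registryWindow: number | null): CatalogModel {
  const contextWindow = registryWindow ?? CONSERVATIVE_DEFAULT_CONTEXT_WINDOW;
  return { ...entry, context_window: contextWindow, max_context_window: contextWindow };
}

// Bundled-hit entries: the registry wins when it has a value; otherwise the
// upstream-vendored window passes through unchanged.
function withRegistryContextWindow(entry: CatalogModel, registryWindow: number | null): CatalogModel {
  if (registryWindow === null) return entry;
  return { ...entry, context_window: registryWindow, max_context_window: registryWindow };
}

// Integer division, as Rust does for non-negative integers.
function autoCompactTokenLimit(contextWindow: number): number {
  return Math.floor((contextWindow * 9) / 10);
}

console.log(autoCompactTokenLimit(CONSERVATIVE_DEFAULT_CONTEXT_WINDOW)); // 115200
```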

The bundle-match path leans on `MODEL_PREFIX_REGEX` requiring a trailing slash on every prefix — without that, the segment splitter would not be guaranteed to land on the unprefixed leaf. Lock the invariant with a deep-nested prefix case so a future prefix-regex relaxation surfaces here.

display_name is the only bundled-inherited field we keep when the registry value is undefined. service_tiers and context_window have registry-driven fallbacks (they tie to billing and runtime gating), but the UI label is fine to inherit from bundled. Lock the asymmetry so that a future refactor aligning the three fields' fallback policies hits this test first.

The Workers Cache API wrapper around codex /models was a latency hedge: codex aborts its catalog refresh after 5 s, the registry leg can cost ~4 s, and the cache let the slow path run at most once per (colo, client_version, upstream filter) per 5-min window. The downside is a 5-min staleness window: operator changes to an upstream's catalog do not surface in the codex picker until the cache expires, and there is no easy way to invalidate from the dashboard. That tradeoff is wrong for an interactive operator surface — fresh data matters more than the spare worst-case second. Strip the cache wrapper, drop CACHE_TTL_SECONDS / cacheKeyFor / parseCodexVersion import, drop the cache-control: max-age header on the response (a stale codex picker is harder to debug than one extra hit per CLI launch). Provider-side model lists keep their own SWR caching in models-cache.ts — that layer is intentional and unchanged. The existing cache test is repurposed to a regression guard: it stubs caches.default with match/put spies and asserts both stay at zero, so a future reintroduction of caching at this layer fails loudly.

After the cache wrapper was removed, the file header still carried a paragraph framing the absence ("no per-colo cache", codex's 5s timeout) and the test suite still pinned 'caches.default is untouched' as a guard. Both only make sense if a reader expects there to be a cache — i.e. they preserve the shape of a feature that no longer exists. Drop them so the code reads as if the catalog endpoint had always been a direct registry passthrough.

Menci added 8 commits June 25, 2026 16:02

feat(gateway): synthesize Codex catalog entries for non-bundled chat …

173d2b0

…models

fix(gateway): tighten synthesize.ts (drop forward ref, reuse provider…

b529b21

… Modality, extend test)

feat(gateway): /codex/models serves all chat models with slug matchin…

9f94072

…g against bundled catalog

refactor(gateway): rename computeCatalogForTest to computeCatalog (dr…

d0fe40e

…op test-concern suffix)

fix(gateway): use discriminated reasoning shape in synthesize_test fi…

0dd3fa4

…xture

Menci force-pushed the floway/codex-models-catalog-synthesis branch from 3b7f931 to 39fe698 June 25, 2026 18:10

Menci changed the base branch from main to floway/chat-metadata June 25, 2026 18:10

Menci changed the title ~~feat(codex): /codex/models serves every chat model with bundled-slug matching~~ feat(codex): /codex/models serves every chat model Floway routes Jun 25, 2026

Menci force-pushed the floway/codex-models-catalog-synthesis branch from 39fe698 to 0dd3fa4 June 25, 2026 20:02

Menci changed the base branch from floway/chat-metadata to main June 25, 2026 20:02

Menci added 17 commits June 26, 2026 04:13

refactor(gateway): inline single-use truncation literal in synthesize.ts

e664620

refactor(gateway): tighten synthesizeCatalogEntry return type to Cata…

4f57bbd

…logModel The literal already satisfies CatalogModel's { slug: string; [key: string]: unknown } shape, so the call site no longer needs an unchecked cast.

Merge remote-tracking branch 'origin/main' into floway/codex-models-c…

1de860f

…atalog-synthesis

Merge remote-tracking branch 'origin/main' into floway/codex-models-c…

e80e7d1

…atalog-synthesis

Menci added 3 commits June 28, 2026 01:32

Menci wants to merge 28 commits into main from floway/codex-models-catalog-synthesis