
feat(codex): /codex/models serves every chat model Floway routes #112

Open
Menci wants to merge 28 commits into main from floway/codex-models-catalog-synthesis


Menci commented Jun 25, 2026


Summary

Rewrites Floway's /codex/models endpoint (the catalog Codex CLI consumes for its /model picker) to surface every chat model the gateway can route, not just the slugs that overlap with the upstream bundled Codex catalog. Models whose public id segment-matches a bundled slug reuse the bundled entry, with slug, display_name, context_window, and service_tiers overridden from the registry; the rest get synthesized entries built from the registry's per-model metadata (the first-class chat?: field).

/codex/models rewrite

computeCatalog now iterates the registry. For each chat-kind model:

  • Lowercased public id split by / and :; first segment matching a bundled slug → reuse bundled entry, override slug, display_name, context_window, and service_tiers
  • No match → synthesize via synthesizeCatalogEntry using the registry's chat?.modalities / chat?.reasoning / limits / cost.tiers
  • codex-auto-review alias appears only when the literal gpt-5.4 is in the registry (matches existing data-plane resolver semantics)
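A minimal TypeScript sketch of the matching rule above. `findBundledSlug` and `shouldAppendAutoReviewAlias` are hypothetical names, not the functions in this PR, and "literal gpt-5.4 in the registry" is assumed to mean a public id exactly equal to `gpt-5.4`.

```typescript
// Hypothetical helper: returns the first segment of a public id that names a
// bundled Codex slug, or undefined when the model must be synthesized.
function findBundledSlug(
  publicId: string,
  bundledSlugs: ReadonlySet<string>,
): string | undefined {
  // Lowercase, then split on "/" (provider prefixes) and ":" (variant suffixes).
  const segments = publicId.toLowerCase().split(/[\/:]/);
  return segments.find((segment) => bundledSlugs.has(segment));
}

// Hypothetical helper: the codex-auto-review alias is offered only when a
// registry public id is exactly "gpt-5.4" (assumed reading of "literal").
function shouldAppendAutoReviewAlias(registryIds: readonly string[]): boolean {
  return registryIds.includes("gpt-5.4");
}

const exampleSlugs = new Set(["gpt-5.5", "gpt-5.4"]);
console.log(findBundledSlug("openrouter/gpt-5.5:nitro", exampleSlugs)); // gpt-5.5
console.log(findBundledSlug("deepseek-v4-pro", exampleSlugs)); // undefined
```

Taking the first matching segment is what makes `openrouter/gpt-5.5:nitro` land on the bundled `gpt-5.5` entry, and also what produces the namespace-collision over-trust listed under Limitations.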

Worker-side cache continues at 5 min per (client_version, upstream filter); a cold deploy plus an active session pays the slow path at most once per colo.

Synthesized entries

  • base_instructions vendored from openai/codex models.json (gpt-5.5 entry) with one line edited for provider neutrality: "based on GPT-5" → "running in the Codex CLI". Apache-2.0 attribution + commit-pinned permalink.
  • service_tiers derived from each model's cost.tiers keys (so e.g. claude-opus-4-8's tiers.fast pricing surfaces as a /fast toggle in the picker).
  • supported_reasoning_levels + default_reasoning_level projected from chat.reasoning.effort when present (Codex CLI's catalog wire is effort-only; budget_tokens / adaptive / mandatory sub-modes silently dropped at the wire boundary).
  • input_modalities derived from chat.modalities.input; downstream web_search_tool_type and supports_image_detail_original derived in turn.
  • Hardcoded safe defaults for everything else (shell_type, prefer_websockets, apply_patch_tool_type, etc.).
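The effort-only reasoning projection can be sketched as follows. The `RegistryReasoning` shape is an assumption about how `chat.reasoning` is typed, and each `supported_reasoning_levels` element is assumed to be an `{ effort, description }` object; check both against the registry types and openai_models.rs before relying on this.

```typescript
// Assumed registry shape for chat.reasoning; the real type may differ.
interface RegistryReasoning {
  effort?: { levels: string[]; default?: string };
  budget_tokens?: { min: number; max: number };
  adaptive?: boolean;
  mandatory?: boolean;
}

// Assumed wire shape of one supported_reasoning_levels element.
interface ReasoningLevel {
  effort: string;
  description: string;
}

interface CatalogReasoningFields {
  supported_reasoning_levels: ReasoningLevel[];
  default_reasoning_level?: string;
}

// Only `effort` survives: budget_tokens, adaptive and mandatory have no
// representation on the catalog wire and are dropped here.
function projectReasoning(reasoning: RegistryReasoning | undefined): CatalogReasoningFields {
  const effort = reasoning?.effort;
  if (!effort || effort.levels.length === 0) {
    return { supported_reasoning_levels: [] };
  }
  return {
    supported_reasoning_levels: effort.levels.map((level) => ({ effort: level, description: "" })),
    default_reasoning_level: effort.default,
  };
}
```

Dropping the non-effort sub-modes is safe only because the request-time translator maps `reasoning.effort` onto whatever the upstream actually expects.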

Bundled-reuse details

Bundled entries pass through their original priority, description, support_verbosity, etc. Only four fields are overridden from the registry:

  • slug → public model id (so the CLI sends the right slug back)
  • display_name → registry's display_name (operator-controlled label)
  • context_window / max_context_window → registry's limits.max_context_window_tokens
  • service_tiers → derived from registry cost.tiers (replaces bundled; the vendored bundled gpt-5.5 / gpt-5.4 entries advertise a priority tier, which is dropped unless the registry has pricing for it)

Operator chat?: overrides on a registry model do NOT apply to matched-bundled slugs (deliberate — bundled is the upstream catalog's authoritative shape).
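A sketch of the bundled-reuse clone, assuming simplified `CatalogModel` / `RegistryModel` shapes (hypothetical; the PR only describes `CatalogModel` as the loose `{ slug: string; [key: string]: unknown }` form). In this sketch `display_name` falls back to the bundled label when the registry leaves it undefined, and service_tiers and the context-window pair are left to separate steps.

```typescript
// Simplified, hypothetical shapes for illustration.
interface CatalogModel {
  slug: string;
  display_name?: string;
  [key: string]: unknown;
}

interface RegistryModel {
  id: string; // public id, e.g. "openrouter/gpt-5.5:nitro"
  display_name?: string;
  chat?: Record<string, unknown>; // operator overrides, deliberately unused here
}

function reuseBundledEntry(bundled: CatalogModel, model: RegistryModel): CatalogModel {
  return {
    ...bundled, // priority, description, support_verbosity, ... pass through
    slug: model.id, // the CLI must send the public id back, not the bundled segment
    display_name: model.display_name ?? bundled.display_name,
    // model.chat is intentionally not consulted: the bundled shape is authoritative.
  };
}
```

The spread produces a new object, so the shared bundled snapshot is never mutated across requests.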

Depends on

This PR depends on per-model chat?: metadata (PR #115). Base is set to floway/chat-metadata until that merges; will rebase to main after.

Limitations (deliberate)

  • Bundled-reuse over-trust on namespace-colliding slugs: ollama/gpt-5.5 (hypothetical) inherits chatgpt.com gpt-5.5 metadata. Operator override silently ignored for matched slugs.
  • 5 min per-colo cache (pre-existing): operator additions take up to 5 min to surface in the picker.

Test plan

  • pnpm run test — 3478 tests passing
  • pnpm run lint — clean
  • pnpm run typecheck — clean across gateway / platform-cloudflare / platform-node / web
  • Headless smoke: /codex/models synthesis (deepseek-v4-pro), bundled match (openrouter/gpt-5.5:nitro), service_tiers derivation
  • Manual: the Codex CLI /model picker against this deploy shows every chat model, a reasoning effort selector for effort-bearing models, and a /fast toggle for claude-opus-4-{6,7,8} (requires the translator from #114, feat(translate): bridge service_tier:fast ↔ speed:fast across OpenAI/Anthropic translators)

Menci added 8 commits June 25, 2026 16:02
…ex catalog wire is effort-only)

Adds a comment at the reasoning-mapping site in synthesize.ts explaining
that Codex CLI's catalog wire only supports effort-tiered reasoning
(supported_reasoning_levels + default_reasoning_level per openai_models.rs
ModelInfo), so budget_tokens, adaptive, and mandatory are silently dropped.
The omission is benign: Codex CLI sends reasoning.effort from the global
default and the translation layer maps it to the appropriate upstream
representation at request time.

Extends synthesize_test.ts with four new tests verifying the silent drop
of budget_tokens-only, adaptive-only, and mandatory-only reasoning configs,
and that effort wins when combined with adaptive.
…default

Synthesized Codex catalog entries used to ship with base_instructions: ''.
That left non-bundled chat models (DeepSeek, Claude variants via Custom, etc.)
without any system prompt, even though Codex CLI hands them tools like
update_plan/apply_patch/shell that need contextual explanation.

Adopt openai/codex's gpt-5.5 base_instructions verbatim, with one edit:
'based on GPT-5' becomes 'running in the Codex CLI' so the prompt reads
correctly for any provider. The rest of the prompt is provider-neutral
(covers Codex CLI personality, AGENTS.md spec, planning, harmony channels,
preamble messages, apply_patch grammar).
…ng.tiers

Previously the synthesized and bundled-reuse catalog paths both emitted
service_tiers: []. The Codex CLI gates its /fast toggle and
service_tier: "flex"|"priority"|"fast" on this field, so models that
have billable tiers in the registry (e.g. claude-opus-4-8 with fast,
gpt-5.4 with flex/priority) were silently dropping the toggle.

Both paths now compute
`Object.keys(model.cost?.tiers ?? {}).map(id => ({ id, name: id, description: '' }))`
so the CLI surfaces the toggle exactly when the operator has pricing data for a tier.

For bundled reuse, registry-derived tiers unconditionally replace the
bundled service_tiers — bundled chatgpt.com entries today emit [] but if
they ever advertised tiers we lack pricing for, we'd expose a knob we
can't bill.
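The projection quoted above, written out as a standalone helper. The name `deriveServiceTiers` comes from a later commit in this PR; the parameter shape here is an assumption.

```typescript
interface ServiceTier {
  id: string;
  name: string;
  description: string;
}

// Assumed minimal model shape: only cost.tiers is read, keyed by tier id.
interface TieredModel {
  cost?: { tiers?: Record<string, unknown> };
}

// A tier is surfaced only if the registry has pricing for it; the name
// mirrors the id and the description is left blank.
function deriveServiceTiers(model: TieredModel): ServiceTier[] {
  return Object.keys(model.cost?.tiers ?? {}).map((id) => ({
    id,
    name: id,
    description: "",
  }));
}
```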
Menci force-pushed the floway/codex-models-catalog-synthesis branch from 3b7f931 to 39fe698 June 25, 2026 18:10
Menci changed the base branch from main to floway/chat-metadata June 25, 2026 18:10
Menci changed the title from feat(codex): /codex/models serves every chat model with bundled-slug matching to feat(codex): /codex/models serves every chat model Floway routes Jun 25, 2026
Menci force-pushed the floway/codex-models-catalog-synthesis branch from 39fe698 to 0dd3fa4 June 25, 2026 20:02
Menci changed the base branch from floway/chat-metadata to main June 25, 2026 20:02
Menci added 17 commits June 26, 2026 04:13
…x entry

`codex-rs/protocol/src/openai_models.rs`'s `ModelInfo` struct requires
`supports_reasoning_summaries: bool` and `apply_patch_tool_type:
Option<ApplyPatchToolType>` to be present. Absence aborts deserialization
of the whole `/models` body inside Codex CLI, which then silently falls
back to its bundled catalog — wiping out every synthesized entry.

The bundled vendored snapshot already carries both keys; the synthesizer
was building a strict subset. Emit `supports_reasoning_summaries: false`
and `apply_patch_tool_type: null` so the synthesized entries actually
reach the picker.
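One way to pin this contract in tests is a presence check over each synthesized entry. `missingRequiredKeys` is hypothetical, and the key list covers only the two fields named in this commit, not the whole `ModelInfo` struct.

```typescript
// Only the two keys this commit adds; not the full ModelInfo contract.
const REQUIRED_SYNTH_KEYS = ["supports_reasoning_summaries", "apply_patch_tool_type"] as const;

// Uses `in` rather than a truthiness check: `false` and `null` are valid values,
// only an absent key breaks deserialization.
function missingRequiredKeys(entry: Record<string, unknown>): string[] {
  return REQUIRED_SYNTH_KEYS.filter((key) => !(key in entry));
}
```

Because an absent key fails the whole `/models` body rather than one entry, a check like this is most useful over every synthesized entry at once.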
…els pipeline

Drop comments that restate the code or apologize for shape:
- `computeCatalog`'s "tests can call this directly" framing — the split
  stands on its own (request handler does I/O, computeCatalog does
  catalog shape); no need to justify it through the test lens.
- `// slug stays as alias literal` on a plain spread that obviously
  keeps the slug field.
- "— which is the right UX" editorial closer at the end of the
  reasoning-projection rationale; the preceding evidence already
  establishes that the omission is benign.
- "Bundled reuse: clone, override slug + display_name..." restated the
  next two lines; keep only the non-obvious "Codex picker keys on
  public id, not bundled segment" bit.
…nth payload

`codex-rs/protocol/src/openai_models.rs`'s `ModelInfo` struct has no
`reasoning_summary_format` field; the real field is
`default_reasoning_summary: ReasoningSummary` (already emitted). serde
without `deny_unknown_fields` silently drops the key, so the synth
emission was harmless but misleading — bundled.json carries it as
upstream residue, and mirroring residue from a hand-rolled synthesizer
makes the contract less, not more, legible.
… covers it)

The pipeline paragraph in the file header already explains slug override
("Bundled entries have their slug overridden to the registry public id
..."); the inline near-restatement adds no new constraint or evidence.
…emits tiers)

The replaced comment claimed 'bundled chatgpt.com today emits
service_tiers: [] anyway', but the vendored bundled.json carries a
non-empty `priority` tier on gpt-5.5 and gpt-5.4. The policy stays the
same — registry-derived tiers replace bundled — but the justification
now states the real reason directly: a tier without registry unit
prices cannot be billed and so must not be surfaced.
…ntry

Bundled.json is a shipped data file; missing the codex-auto-review entry
when the alias would otherwise be appended is a build-time data
regression, not a runtime fallback case. Silently dropping the alias
left the auto-review hook misconfigured in the catalog without surfacing
the regression.
…logModel

The literal already satisfies CatalogModel's { slug: string; [key: string]: unknown }
shape, so the call site no longer needs an unchecked cast.
Both the synthesizer and the bundled-reuse path mirrored the same
id->{id, name, description: ''} projection from cost.tiers. Lifting it
into a single named helper makes the id->name mirror + blank-description
contract visible in one place.
Merge the second pass over internalModels (which built the slug→context-window map) into the main chat-filter loop. Only chat-kind models contribute to the map now, which matches the only callers of the resolver. Drop the explanatory comment above computeCatalog that restated the code shape.

One O(n) loop instead of two; non-chat entries previously added to the map were never looked up by applyContextWindowFromRegistry (it iterates catalog.models, which only contains chat models plus the alias).
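A sketch of the merged single pass. `collectChatModels`, the `kind` values, and the `InternalModel` shape are assumptions for illustration.

```typescript
// Hypothetical InternalModel shape.
interface InternalModel {
  id: string;
  kind: "chat" | "embedding" | "image";
  limits?: { max_context_window_tokens?: number };
}

function collectChatModels(internalModels: readonly InternalModel[]): {
  chatModels: InternalModel[];
  contextWindowBySlug: Map<string, number>;
} {
  const chatModels: InternalModel[] = [];
  const contextWindowBySlug = new Map<string, number>();
  for (const model of internalModels) {
    if (model.kind !== "chat") continue; // non-chat entries never reach the catalog
    chatModels.push(model);
    const limit = model.limits?.max_context_window_tokens;
    if (limit !== undefined) contextWindowBySlug.set(model.id, limit);
  }
  return { chatModels, contextWindowBySlug };
}
```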
The PR-introduced flag reordering moved demote-developer-to-system into
the slot the comment block was anchoring; the comment is about the
billing attribution header, not the developer-role demotion. Move it
back to sit directly above strip-billing-attribution.
synthesizeCatalogEntry stops writing context_window / max_context_window
directly. applyContextWindowFromRegistry in context-window.ts is now the
sole writer for the pair, with a CONSERVATIVE_DEFAULT_CONTEXT_WINDOW
(128k) backing any slug whose registry entry omits
max_context_window_tokens.

Previously: synthesize.ts wrote the registry value (if any) and
applyContextWindowFromRegistry overwrote it with the same map lookup —
two writers tracking the same source of truth. Slugs whose registry
omits the limit had no field at all, which crashes codex's
`(context_window * 9) / 10` auto-compact trigger inside
ModelInfo::auto_compact_token_limit.

After: synthesize emits the entry with both fields absent; downstream
applyContextWindowFromRegistry sets them from the resolver, falling
back to 128k when the resolver returns null. The default is deliberately
low — an operator who needs more configures the registry.

This also changes bundled-hit behaviour for slugs whose registry has no
limit: the bundled-vendored OpenAI value no longer leaks through. The
project rule is already that the registry is the only source of truth
for what the gateway can actually serve (see context-window.ts header
comment); the new behaviour enforces it consistently.
The codex-auto-review alias entry was being appended verbatim from the
bundled catalog ({...aliasEntry}). The bundled service_tiers reflect
OpenAI 1p's tiering, which Floway may not be able to bill — the same
reason the bundled-hit path rewrites service_tiers from the target's
registry cost.tiers (a tier we can bill must have unit prices in the
registry).

Track the alias target's InternalModel through the chat-model loop and
feed it to deriveServiceTiers when appending the alias. Two new tests
pin (a) tiers projected from the target's cost.tiers and (b) [] when
the target has no tiers.
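A sketch of the alias append with tiers taken from the target. The helper name and types are hypothetical; throwing on a missing bundled `codex-auto-review` entry follows the earlier commit's "build-time data regression" framing, but the exact error handling in the PR is an assumption.

```typescript
interface CatalogModel {
  slug: string;
  [key: string]: unknown;
}

// Hypothetical shape: only cost.tiers of the alias target is read.
interface InternalModel {
  id: string;
  cost?: { tiers?: Record<string, unknown> };
}

const AUTO_REVIEW_SLUG = "codex-auto-review";

function appendAutoReviewAlias(
  catalog: CatalogModel[],
  bundled: readonly CatalogModel[],
  aliasTarget: InternalModel | undefined,
): void {
  if (aliasTarget === undefined) return; // alias target not in the registry
  const aliasEntry = bundled.find((entry) => entry.slug === AUTO_REVIEW_SLUG);
  if (aliasEntry === undefined) {
    // Shipped-data regression: fail loudly instead of dropping the alias.
    throw new Error(`bundled catalog is missing ${AUTO_REVIEW_SLUG}`);
  }
  catalog.push({
    ...aliasEntry,
    // Bundled tiers are replaced: only tiers the target has prices for are billable.
    service_tiers: Object.keys(aliasTarget.cost?.tiers ?? {}).map((id) => ({
      id,
      name: id,
      description: "",
    })),
  });
}
```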
The previous commit moved the 128k context_window fallback into
applyContextWindowFromRegistry, which made the default apply uniformly
to bundled-hit slugs too — overwriting the bundled-vendored window
whenever the registry happened not to advertise one.

Bundled-hit entries have a real upstream-vendored window from the codex
catalog; that value is a fine fallback when the registry has none, and
silently rewriting it to 128k regresses operators who had not yet
configured max_context_window_tokens. Keep applyContextWindowFromRegistry's
original 'pass through unchanged when resolver returns null' behaviour,
and instead apply the 128k default inside synthesizeCatalogEntry —
synth entries are the only ones that would otherwise emit no
context_window field at all. Move the CONSERVATIVE_DEFAULT_CONTEXT_WINDOW
constant alongside its sole consumer.
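The final division of labour, sketched under assumptions: "128k" is taken as 128,000 tokens, and `applyContextWindowFromRegistry` is re-implemented here with a hypothetical per-entry signature plus a resolver callback, not the real one in context-window.ts. `autoCompactTokenLimit` is a hypothetical mirror of the `(context_window * 9) / 10` expression cited in the earlier commit, assuming integer truncation.

```typescript
// Assumed value for "128k"; the real constant may differ.
const CONSERVATIVE_DEFAULT_CONTEXT_WINDOW = 128_000;

interface CatalogModel {
  slug: string;
  context_window?: number;
  max_context_window?: number;
  [key: string]: unknown;
}

// Synth path: always emits the pair, so codex never sees a missing window.
function synthContextWindow(registryLimit: number | undefined): {
  context_window: number;
  max_context_window: number;
} {
  const tokens = registryLimit ?? CONSERVATIVE_DEFAULT_CONTEXT_WINDOW;
  return { context_window: tokens, max_context_window: tokens };
}

// Both paths: overwrite from the registry when it has a value; otherwise pass
// the entry through unchanged, so a bundled-vendored window survives.
function applyContextWindowFromRegistry(
  entry: CatalogModel,
  resolve: (slug: string) => number | null,
): CatalogModel {
  const limit = resolve(entry.slug);
  if (limit === null) return entry;
  return { ...entry, context_window: limit, max_context_window: limit };
}

function autoCompactTokenLimit(contextWindow: number): number {
  return Math.floor((contextWindow * 9) / 10);
}
```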
The bundle-match path leans on `MODEL_PREFIX_REGEX` requiring a trailing
slash on every prefix — without that, the segment splitter would not be
guaranteed to land on the unprefixed leaf. Lock the invariant with a
deep-nested prefix case so a future prefix-regex relaxation surfaces
here.
Menci added 3 commits June 28, 2026 01:32
display_name is the only bundled-inherited field we keep on undefined —
service_tiers / context_window have registry-driven fallbacks (they tie
to billing / runtime gating), but the UI label is fine to inherit from
bundled. Lock the asymmetry so a future refactor that aligns the three
fields' fallback policies hits this test first.
The Workers Cache API wrapper around codex /models was a latency hedge:
codex aborts its catalog refresh after 5 s, the registry leg can cost
~4 s, and the cache let the slow path run at most once per (colo,
client_version, upstream filter) per 5-min window.

The downside is a 5-min staleness window: operator changes to an
upstream's catalog do not surface in the codex picker until the cache
expires, and there is no easy way to invalidate from the dashboard.
That tradeoff is wrong for an interactive operator surface — fresh data
matters more than the spare worst-case second.

Strip the cache wrapper, drop CACHE_TTL_SECONDS / cacheKeyFor /
parseCodexVersion import, drop the cache-control: max-age header on the
response (a stale codex picker is harder to debug than one extra hit
per CLI launch). Provider-side model lists keep their own SWR caching in
models-cache.ts — that layer is intentional and unchanged.

The existing cache test is repurposed to a regression guard: it stubs
caches.default with match/put spies and asserts both stay at zero, so a
future reintroduction of caching at this layer fails loudly.
After the cache wrapper was removed, the file header still carried a
paragraph framing the absence ("no per-colo cache", codex's 5s timeout)
and the test suite still pinned 'caches.default is untouched' as a
guard. Both only make sense if a reader expects there to be a cache —
i.e. they preserve the shape of a feature that no longer exists. Drop
them so the code reads as if the catalog endpoint had always been a
direct registry passthrough.