feat(codex): /codex/models serves every chat model Floway routes#112
Open
Menci wants to merge 28 commits into
… Modality, extend test)
…g against bundled catalog
…ex catalog wire is effort-only)
Adds a comment at the reasoning-mapping site in synthesize.ts explaining that Codex CLI's catalog wire only supports effort-tiered reasoning (supported_reasoning_levels + default_reasoning_level per openai_models.rs ModelInfo), so budget_tokens, adaptive, and mandatory are silently dropped. The omission is benign: Codex CLI sends reasoning.effort from the global default and the translation layer maps it to the appropriate upstream representation at request time.
Extends synthesize_test.ts with four new tests verifying the silent drop of budget_tokens-only, adaptive-only, and mandatory-only reasoning configs, and that effort wins when combined with adaptive.
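The effort-only projection described above could look roughly like the sketch below. The `ReasoningConfig` shape, the `projectReasoning` name, and the `{ effort, description }` wire entry shape are assumptions made for illustration; only the field names `supported_reasoning_levels`, `default_reasoning_level`, `budget_tokens`, `adaptive`, and `mandatory` come from the commit message.

```typescript
// Hypothetical registry-side reasoning config; the real shape is not shown in this PR page.
interface ReasoningConfig {
  effort?: { levels: string[]; default?: string };
  budget_tokens?: { min: number; max: number };
  adaptive?: boolean;
  mandatory?: boolean;
}

// Assumed wire shape for the two effort-related catalog fields.
interface ReasoningWire {
  supported_reasoning_levels?: { effort: string; description: string }[];
  default_reasoning_level?: string;
}

// Project only the effort tiers; every other sub-mode is dropped on purpose.
function projectReasoning(reasoning?: ReasoningConfig): ReasoningWire {
  const levels = reasoning?.effort?.levels ?? [];
  const [first] = levels;
  if (first === undefined) {
    // budget_tokens-only, adaptive-only, and mandatory-only configs end up here.
    return {};
  }
  return {
    supported_reasoning_levels: levels.map((effort) => ({ effort, description: "" })),
    default_reasoning_level: reasoning?.effort?.default ?? first,
  };
}

const budgetOnly = projectReasoning({ budget_tokens: { min: 1024, max: 8192 } });
const effortWithAdaptive = projectReasoning({
  adaptive: true,
  effort: { levels: ["low", "high"], default: "high" },
});
```

In this sketch `budgetOnly` carries no reasoning fields at all, while `effortWithAdaptive` keeps the effort tiers and ignores the `adaptive` flag.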
…op test-concern suffix)
…default
Synthesized Codex catalog entries used to ship with base_instructions: ''. That left non-bundled chat models (DeepSeek, Claude variants via Custom, etc.) without any system prompt, even though Codex CLI hands them tools like update_plan/apply_patch/shell that need contextual explanation.
Adopt openai/codex's gpt-5.5 base_instructions verbatim, with one edit: 'based on GPT-5' becomes 'running in the Codex CLI' so the prompt reads correctly for any provider. The rest of the prompt is provider-neutral (covers Codex CLI personality, AGENTS.md spec, planning, harmony channels, preamble messages, apply_patch grammar).
…ng.tiers
Previously the synthesized and bundled-reuse catalog paths both emitted
service_tiers: []. The Codex CLI gates its /fast toggle and
service_tier: "flex"|"priority"|"fast" on this field, so models that
have billable tiers in the registry (e.g. claude-opus-4-8 with fast,
gpt-5.4 with flex/priority) were silently dropping the toggle.
Both paths now compute Object.keys(model.cost?.tiers ?? {}).map(id =>
({ id, name: id, description: '' })) so the CLI surfaces the toggle
exactly when the operator has pricing data for a tier.
For bundled reuse, registry-derived tiers unconditionally replace the
bundled service_tiers — bundled chatgpt.com entries today emit [] but if
they ever advertised tiers we lack pricing for, we'd expose a knob we
can't bill.
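A minimal sketch of the tier projection quoted above. The `InternalModel` and `ServiceTier` interfaces are simplified assumptions and the price numbers are placeholders; the `Object.keys(model.cost?.tiers ?? {})` expression is taken from the commit message, and `deriveServiceTiers` is the helper name a later commit in this PR mentions.

```typescript
// Simplified, assumed shapes; only `cost.tiers` is taken from the commit message.
interface InternalModel {
  id: string;
  cost?: { tiers?: Record<string, Record<string, number>> };
}

interface ServiceTier {
  id: string;
  name: string;
  description: string;
}

// One catalog tier per priced registry tier: id mirrored into name, blank description.
function deriveServiceTiers(model: InternalModel): ServiceTier[] {
  return Object.keys(model.cost?.tiers ?? {}).map((id) => ({ id, name: id, description: "" }));
}

// Placeholder prices, for illustration only.
const withFast: InternalModel = {
  id: "claude-opus-4-8",
  cost: { tiers: { fast: { input: 1, output: 2 } } },
};
const withoutTiers: InternalModel = { id: "deepseek-v4-pro" };

const fastTiers = deriveServiceTiers(withFast);
const emptyTiers = deriveServiceTiers(withoutTiers);
```

Here `fastTiers` holds a single `fast` entry and `emptyTiers` is `[]`, so the toggle is surfaced only for the model that has tier pricing.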
3b7f931 to
39fe698
39fe698 to
0dd3fa4
…x entry
`codex-rs/protocol/src/openai_models.rs`'s `ModelInfo` struct requires `supports_reasoning_summaries: bool` and `apply_patch_tool_type: Option<ApplyPatchToolType>` to be present. Absence aborts deserialization of the whole `/models` body inside Codex CLI, which then silently falls back to its bundled catalog, wiping out every synthesized entry.
The bundled vendored snapshot already carries both keys; the synthesizer was building a strict subset. Emit `supports_reasoning_summaries: false` and `apply_patch_tool_type: null` so the synthesized entries actually reach the picker.
…els pipeline
Drop comments that restate the code or apologize for shape:
- `computeCatalog`'s "tests can call this directly" framing: the split stands on its own (request handler does I/O, computeCatalog does catalog shape); no need to justify it through the test lens.
- `// slug stays as alias literal` on a plain spread that obviously keeps the slug field.
- "— which is the right UX" editorial closer at the end of the reasoning-projection rationale; the preceding evidence already establishes that the omission is benign.
- "Bundled reuse: clone, override slug + display_name..." restated the next two lines; keep only the non-obvious "Codex picker keys on public id, not bundled segment" bit.
…nth payload
`codex-rs/protocol/src/openai_models.rs`'s `ModelInfo` struct has no `reasoning_summary_format` field; the real field is `default_reasoning_summary: ReasoningSummary` (already emitted). serde without `deny_unknown_fields` silently drops the key, so the synth emission was harmless but misleading: bundled.json carries it as upstream residue, and mirroring residue from a hand-rolled synthesizer makes the contract less, not more, legible.
… covers it)
The pipeline paragraph in the file header already explains slug override
("Bundled entries have their slug overridden to the registry public id
..."); the inline near-restatement adds no new constraint or evidence.
…emits tiers)
The replaced comment claimed 'bundled chatgpt.com today emits service_tiers: [] anyway', but the vendored bundled.json carries a non-empty `priority` tier on gpt-5.5 and gpt-5.4. The policy stays the same (registry-derived tiers replace bundled), but the justification now states the real reason directly: a tier without registry unit prices cannot be billed and so must not be surfaced.
…ntry
Bundled.json is a shipped data file; missing the codex-auto-review entry when the alias would otherwise be appended is a build-time data regression, not a runtime fallback case. Silently dropping the alias left the auto-review hook misconfigured in the catalog without surfacing the regression.
…logModel
The literal already satisfies CatalogModel's { slug: string; [key: string]: unknown }
shape, so the call site no longer needs an unchecked cast.
Both the synthesizer and the bundled-reuse path mirrored the same
id->{id, name, description: ''} projection from cost.tiers. Lifting it
into a single named helper makes the id->name mirror + blank-description
contract visible in one place.
Merge the second pass over internalModels (which built the slug→context-window map) into the main chat-filter loop. Only chat-kind models contribute to the map now, which matches the only callers of the resolver. Drop the explanatory comment above computeCatalog that restated the code shape. One O(n) loop instead of two; non-chat entries previously added to the map were never looked up by applyContextWindowFromRegistry (it iterates catalog.models, which only contains chat models plus the alias).
The PR-introduced flag reordering moved demote-developer-to-system into the slot the comment block was anchoring; the comment is about the billing attribution header, not the developer-role demotion. Move it back to sit directly above strip-billing-attribution.
synthesizeCatalogEntry stops writing context_window / max_context_window directly. applyContextWindowFromRegistry in context-window.ts is now the sole writer for the pair, with a CONSERVATIVE_DEFAULT_CONTEXT_WINDOW (128k) backing any slug whose registry entry omits max_context_window_tokens.
Previously: synthesize.ts wrote the registry value (if any) and applyContextWindowFromRegistry overwrote it with the same map lookup, so two writers tracked the same source of truth. Slugs whose registry omits the limit had no field at all, which crashes codex's `(context_window * 9) / 10` auto-compact trigger inside ModelInfo::auto_compact_token_limit.
After: synthesize emits the entry with both fields absent; downstream applyContextWindowFromRegistry sets them from the resolver, falling back to 128k when the resolver returns null. The default is deliberately low; an operator who needs more configures the registry.
This also changes bundled-hit behaviour for slugs whose registry has no limit: the bundled-vendored OpenAI value no longer leaks through. The project rule is already that the registry is the only source of truth for what the gateway can actually serve (see context-window.ts header comment); the new behaviour enforces it consistently.
The codex-auto-review alias entry was being appended verbatim from the
bundled catalog ({...aliasEntry}). The bundled service_tiers reflect
OpenAI 1p's tiering, which Floway may not be able to bill — the same
reason the bundled-hit path rewrites service_tiers from the target's
registry cost.tiers (a tier we can bill must have unit prices in the
registry).
Track the alias target's InternalModel through the chat-model loop and
feed it to deriveServiceTiers when appending the alias. Two new tests
pin (a) tiers projected from the target's cost.tiers and (b) [] when
the target has no tiers.
The previous commit moved the 128k context_window fallback into applyContextWindowFromRegistry, which made the default apply uniformly to bundled-hit slugs too, overwriting the bundled-vendored window whenever the registry happened not to advertise one.
Bundled-hit entries have a real upstream-vendored window from the codex catalog; that value is a fine fallback when the registry has none, and silently rewriting it to 128k regresses operators who had not yet configured max_context_window_tokens.
Keep applyContextWindowFromRegistry's original 'pass through unchanged when resolver returns null' behaviour, and instead apply the 128k default inside synthesizeCatalogEntry, since synth entries are the only ones that would otherwise emit no context_window field at all. Move the CONSERVATIVE_DEFAULT_CONTEXT_WINDOW constant alongside its sole consumer.
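The resulting split of responsibilities might be sketched as follows. The exact value behind "128k" (128_000 here), the `synthesizeWindowFields` helper, the resolver signature, and the 272_000 bundled window are assumptions for illustration; `applyContextWindowFromRegistry`, `CONSERVATIVE_DEFAULT_CONTEXT_WINDOW`, and the `CatalogModel` shape are named in the commit messages.

```typescript
// Assumed exact value; the commit message only says "128k".
const CONSERVATIVE_DEFAULT_CONTEXT_WINDOW = 128_000;

interface CatalogModel {
  slug: string;
  [key: string]: unknown;
}

// Hypothetical resolver: registry context window for a slug, or null if the registry has none.
type ContextWindowResolver = (slug: string) => number | null;

// Synth path (hypothetical helper): entries always carry a window, falling back to the default.
function synthesizeWindowFields(slug: string, registryLimit: number | undefined): CatalogModel {
  const limit = registryLimit ?? CONSERVATIVE_DEFAULT_CONTEXT_WINDOW;
  return { slug, context_window: limit, max_context_window: limit };
}

// Shared pass: override from the registry, pass through unchanged when the resolver returns null.
function applyContextWindowFromRegistry(
  entry: CatalogModel,
  resolve: ContextWindowResolver,
): CatalogModel {
  const limit = resolve(entry.slug);
  if (limit === null) return entry;
  return { ...entry, context_window: limit, max_context_window: limit };
}

// Placeholder bundled window, for illustration only.
const bundledHit: CatalogModel = { slug: "gpt-5.5", context_window: 272_000 };
const unchanged = applyContextWindowFromRegistry(bundledHit, () => null);
const overridden = applyContextWindowFromRegistry(bundledHit, () => 200_000);
const synthesized = synthesizeWindowFields("deepseek-v4-pro", undefined);
```

With these definitions the bundled-hit entry keeps its vendored window when the registry is silent, and only the synthesized entry falls back to the conservative default.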
The bundle-match path leans on `MODEL_PREFIX_REGEX` requiring a trailing slash on every prefix — without that, the segment splitter would not be guaranteed to land on the unprefixed leaf. Lock the invariant with a deep-nested prefix case so a future prefix-regex relaxation surfaces here.
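As an illustration of the segment match this test guards, here is a simplified sketch. It does not reproduce `MODEL_PREFIX_REGEX` (not shown in this PR page); it only splits the public id on `/` and `:` and returns the first segment that is a bundled slug. The `matchBundledSlug` name and the sample slug set are hypothetical.

```typescript
// Hypothetical helper: find the bundled slug a public model id maps to, if any.
function matchBundledSlug(publicId: string, bundledSlugs: ReadonlySet<string>): string | null {
  // Prefixes end in "/" and variants start with ":", so both act as separators.
  for (const segment of publicId.split(/[/:]/)) {
    if (bundledSlugs.has(segment)) return segment;
  }
  return null;
}

// Sample slug set, for illustration only.
const bundled = new Set(["gpt-5.5", "gpt-5.4"]);

const variantMatch = matchBundledSlug("openrouter/gpt-5.5:nitro", bundled);
const deepPrefixMatch = matchBundledSlug("a/b/c/gpt-5.4", bundled);
const noMatch = matchBundledSlug("deepseek-v4-pro", bundled);
```

A deep-nested prefix such as `a/b/c/gpt-5.4` still lands on the unprefixed leaf, which is the invariant the commit wants pinned; `deepseek-v4-pro` matches nothing and would take the synthesis path.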
display_name is the only field that keeps its bundled value when the registry value is undefined. service_tiers / context_window have registry-driven fallbacks (they tie to billing / runtime gating), but the UI label is fine to inherit from bundled. Lock the asymmetry so a future refactor that aligns the three fields' fallback policies hits this test first.
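The asymmetry can be sketched as below, under assumed shapes: `RegistryModel` and `reuseBundledEntry` are hypothetical names, and only the slug, display_name, and service_tiers handling is shown (the context window is left out).

```typescript
interface CatalogModel {
  slug: string;
  [key: string]: unknown;
}

// Hypothetical, simplified registry model.
interface RegistryModel {
  id: string;
  display_name?: string;
  cost?: { tiers?: Record<string, unknown> };
}

// Clone the bundled entry, then apply the registry-driven overrides.
function reuseBundledEntry(bundledEntry: CatalogModel, model: RegistryModel): CatalogModel {
  const entry: CatalogModel = {
    ...bundledEntry,
    // The picker keys on the public id, so the slug is always replaced.
    slug: model.id,
    // Tiers always come from the registry, even when that yields [].
    service_tiers: Object.keys(model.cost?.tiers ?? {}).map((id) => ({
      id,
      name: id,
      description: "",
    })),
  };
  // The label is only replaced when the registry actually defines one.
  if (model.display_name !== undefined) {
    entry["display_name"] = model.display_name;
  }
  return entry;
}

const bundledEntry: CatalogModel = {
  slug: "gpt-5.5",
  display_name: "GPT-5.5",
  service_tiers: [{ id: "priority", name: "priority", description: "" }],
};
const reused = reuseBundledEntry(bundledEntry, { id: "openrouter/gpt-5.5:nitro" });
```

In `reused` the bundled `priority` tier is gone because the registry model has no priced tiers, while the bundled `display_name` survives because the registry leaves it undefined.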
The Workers Cache API wrapper around codex /models was a latency hedge: codex aborts its catalog refresh after 5 s, the registry leg can cost ~4 s, and the cache let the slow path run at most once per (colo, client_version, upstream filter) per 5-min window.
The downside is a 5-min staleness window: operator changes to an upstream's catalog do not surface in the codex picker until the cache expires, and there is no easy way to invalidate from the dashboard. That tradeoff is wrong for an interactive operator surface, where fresh data matters more than the spare worst-case second.
Strip the cache wrapper, drop CACHE_TTL_SECONDS / cacheKeyFor / parseCodexVersion import, drop the cache-control: max-age header on the response (a stale codex picker is harder to debug than one extra hit per CLI launch).
Provider-side model lists keep their own SWR caching in models-cache.ts; that layer is intentional and unchanged.
The existing cache test is repurposed to a regression guard: it stubs caches.default with match/put spies and asserts both stay at zero, so a future reintroduction of caching at this layer fails loudly.
After the cache wrapper was removed, the file header still carried a
paragraph framing the absence ("no per-colo cache", codex's 5s timeout)
and the test suite still pinned 'caches.default is untouched' as a
guard. Both only make sense if a reader expects there to be a cache —
i.e. they preserve the shape of a feature that no longer exists. Drop
them so the code reads as if the catalog endpoint had always been a
direct registry passthrough.
Summary
Rewrites Floway's `/codex/models` endpoint (the catalog Codex CLI consumes for its `/model` picker) to surface every chat model the gateway can route, not just the slugs that overlap with the upstream bundled Codex catalog. Models whose public id segment-matches a bundled slug reuse the bundled entry verbatim with slug/display_name/context_window overridden; the rest get synthesized entries built from the registry's per-model metadata (the first-class `chat?:` field).
/codex/models rewrite
`computeCatalog` now iterates the registry. For each chat-kind model:
- The public id is split on `/` and `:`; first segment matching a bundled slug → reuse bundled entry, override `slug`, `display_name`, and `context_window`
- No match → `synthesizeCatalogEntry` using the registry's `chat?.modalities` / `chat?.reasoning` / `limits` / `cost.tiers`
- `codex-auto-review` alias appears only when the literal `gpt-5.4` is in the registry (matches existing data-plane resolver semantics)
Worker-side cache continues at 5 min per (client_version, upstream filter); a cold deploy plus an active session pays the slow path at most once per colo.
Synthesized entries
- `base_instructions` vendored from `openai/codex` `models.json` (gpt-5.5 entry) with one line edited for provider neutrality: "based on GPT-5" → "running in the Codex CLI". Apache-2.0 attribution + commit-pinned permalink.
- `service_tiers` derived from each model's `cost.tiers` keys (so e.g. `claude-opus-4-8`'s `tiers.fast` pricing surface translates into a `/fast` toggle in the picker).
- `supported_reasoning_levels` + `default_reasoning_level` projected from `chat.reasoning.effort` when present (Codex CLI's catalog wire is effort-only; budget_tokens / adaptive / mandatory sub-modes silently dropped at the wire boundary).
- `input_modalities` derived from `chat.modalities.input`; downstream `web_search_tool_type` and `supports_image_detail_original` derived in turn.
- … `shell_type`, `prefer_websockets`, `apply_patch_tool_type`, etc.).
Bundled-reuse details
Bundled entries pass through their original `priority`, `description`, `support_verbosity`, etc. Only the following fields are overridden from the registry:
- `slug` → public model id (so the CLI sends the right slug back)
- `display_name` → registry's display_name (operator-controlled label)
- `context_window` / `max_context_window` → registry's `limits.max_context_window_tokens`
- `service_tiers` → derived from registry `cost.tiers` (replaces bundled; a tier without registry unit prices cannot be billed, so it must not be surfaced)
Operator `chat?:` overrides on a registry model do NOT apply to matched-bundled slugs (deliberate: bundled is the upstream catalog's authoritative shape).
Depends on
This PR depends on per-model `chat?:` metadata (PR #115). Base is set to `floway/chat-metadata` until that merges; will rebase to `main` after.
Limitations (deliberate)
- `ollama/gpt-5.5` (hypothetical) inherits chatgpt.com `gpt-5.5` metadata. Operator override silently ignored for matched slugs.
Test plan
- `pnpm run test`: 3478 tests passing
- `pnpm run lint`: clean
- `pnpm run typecheck`: clean across gateway / platform-cloudflare / platform-node / web
- `/codex/models` synthesis (`deepseek-v4-pro`), bundled match (`openrouter/gpt-5.5:nitro`), service_tiers derivation
- `/model` picker against this deploy: every chat model visible; reasoning effort selector for `effort`-bearing models; `/fast` toggle for `claude-opus-4-{6,7,8}` (requires the translator from PR #114, feat(translate): bridge service_tier:fast ↔ speed:fast across OpenAI/Anthropic translators)