Support multimodal API routing and metered billing #16
📝 Walkthrough

Adds model-surface validation and request-model extraction/rewrites; expands API types and routing; introduces MeteredUsage and metered-unit billing; enhances Responses SSE parsing and usage extraction; updates OpenAI/Google provider parsing, enrichment, and resolution; and adds tests for streaming, normalization, extraction, and billing.
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
providers/openai_compatible/resolver.go (1)
16-23: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Support string-form `api_type` during resolver dispatch

At line 18, `api_type` is only read as `llmproxy.APIType`. When metadata carries `"api_type"` as a string, this falls through to line 21 and incorrectly routes to `/v1/chat/completions`.

Suggested fix

```diff
 func (r *Resolver) Resolve(meta llmproxy.BodyMetadata) (*url.URL, error) {
 	apiType := r.APIType
 	if apiType == "" {
-		if v, ok := meta.Custom["api_type"].(llmproxy.APIType); ok {
-			apiType = v
-		} else {
-			apiType = llmproxy.APITypeChatCompletions
-		}
+		switch v := meta.Custom["api_type"].(type) {
+		case llmproxy.APIType:
+			apiType = v
+		case string:
+			apiType = llmproxy.APIType(v)
+		default:
+			apiType = llmproxy.APITypeChatCompletions
+		}
 	}
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@providers/openai_compatible/resolver.go` around lines 16 - 23, The resolver currently only accepts meta.Custom["api_type"] as llmproxy.APIType and falls back to llmproxy.APITypeChatCompletions when the metadata contains a string; update the dispatch in resolver.go to also handle string-form API types by checking for meta.Custom["api_type"].(string) and converting or mapping that string to the appropriate llmproxy.APIType (e.g., produce llmproxy.APIType(r) or map "chat.completions"/"chat" etc. to llmproxy.APITypeChatCompletions) before assigning apiType so the variable apiType (initialized from r.APIType) correctly reflects string values from metadata instead of always falling back to llmproxy.APITypeChatCompletions.
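The suggested type switch can be exercised in isolation. In this sketch, `APIType` is a local stand-in for `llmproxy.APIType` (assumed to be string-backed); only the dispatch logic mirrors the diff above.

```go
package main

import "fmt"

// APIType stands in for llmproxy.APIType in this sketch (assumed string-backed).
type APIType string

const APITypeChatCompletions APIType = "chat.completions"

// resolveAPIType mirrors the suggested switch: accept the enum, accept a raw
// string, or fall back to chat completions when neither form is present.
func resolveAPIType(custom map[string]any, fallback APIType) APIType {
	switch v := custom["api_type"].(type) {
	case APIType:
		return v
	case string:
		return APIType(v)
	default:
		return fallback
	}
}

func main() {
	fmt.Println(resolveAPIType(map[string]any{"api_type": "responses"}, APITypeChatCompletions))
	fmt.Println(resolveAPIType(map[string]any{}, APITypeChatCompletions))
}
```

The `string` case is what closes the gap: metadata deserialized from JSON carries plain strings, which the original type assertion silently rejected.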
🧹 Nitpick comments (3)
interceptors/billing.go (1)
66-92: ⚡ Quick win

Code duplication with `billing_calculator.go`

The helper functions `mergeMeteredUsage`, `firstNonZeroInt`, and `firstNonZeroFloat` are duplicated in both `interceptors/billing.go` and `billing_calculator.go`. Consider extracting these to a shared location to avoid maintenance burden.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@interceptors/billing.go` around lines 66-92: The functions mergeMeteredUsage, firstNonZeroInt, and firstNonZeroFloat are duplicated; extract them into a single shared utility (e.g., a new package or common file) and have both interceptors/billing.go and billing_calculator.go import and call that shared implementation. Move the implementations into the new shared file, update the callers in both files to reference the shared symbols, and remove the duplicate declarations so only the shared version remains.

autorouter_test.go (1)
1824-1824: 💤 Low value

Use `httptest.NewRequestWithContext` for consistency

Static analysis flags these three test requests as using `httptest.NewRequest` instead of `httptest.NewRequestWithContext`. While this is less critical in test code, it's good practice to be consistent with the rest of the test file, which uses `NewRequestWithContext`.

♻️ Suggested fix

```diff
-req := httptest.NewRequest(http.MethodPost, "/v1/chat/completions", strings.NewReader(`{"model":"googleai/veo-3.1-generate-preview","messages":[{"role":"user","content":"hello"}]}`))
+req := httptest.NewRequestWithContext(context.Background(), http.MethodPost, "/v1/chat/completions", strings.NewReader(`{"model":"googleai/veo-3.1-generate-preview","messages":[{"role":"user","content":"hello"}]}`))
```

Apply similar changes to lines 1880 and 1929.

Also applies to: 1880-1880, 1929-1929

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@autorouter_test.go` at line 1824: Replace uses of httptest.NewRequest with httptest.NewRequestWithContext and pass a context (e.g., context.Background()) so the test requests are created consistently; update the req initialization and the two other similar occurrences, and add an import for context if missing.

autorouter.go (1)
1020-1046: ⚡ Quick win

Role mapping in Google AI normalization only handles "user" and "assistant"

The `normalizeGoogleAIRequest` function maps `"assistant"` to `"model"` and everything else to `"user"`. This may incorrectly convert `"system"` messages to the `"user"` role instead of handling them as system instructions or filtering them out.

Consider whether system messages in the `messages` array should be handled separately (potentially merged into `systemInstruction`) rather than converted to user messages.

♻️ Suggested improvement

```diff
 role, _ := message["role"].(string)
-if role == "assistant" {
+switch role {
+case "system":
+	// Skip system messages - they should use systemInstruction
+	continue
+case "assistant":
 	role = "model"
-} else {
+default:
 	role = "user"
 }
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@autorouter.go` around lines 1020-1046: normalizeGoogleAIRequest currently maps "assistant"->"model" and all other roles to "user", which will misclassify "system" messages; update the loop that processes raw["messages"] to explicitly detect role == "system" and either merge its content into raw["systemInstruction"] or skip system entries, keeping "assistant" -> "model" and defaulting others to "user"; preserve the variable names used in the diff (raw, messages, contents, systemInstruction) so the change is localized and easy to review.
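The suggested mapping can be isolated into a small helper. This is a sketch, not the function in `autorouter.go`; the name `mapRole` and the skip-and-merge-into-`systemInstruction` policy are illustrative.

```go
package main

import "fmt"

// mapRole sketches the suggested handling: system messages are skipped so they
// can be folded into systemInstruction, "assistant" becomes "model", and any
// other role falls back to "user".
func mapRole(role string) (mapped string, skip bool) {
	switch role {
	case "system":
		return "", true // handled via systemInstruction, not contents
	case "assistant":
		return "model", false
	default:
		return "user", false
	}
}

func main() {
	for _, r := range []string{"system", "assistant", "user", "tool"} {
		mapped, skip := mapRole(r)
		fmt.Printf("%s -> %q (skip=%v)\n", r, mapped, skip)
	}
}
```

Google's generateContent contents accept only "user" and "model" roles, which is why everything that is kept must land in one of those two buckets.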
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@providers/googleai/resolver.go`:
- Around line 33-37: The resolver currently only reads meta.Custom["api_type"]
as llmproxy.APIType (apiType) which misses cases where the value is a string;
update the logic that sets apiType (used in the switch that checks
llmproxy.APITypePredictLongRunning and llmproxy.APITypeStreamGenerateContent) to
accept both a llmproxy.APIType and a string representation (e.g., cast string ->
llmproxy.APIType or map the string to the enum) before the endpoint switch;
ensure meta.Stream is still respected and that r.BaseURL.JoinPath("v1beta",
"models", fmt.Sprintf("%s:predictLongRunning", model)) and the stream branch are
chosen when either the enum or the equivalent string is provided, with a
sensible default fallback if neither is present.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 9d5577ca-20ec-465a-bfe3-b768295000fb
📒 Files selected for processing (21)
- apitype.go
- autorouter.go
- autorouter_test.go
- billing.go
- billing_calculator.go
- billing_test.go
- interceptors/billing.go
- interceptors/coverage_test.go
- interceptors/retry.go
- metadata.go
- model_metadata.go
- pricing/modelsdev/adapter.go
- providers/googleai/resolver.go
- providers/openai_compatible/enricher.go
- providers/openai_compatible/extractor.go
- providers/openai_compatible/extractor_test.go
- providers/openai_compatible/parser.go
- providers/openai_compatible/parser_test.go
- providers/openai_compatible/resolver.go
- providers/openai_compatible/responses_parser.go
- providers/openai_compatible/responses_test.go
✅ Files skipped from review due to trivial changes (3)
- pricing/modelsdev/adapter.go
- providers/openai_compatible/parser_test.go
- providers/openai_compatible/enricher.go
📜 Review details
🧰 Additional context used
🪛 golangci-lint (2.12.1)
autorouter_test.go
[error] 1824-1824: net/http/httptest.NewRequest must not be called. use net/http/httptest.NewRequestWithContext
(noctx)
[error] 1880-1880: net/http/httptest.NewRequest must not be called. use net/http/httptest.NewRequestWithContext
(noctx)
[error] 1929-1929: net/http/httptest.NewRequest must not be called. use net/http/httptest.NewRequestWithContext
(noctx)
🔇 Additional comments (10)
interceptors/retry.go (1)
56-59: Good fix: final attempt response body is preserved for callers

Conditioning the drain/close on non-final attempts is correct and prevents returning a closed body on exhausted retries.
interceptors/coverage_test.go (1)
246-282: Coverage addition is on point for the retry-body regression

This test validates the exact contract change: after retries are exhausted, the final response body remains readable.
providers/openai_compatible/responses_test.go (1)
1765-1804: Good regression coverage for string-form `api_type` dispatch

This test closes an important routing gap and validates usage extraction remains intact in the string-dispatch path.
providers/openai_compatible/extractor.go (2)
34-41: LGTM on non-JSON response handling

The logic correctly short-circuits JSON parsing for non-JSON content types (like `audio/mpeg` for speech synthesis responses), returning the raw body unchanged. This allows binary responses to pass through without triggering parse errors.
134-160: LGTM on UsageInfo helper methods

The fallback logic in `PromptTokenCount()`, `CompletionTokenCount()`, and `TotalTokenCount()` correctly prefers standard OpenAI fields (`prompt_tokens`, `completion_tokens`, `total_tokens`) before falling back to alternate field names (`input_tokens`, `output_tokens`). The `InputAudioSeconds()` method correctly guards on `type == "duration"`.

billing.go (1)
62-83: LGTM on metered unit billing logic

The branching on `costInfo.Unit` correctly separates token-based billing (default path) from metered-unit billing (characters, audio). The helper functions `billingQuantitiesForUnit` and `meteredCostForUnit` properly handle the unit conversion:

- Characters: cost per million characters
- Audio: cost per minute (input seconds / 60)
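The two conversions can be checked with a few lines of arithmetic. This is a sketch: `meteredCost`, its unit strings, and its signature are illustrative, not the library's `meteredCostForUnit` API.

```go
package main

import "fmt"

// meteredCost sketches the unit conversions described above: character rates
// are quoted per million characters, audio rates per minute.
func meteredCost(unit string, quantity, rate float64) float64 {
	switch unit {
	case "characters":
		return quantity / 1_000_000 * rate // rate is $ per million characters
	case "audio_seconds":
		return quantity / 60 * rate // rate is $ per minute
	default:
		return 0
	}
}

func main() {
	fmt.Println(meteredCost("characters", 10, 15))       // 10 chars at $15/million
	fmt.Println(meteredCost("audio_seconds", 30, 0.006)) // 30 s at $0.006/minute
}
```

These match the figures asserted in the billing tests below: 10 characters at $15/million is $0.00015, and 30 seconds (0.5 minutes) at $0.006/minute is $0.003.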
providers/openai_compatible/extractor_test.go (1)
56-141: LGTM on new extractor tests

The tests comprehensively cover the new metered usage extraction:

- Image generation usage with `input_tokens`/`output_tokens` field names and `GeneratedImages` from `data` array length
- Duration-based audio usage with `InputAudioSeconds`
- Non-JSON response pass-through for binary content types
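The pass-through case those tests cover reduces to a content-type gate before parsing. This is a sketch; the function shape is illustrative, not the extractor's actual signature.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// parseBody sketches the short-circuit: non-JSON content types (binary audio,
// images) skip JSON parsing entirely and hand the raw bytes back unchanged.
func parseBody(contentType string, body []byte) (parsed map[string]any, raw []byte, err error) {
	if !strings.HasPrefix(contentType, "application/json") {
		return nil, body, nil // e.g. audio/mpeg from speech synthesis
	}
	err = json.Unmarshal(body, &parsed)
	return parsed, body, err
}

func main() {
	_, raw, err := parseBody("audio/mpeg", []byte{0xFF, 0xFB, 0x90})
	fmt.Println(len(raw), err) // binary bytes returned untouched

	parsed, _, _ := parseBody("application/json", []byte(`{"usage":{"input_tokens":3}}`))
	fmt.Println(parsed["usage"])
}
```

Without the gate, an MP3 body would hit `json.Unmarshal` and surface a spurious parse error to the caller.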
autorouter.go (1)
455-459: Verify gateway metadata is emitted for all SSE streams

The condition `if sseWriter != nil` ensures `gateway.metadata` is written whenever we're processing an SSE stream with billing enabled, not just when a `[DONE]` marker is detected. This aligns with the PR objective to emit metadata for streams without terminal markers.

billing_calculator.go (1)
45-46: LGTM on metered usage integration

The calculator correctly merges request and response metered usage before passing to `CalculateCostWithMeteredUsage`, enabling non-token billing for audio transcription, TTS, and image generation.

billing_test.go (1)
186-220: LGTM on metered billing unit tests

The tests correctly validate:

- Per-million-characters: 10 characters at $15/million = $0.00015
- Per-minute-audio: 30 seconds (0.5 minutes) at $0.006/minute = $0.003

The `InputQuantity` assertions confirm the derived quantities are correctly populated.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@interceptors/billing.go`:
- Around line 66-74: mergeMeteredUsage currently treats zero as "missing" by
using firstNonZeroInt/Float, which lets a response-reported zero be ignored in
favor of a non-zero request estimate; change the presence semantics instead:
update llmproxy.MeteredUsage to carry presence (e.g., make numeric fields
pointers or add explicit hasX booleans), then modify mergeMeteredUsage to prefer
response values when the response-side field is present (even if zero) and only
fall back to request values when the response-side is absent; update or remove
firstNonZeroInt/firstNonZeroFloat and all call sites (including the similar
merge block later) to use the new presence-aware logic so real zero usage from
responses does not get overwritten by request estimates.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: f6eb9508-cb9d-4378-87a7-2da70e20b4d7
📒 Files selected for processing (8)
- billing.go
- billing_calculator.go
- billing_test.go
- interceptors/billing.go
- metadata.go
- providers/googleai/parser.go
- providers/googleai/parser_test.go
- providers/googleai/resolver.go
🚧 Files skipped from review as they are similar to previous changes (3)
- metadata.go
- billing_calculator.go
- billing.go
📜 Review details
🔇 Additional comments (4)
providers/googleai/resolver.go (1)
33-57: Good `api_type` normalization and endpoint routing

This correctly handles both enum and string `api_type` values and cleanly routes long-running vs streaming vs default endpoints.

billing_test.go (1)
186-238: Strong unit coverage for new metered billing paths

These tests validate both quantity derivation and final cost for all newly introduced non-token billing units.
providers/googleai/parser.go (1)
48-51: Video metered-usage parsing integration looks solid

`OutputVideoSeconds` extraction and known-field handling for `instances`/`parameters` are consistent and correctly prevent `parameters` leakage into `meta.Custom`.

Also applies to: 96-101, 150-159, 180-181
providers/googleai/parser_test.go (1)
98-112: Nice regression coverage for video usage and string `api_type` routing

The new cases exercise both metered video quantity extraction and endpoint resolution for string-based API types.
Also applies to: 183-221
```go
func mergeMeteredUsage(requestUsage llmproxy.MeteredUsage, responseUsage llmproxy.MeteredUsage) llmproxy.MeteredUsage {
	return llmproxy.MeteredUsage{
		InputCharacters:    firstNonZeroInt(responseUsage.InputCharacters, requestUsage.InputCharacters),
		OutputCharacters:   firstNonZeroInt(responseUsage.OutputCharacters, requestUsage.OutputCharacters),
		InputAudioSeconds:  firstNonZeroFloat(responseUsage.InputAudioSeconds, requestUsage.InputAudioSeconds),
		OutputAudioSeconds: firstNonZeroFloat(responseUsage.OutputAudioSeconds, requestUsage.OutputAudioSeconds),
		OutputVideoSeconds: firstNonZeroFloat(responseUsage.OutputVideoSeconds, requestUsage.OutputVideoSeconds),
		GeneratedImages:    firstNonZeroInt(responseUsage.GeneratedImages, requestUsage.GeneratedImages),
	}
}
```
Response-reported zero usage cannot override request estimates
At Line 68–74, firstNonZero* treats 0 as “missing.” If the response contains an actual measured 0 (valid outcome) and the request had a non-zero estimate, merged usage keeps the request value and can overbill. This needs presence-aware merge semantics (e.g., explicit “set” flags/pointers) rather than value-based zero checks.
Also applies to: 77-93
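A presence-aware merge along the lines the comment suggests could use pointer fields so nil means "not reported". This is a sketch: the struct mirrors `llmproxy.MeteredUsage` in name only, and `mergeInt` is illustrative.

```go
package main

import "fmt"

// MeteredUsage sketches the presence-aware shape: pointer fields distinguish
// "not reported" (nil) from a measured zero.
type MeteredUsage struct {
	InputCharacters   *int64
	InputAudioSeconds *float64
}

// mergeInt prefers a present response value, even zero, over the request estimate.
func mergeInt(response, request *int64) *int64 {
	if response != nil {
		return response
	}
	return request
}

func main() {
	zero, estimate := int64(0), int64(500)

	merged := mergeInt(&zero, &estimate)
	fmt.Println(*merged) // measured zero wins over the request estimate

	merged = mergeInt(nil, &estimate)
	fmt.Println(*merged) // absent response falls back to the estimate
}
```

With the current value-based `firstNonZeroInt`, the first call above would return 500 and overbill; the pointer form makes the response's measured zero authoritative.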