fix(instrumentations): include cache tokens in gen_ai.usage.input_tokens #1027

Open
dvirski wants to merge 6 commits into main from dr/fix(instrumentation)-include-cache-tokens-in-gen_ai.usage.input_tokens

Conversation

@dvirski dvirski commented Jun 16, 2026

Per the OpenTelemetry GenAI semantic conventions, cache_read.input_tokens
and cache_creation.input_tokens should be a subset of input_tokens — i.e.,
input_tokens is the total billed input volume, with cache attributes
providing per-line-item detail. The JS instrumentations were diverging
from the spec and from the Python SDK, causing downstream aggregators to
under-count Anthropic input across SDK-JS traffic.

Changes:

  • instrumentation-anthropic: fold cache_read_input_tokens and
    cache_creation_input_tokens into gen_ai.usage.input_tokens and
    gen_ai.usage.total_tokens. Streaming inherits via shared _endSpan path.
  • instrumentation-bedrock: same fold-in for the Anthropic-on-Bedrock
    response-body handler.
  • instrumentation-langchain: read langchain-core's canonical usage_metadata
    (input_token_details.cache_read / cache_creation) from generation
    messages and emit them as separate attributes. langchain-core's contract
    documents input_tokens as already-summed, so no fold-in is needed there.
    Legacy llmOutput.usage / tokenUsage paths preserved as fallback.

Cache attributes continue to be emitted separately for line-item visibility;
only the value written to input_tokens changes.
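
For illustration, the Anthropic fold-in described above could look roughly like the sketch below. This is not the code from this PR: `foldCacheTokens` and the `AnthropicUsage` interface are hypothetical names, and the attribute keys are written as plain strings taken from the description rather than imported from `@opentelemetry/semantic-conventions`.

```typescript
// Assumed shape of an Anthropic usage block; the cache fields are optional
// because not every response reports them.
interface AnthropicUsage {
  input_tokens: number;
  output_tokens: number;
  cache_read_input_tokens?: number | null;
  cache_creation_input_tokens?: number | null;
}

// Hypothetical helper: folds cache tokens into the input and total counts
// while still reporting each cache line item separately.
function foldCacheTokens(usage: AnthropicUsage): Record<string, number> {
  const cacheRead = usage.cache_read_input_tokens ?? 0;
  const cacheCreation = usage.cache_creation_input_tokens ?? 0;
  const totalInputTokens = usage.input_tokens + cacheRead + cacheCreation;

  const attributes: Record<string, number> = {
    "gen_ai.usage.input_tokens": totalInputTokens,
    "gen_ai.usage.output_tokens": usage.output_tokens,
    "gen_ai.usage.total_tokens": totalInputTokens + usage.output_tokens,
  };

  // Cache attributes are only emitted when the provider reported a number.
  if (typeof usage.cache_read_input_tokens === "number") {
    attributes["gen_ai.usage.cache_read.input_tokens"] = cacheRead;
  }
  if (typeof usage.cache_creation_input_tokens === "number") {
    attributes["gen_ai.usage.cache_creation.input_tokens"] = cacheCreation;
  }
  return attributes;
}

// 10 uncached + 100 cache-read + 20 cache-creation input tokens, 5 output tokens.
const folded = foldCacheTokens({
  input_tokens: 10,
  output_tokens: 5,
  cache_read_input_tokens: 100,
  cache_creation_input_tokens: 20,
});

// A response without cache fields leaves input_tokens unchanged.
const plain = foldCacheTokens({ input_tokens: 10, output_tokens: 5 });
```

With these inputs, `folded` reports 130 input tokens and 135 total tokens, while `plain` reports 10 and 15 and carries no cache attributes.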

Summary by CodeRabbit

  • New Features
    • Added cached-token usage attributes to GenAI spans across Anthropic, Bedrock, OpenAI, Together, LangChain, and VertexAI.
    • Total and input token metrics now incorporate cache read/creation semantics where supported (including emitting cache-specific token attributes).
  • Bug Fixes
    • Improved LangChain token accounting by prioritizing usage metadata and preventing double-counting.
  • Tests
    • Added coverage for cache token “fold-in” behavior and new per-provider cache-read attribute assertions.
  • Chores
    • Updated semantic-conventions dependency versions in Together and VertexAI packages.

coderabbitai Bot commented Jun 16, 2026

📝 Walkthrough

Adds cache token fold-in to the Anthropic and Bedrock instrumentations (folding cache_read_input_tokens and cache_creation_input_tokens into gen_ai.usage.input_tokens and gen_ai.usage.total_tokens), reads cache token details from usage_metadata in the LangChain instrumentation (where input_tokens is already summed, so no fold-in is applied), and adds ATTR_GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS span attribute recording to the OpenAI Responses API, Together, and VertexAI instrumentations. Bumps @opentelemetry/semantic-conventions to ^1.40.0 in the Together and VertexAI packages.

Changes

Cache Token Fold-In Across Instrumentation Packages

| Layer | File(s) | Summary |
| --- | --- | --- |
| Anthropic cache token fold-in implementation and tests | packages/instrumentation-anthropic/src/instrumentation.ts, packages/instrumentation-anthropic/test/instrumentation.test.ts | _endSpan derives totalInputTokens as input_tokens + cache_read_input_tokens + cache_creation_input_tokens (defaulting to 0), updating gen_ai.usage.input_tokens and gen_ai.usage.total_tokens. Tests directly invoke _endSpan with synthetic usage data to verify fold-in across three scenarios. |
| Bedrock Anthropic cache token fold-in implementation and tests | packages/instrumentation-bedrock/src/instrumentation.ts, packages/instrumentation-bedrock/tests/anthropic.test.ts, packages/instrumentation-bedrock/tests/cache-token-fold-in.test.ts | _setResponseAttributes for Anthropic non-streaming responses computes folded input tokens and conditionally emits cache-read/creation attributes. Integration test mocks a Bedrock Anthropic invocation with cache fields. Unit test file directly exercises _setResponseAttributes for all three fold-in scenarios. |
| LangChain usage_metadata cache token extraction and tests | packages/instrumentation-langchain/src/callback_handler.ts, packages/instrumentation-langchain/test/cache-token-fold-in.test.ts | handleLLMEnd now prefers usage_metadata from generation messages via extractUsageMetadataFromGenerations, setting input/output/total tokens and cache read/creation attributes from input_token_details. The tokenUsage fallback is guarded to skip when usage_metadata was already used. Tests cover four scenarios: both cache fields, read-only, no details, and fallback. |
| OpenAI Responses API cache read token recording and test | packages/instrumentation-openai/src/instrumentation.ts, packages/instrumentation-openai/test/instrumentation.test.ts, packages/instrumentation-openai/test/recordings/... | ResponsesResult extended with usage.input_tokens_details.cached_tokens. _endResponsesSpan sets ATTR_GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS when cached_tokens is present. New test validates the attribute on the span with HAR recording. |
| Together and VertexAI cache read token recording, dependency bumps, and tests | packages/instrumentation-together/..., packages/instrumentation-vertexai/... | Together _endSpan reads cached_tokens from result.usage. VertexAI _endSpan reads cachedContentTokenCount from usageMetadata. Both set ATTR_GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS when present. Both packages bump @opentelemetry/semantic-conventions to ^1.40.0. Tests validate the new span attribute in each. |
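
As a rough sketch of the LangChain row above, the mapping from usage_metadata to span attributes might look like the following. It assumes the langchain-core `usage_metadata` shape (`input_tokens`, `output_tokens`, `total_tokens`, and an optional `input_token_details` with `cache_read` / `cache_creation`); the function name `usageMetadataToAttributes` is hypothetical and is not the PR's `extractUsageMetadataFromGenerations`. The attribute keys are plain strings used for illustration.

```typescript
// Assumed subset of langchain-core's usage_metadata.
interface UsageMetadataLike {
  input_tokens: number;
  output_tokens: number;
  total_tokens: number;
  input_token_details?: {
    cache_read?: number;
    cache_creation?: number;
  };
}

// Hypothetical helper: input_tokens is treated as already including cached
// tokens, so it is copied through unchanged and nothing is added to it.
function usageMetadataToAttributes(
  meta: UsageMetadataLike,
): Record<string, number> {
  const attributes: Record<string, number> = {
    "gen_ai.usage.input_tokens": meta.input_tokens,
    "gen_ai.usage.output_tokens": meta.output_tokens,
    "gen_ai.usage.total_tokens": meta.total_tokens,
  };

  const details = meta.input_token_details;
  if (typeof details?.cache_read === "number") {
    attributes["gen_ai.usage.cache_read.input_tokens"] = details.cache_read;
  }
  if (typeof details?.cache_creation === "number") {
    attributes["gen_ai.usage.cache_creation.input_tokens"] =
      details.cache_creation;
  }
  return attributes;
}

// 130 input tokens, of which 100 were read from cache.
const langchainAttrs = usageMetadataToAttributes({
  input_tokens: 130,
  output_tokens: 5,
  total_tokens: 135,
  input_token_details: { cache_read: 100 },
});
```

Here `gen_ai.usage.input_tokens` stays at 130 rather than becoming 230, which is the double-counting the summary says the change avoids.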

Sequence Diagram(s)

sequenceDiagram
  participant AIResponse as AI Provider Response
  participant Instrumentation as Instrumentation Layer
  participant Span as OpenTelemetry Span
  
  AIResponse->>Instrumentation: response with input_tokens, cache_read_input_tokens, cache_creation_input_tokens
  Instrumentation->>Instrumentation: Compute totalInputTokens = input_tokens + cache_read_input_tokens + cache_creation_input_tokens
  Instrumentation->>Span: Set gen_ai.usage.input_tokens = totalInputTokens
  Instrumentation->>Span: Set gen_ai.usage.total_tokens = totalInputTokens + output_tokens
  Instrumentation->>Span: Set gen_ai.usage.output_tokens = output_tokens
  alt cache_read_input_tokens present
    Instrumentation->>Span: Set gen_ai.usage.cache_read.input_tokens = cache_read_input_tokens
  end
  alt cache_creation_input_tokens present
    Instrumentation->>Span: Set gen_ai.usage.cache_creation.input_tokens = cache_creation_input_tokens
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • traceloop/openllmetry-js#839: Both PRs update GenAI cached-token usage reporting—main PR adjusts per-vendor span token accounting to fold cache_read_input_tokens/cache_creation_input_tokens into gen_ai.usage.input_tokens/total_tokens, while retrieved PR #839 adds/normalizes the same cached-token attributes via shared semantic conventions and traceloop-sdk providerMetadata transformation.
  • traceloop/openllmetry-js#1010: Both PRs modify packages/instrumentation-openai/src/instrumentation.ts's Responses span finalization (_endResponsesSpan) to set token-usage attributes on the span, so the changes overlap at the same code path.

Suggested reviewers

  • doronkopit5

Poem

🐇 Hop hop, the cache tokens flow,
Folded in so the totals grow!
Six packages patched with care so fine,
Each span now tracks what's cached in line.
No double-counts, just math precise —
The rabbit checked the sums twice! 🌟

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely describes the main change: including cache tokens in gen_ai.usage.input_tokens across the instrumentation packages, which aligns with the PR's core objective. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint install failed: dependency version conflict. Check your lock file or package.json.



@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/instrumentation-together/src/instrumentation.ts`:
- Around line 525-533: The cachedTokens variable is cast to number without type
validation, which could cause issues if the Together API returns a non-numeric
value. Add a type guard to verify that cachedTokens is actually a number before
calling span.setAttribute with ATTR_GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS. Check
if typeof cachedTokens === 'number' in addition to the existing truthiness
check, and only set the attribute if the value passes both validations.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1bc0adcd-18ed-4dc1-bbb4-6a1a91594454

📥 Commits

Reviewing files that changed from the base of the PR and between 28c4a7a and a6da699.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (15)
  • packages/instrumentation-anthropic/src/instrumentation.ts
  • packages/instrumentation-anthropic/test/instrumentation.test.ts
  • packages/instrumentation-bedrock/src/instrumentation.ts
  • packages/instrumentation-bedrock/tests/anthropic.test.ts
  • packages/instrumentation-bedrock/tests/cache-token-fold-in.test.ts
  • packages/instrumentation-langchain/src/callback_handler.ts
  • packages/instrumentation-langchain/test/cache-token-fold-in.test.ts
  • packages/instrumentation-openai/src/instrumentation.ts
  • packages/instrumentation-openai/test/instrumentation.test.ts
  • packages/instrumentation-together/package.json
  • packages/instrumentation-together/src/instrumentation.ts
  • packages/instrumentation-together/test/instrumentation.test.ts
  • packages/instrumentation-vertexai/package.json
  • packages/instrumentation-vertexai/src/vertexai-instrumentation.ts
  • packages/instrumentation-vertexai/tests/gemini.test.ts

Comment on lines +525 to +533
const cachedTokens = (
  result.usage as unknown as Record<string, unknown>
).cached_tokens;
if (cachedTokens) {
  span.setAttribute(
    ATTR_GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS,
    cachedTokens as number,
  );
}

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate type before casting to number.

The code casts cachedTokens to number without verifying it's actually a number. If the Together API returns a non-numeric value, this could cause downstream issues.

🛡️ Proposed fix to add type guard
 const cachedTokens = (
   result.usage as unknown as Record<string, unknown>
 ).cached_tokens;
-if (cachedTokens) {
+if (typeof cachedTokens === "number") {
   span.setAttribute(
     ATTR_GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS,
-    cachedTokens as number,
+    cachedTokens,
   );
 }
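
To make the difference between the two guards concrete, here is a small standalone sketch. It is illustrative only: `passesTruthyGuard` and `passesNumberGuard` are hypothetical names, and the sample values are invented rather than taken from real Together responses.

```typescript
// Mirrors `if (cachedTokens)`: any truthy value passes, 0 does not.
function passesTruthyGuard(value: unknown): boolean {
  return Boolean(value);
}

// Mirrors `if (typeof cachedTokens === "number")`: only numbers pass,
// including 0. The type predicate removes the need for an `as number` cast.
function passesNumberGuard(value: unknown): value is number {
  return typeof value === "number";
}

// Invented sample values standing in for whatever `cached_tokens` might hold.
const samples: unknown[] = [42, 0, "42", undefined, null];

const truthyResults = samples.map((v) => passesTruthyGuard(v));
const numberResults = samples.map((v) => passesNumberGuard(v));
```

The truthiness check lets the string "42" through and drops a legitimate 0, while the `typeof` check does the opposite on both. Note that `typeof NaN` is also "number", so `Number.isFinite` would be a stricter option if that case matters.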

@max-deygin-traceloop max-deygin-traceloop left a comment

Looks alright. Please check whether CodeRabbit's comment makes sense, and fix the lint errors.

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/instrumentation-langchain/src/callback_handler.ts (1)

356-373: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Guard tokenUsage fallback against llmOutput.usage to prevent attribute overwrite.

The !usageMetadata guard at line 356 correctly prevents tokenUsage from running when usage_metadata exists. However, it doesn't check whether llmOutput.usage (lines 334-352) was already used. When usageMetadata is absent and both llmOutput.usage and llmOutput.tokenUsage exist, both blocks will execute and lines 359, 362, and 367 will overwrite the attributes set by lines 337, 340, and 347-350.

In practice, llmOutput.usage and llmOutput.tokenUsage likely represent mutually exclusive provider formats and may never coexist, but the code should be defensive.

Add llmOutput.usage check to the tokenUsage guard
     // Also check for tokenUsage format (for compatibility).
     // Skip when usage_metadata already populated the values.
-    if (!usageMetadata && output.llmOutput?.tokenUsage) {
+    if (!usageMetadata && !output.llmOutput?.usage && output.llmOutput?.tokenUsage) {
       const usage = output.llmOutput.tokenUsage;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/instrumentation-langchain/src/callback_handler.ts` around lines 356
- 373, The guard condition for the tokenUsage fallback block checking
`!usageMetadata && output.llmOutput?.tokenUsage` does not account for whether
llmOutput.usage was already processed. If both llmOutput.usage and
llmOutput.tokenUsage exist, the tokenUsage block will execute and overwrite the
attributes (ATTR_GEN_AI_USAGE_INPUT_TOKENS, ATTR_GEN_AI_USAGE_OUTPUT_TOKENS, and
GEN_AI_USAGE_TOTAL_TOKENS) that were already set by the prior llmOutput.usage
block. Add an additional guard check for `!output.llmOutput.usage` to the if
condition so the tokenUsage fallback only runs when neither usageMetadata nor
llmOutput.usage have been processed.
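
One way to picture the intended precedence is a single selection function with early returns, so that at most one source ever writes the token attributes. This is a sketch under assumptions, not the handler's actual code: `pickTokenCounts` is a hypothetical name, and the field names inside `usage` and `tokenUsage` are assumed shapes for illustration.

```typescript
interface TokenCounts {
  source: "usage_metadata" | "llmOutput.usage" | "llmOutput.tokenUsage";
  input: number;
  output: number;
  total: number;
}

// Assumed shapes for the three possible sources.
interface UsageMetadataLike {
  input_tokens: number;
  output_tokens: number;
  total_tokens: number;
}

interface LlmOutputLike {
  usage?: { input_tokens: number; output_tokens: number };
  tokenUsage?: {
    promptTokens: number;
    completionTokens: number;
    totalTokens: number;
  };
}

// Hypothetical helper: returns counts from the first available source only.
function pickTokenCounts(
  usageMetadata: UsageMetadataLike | undefined,
  llmOutput: LlmOutputLike | undefined,
): TokenCounts | undefined {
  if (usageMetadata) {
    return {
      source: "usage_metadata",
      input: usageMetadata.input_tokens,
      output: usageMetadata.output_tokens,
      total: usageMetadata.total_tokens,
    };
  }
  if (llmOutput?.usage) {
    const { input_tokens, output_tokens } = llmOutput.usage;
    return {
      source: "llmOutput.usage",
      input: input_tokens,
      output: output_tokens,
      total: input_tokens + output_tokens,
    };
  }
  if (llmOutput?.tokenUsage) {
    const { promptTokens, completionTokens, totalTokens } = llmOutput.tokenUsage;
    return {
      source: "llmOutput.tokenUsage",
      input: promptTokens,
      output: completionTokens,
      total: totalTokens,
    };
  }
  return undefined;
}

// Both legacy formats present: llmOutput.usage wins, tokenUsage is ignored.
const picked = pickTokenCounts(undefined, {
  usage: { input_tokens: 12, output_tokens: 3 },
  tokenUsage: { promptTokens: 99, completionTokens: 1, totalTokens: 100 },
});
```

The suggested change reaches the same outcome by adding `!output.llmOutput?.usage` to the existing `if` condition; the early-return form is just an alternative way to express that ordering.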
🧹 Nitpick comments (1)
packages/instrumentation-openai/src/instrumentation.ts (1)

972-978: ⚡ Quick win

Use explicit type guard for consistency with surrounding code.

Lines 953-958 use typeof inputTokens === "number" and typeof outputTokens === "number" to guard attribute setting, ensuring attributes are set even when values are 0. Line 973 uses if (cachedTokens) which is falsy for 0, meaning the attribute won't be set when zero tokens are cached.

While omitting the attribute when there are zero cached tokens may be semantically acceptable ("cache not used"), it's inconsistent with how the function handles other token counts and could cause downstream aggregators to treat "absent" differently from "zero."

Align with the existing type-guard pattern
-        const cachedTokens = result.usage.input_tokens_details?.cached_tokens;
-        if (cachedTokens) {
+        const cachedTokens = result.usage.input_tokens_details?.cached_tokens;
+        if (typeof cachedTokens === "number") {
           span.setAttribute(
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/instrumentation-openai/src/instrumentation.ts` around lines 972 -
978, The code at line 973 uses a falsy check `if (cachedTokens)` to guard the
ATTR_GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS attribute setting, which is
inconsistent with the pattern used earlier in the function for inputTokens and
outputTokens (lines 953-958) that use explicit type guards like `typeof
inputTokens === "number"`. Replace the `if (cachedTokens)` condition with
`typeof cachedTokens === "number"` to match the existing type-guard pattern and
ensure the attribute is properly set even when the cached tokens value is 0.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 574594ea-d228-45dc-b03c-7e859e58e68a

📥 Commits

Reviewing files that changed from the base of the PR and between a6da699 and 29c0f99.

📒 Files selected for processing (8)
  • packages/instrumentation-bedrock/tests/anthropic.test.ts
  • packages/instrumentation-bedrock/tests/cache-token-fold-in.test.ts
  • packages/instrumentation-langchain/src/callback_handler.ts
  • packages/instrumentation-langchain/test/cache-token-fold-in.test.ts
  • packages/instrumentation-openai/src/instrumentation.ts
  • packages/instrumentation-openai/test/recordings/Test-OpenAI-instrumentation_1770406427/should-set-cache-read-input-tokens-in-span-for-responses-with-cached-tokens_3930499540/recording.har
  • packages/instrumentation-together/test/instrumentation.test.ts
  • packages/instrumentation-vertexai/tests/gemini.test.ts
✅ Files skipped from review due to trivial changes (1)
  • packages/instrumentation-openai/test/recordings/Test-OpenAI-instrumentation_1770406427/should-set-cache-read-input-tokens-in-span-for-responses-with-cached-tokens_3930499540/recording.har
🚧 Files skipped from review as they are similar to previous changes (5)
  • packages/instrumentation-together/test/instrumentation.test.ts
  • packages/instrumentation-bedrock/tests/anthropic.test.ts
  • packages/instrumentation-bedrock/tests/cache-token-fold-in.test.ts
  • packages/instrumentation-langchain/test/cache-token-fold-in.test.ts
  • packages/instrumentation-vertexai/tests/gemini.test.ts
