fix(instrumentations): include cache tokens in gen_ai.usage.input_tokens by dvirski · Pull Request #1027 · traceloop/openllmetry-js

dvirski · 2026-06-16T14:05:22Z

Per the OpenTelemetry GenAI semantic conventions, cache_read.input_tokens
and cache_creation.input_tokens should be a subset of input_tokens — i.e.,
input_tokens is the total billed input volume, with cache attributes
providing per-line-item detail. The JS instrumentations were diverging
from the spec and from the Python SDK, causing downstream aggregators to
under-count Anthropic input across SDK-JS traffic.

Changes:

- instrumentation-anthropic: fold cache_read_input_tokens and
  cache_creation_input_tokens into gen_ai.usage.input_tokens and
  gen_ai.usage.total_tokens. Streaming inherits the change via the shared
  _endSpan path.
- instrumentation-bedrock: same fold-in for the Anthropic-on-Bedrock
  response-body handler.
- instrumentation-langchain: read langchain-core's canonical usage_metadata
  (input_token_details.cache_read / cache_creation) from generation
  messages and emit them as separate attributes. langchain-core's contract
  documents input_tokens as already-summed, so no fold-in is needed there.
  The legacy llmOutput.usage / tokenUsage paths are preserved as a fallback.

Cache attributes continue to be emitted separately for line-item visibility;
only the value written to input_tokens changes.
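
The fold-in described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `AnthropicUsage` type and the `foldCacheTokens` helper are hypothetical names, and the two cache attribute keys are an assumption based on the attribute names mentioned in the description.

```typescript
// Hypothetical shape of the usage block on an Anthropic-style response.
interface AnthropicUsage {
  input_tokens?: number;
  output_tokens?: number;
  cache_read_input_tokens?: number;
  cache_creation_input_tokens?: number;
}

// Illustrative helper (not the PR's code): returns the span attributes that
// result from folding cache tokens into the input total.
function foldCacheTokens(usage: AnthropicUsage): Record<string, number> {
  const cacheRead = usage.cache_read_input_tokens ?? 0;
  const cacheCreation = usage.cache_creation_input_tokens ?? 0;
  const output = usage.output_tokens ?? 0;
  // Missing fields default to 0 so the sum is always a number.
  const totalInput = (usage.input_tokens ?? 0) + cacheRead + cacheCreation;

  const attrs: Record<string, number> = {
    "gen_ai.usage.input_tokens": totalInput,
    "gen_ai.usage.output_tokens": output,
    "gen_ai.usage.total_tokens": totalInput + output,
  };
  // Cache attributes stay separate for line-item visibility.
  // The key names below are assumed for this sketch.
  if (usage.cache_read_input_tokens !== undefined) {
    attrs["gen_ai.usage.cache_read.input_tokens"] = cacheRead;
  }
  if (usage.cache_creation_input_tokens !== undefined) {
    attrs["gen_ai.usage.cache_creation.input_tokens"] = cacheCreation;
  }
  return attrs;
}
```

For a response reporting 10 uncached input tokens, 100 cache-read tokens, 20 cache-creation tokens, and 5 output tokens, this sketch writes 130 to the input attribute and 135 to the total, while the 100 and 20 remain visible on their own attributes.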

Summary by CodeRabbit

New Features
- Added cached-token usage attributes to GenAI spans across Anthropic, Bedrock, OpenAI, Together, LangChain, and VertexAI.
- Total and input token metrics now incorporate cache read/creation semantics where supported (including emitting cache-specific token attributes).
Bug Fixes
- Improved LangChain token accounting by prioritizing usage metadata and preventing double-counting.
Tests
- Added coverage for cache token “fold-in” behavior and new per-provider cache-read attribute assertions.
Chores
- Updated semantic-conventions dependency versions in Together and VertexAI packages.

coderabbitai · 2026-06-16T14:05:51Z

📝 Walkthrough

Adds cache token fold-in semantics to the Anthropic and Bedrock instrumentations (folding cache_read_input_tokens and cache_creation_input_tokens into gen_ai.usage.input_tokens and gen_ai.usage.total_tokens), reads LangChain's already-summed usage_metadata and emits its cache read/creation details as separate attributes, and adds ATTR_GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS span attribute recording to the OpenAI Responses API, Together, and VertexAI instrumentations. Bumps @opentelemetry/semantic-conventions to ^1.40.0 in the Together and VertexAI packages.

Changes

Cache Token Fold-In Across Instrumentation Packages

| Layer / File(s) | Summary |
| --- | --- |
| Anthropic cache token fold-in implementation and tests `packages/instrumentation-anthropic/src/instrumentation.ts`, `packages/instrumentation-anthropic/test/instrumentation.test.ts` | `_endSpan` derives `totalInputTokens` as `input_tokens + cache_read_input_tokens + cache_creation_input_tokens` (defaulting to 0), updating `gen_ai.usage.input_tokens` and `gen_ai.usage.total_tokens`. Tests directly invoke `_endSpan` with synthetic usage data to verify fold-in across three scenarios. |
| Bedrock Anthropic cache token fold-in implementation and tests `packages/instrumentation-bedrock/src/instrumentation.ts`, `packages/instrumentation-bedrock/tests/anthropic.test.ts`, `packages/instrumentation-bedrock/tests/cache-token-fold-in.test.ts` | `_setResponseAttributes` for Anthropic non-streaming responses computes folded input tokens and conditionally emits cache-read/creation attributes. Integration test mocks a Bedrock Anthropic invocation with cache fields. Unit test file directly exercises `_setResponseAttributes` for all three fold-in scenarios. |
| LangChain usage_metadata cache token extraction and tests `packages/instrumentation-langchain/src/callback_handler.ts`, `packages/instrumentation-langchain/test/cache-token-fold-in.test.ts` | `handleLLMEnd` now prefers `usage_metadata` from generation messages via `extractUsageMetadataFromGenerations`, setting input/output/total tokens and cache read/creation attributes from `input_token_details`. The `tokenUsage` fallback is guarded to skip when `usage_metadata` was already used. Tests cover four scenarios: both cache fields, read-only, no details, and fallback. |
| OpenAI Responses API cache read token recording and test `packages/instrumentation-openai/src/instrumentation.ts`, `packages/instrumentation-openai/test/instrumentation.test.ts`, `packages/instrumentation-openai/test/recordings/...` | `ResponsesResult` extended with `usage.input_tokens_details.cached_tokens`. `_endResponsesSpan` sets `ATTR_GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS` when `cached_tokens` is present. New test validates the attribute on the span with HAR recording. |
| Together and VertexAI cache read token recording, dependency bumps, and tests `packages/instrumentation-together/...`, `packages/instrumentation-vertexai/...` | Together `_endSpan` reads `cached_tokens` from `result.usage`. VertexAI `_endSpan` reads `cachedContentTokenCount` from `usageMetadata`. Both set `ATTR_GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS` when present. Both packages bump `@opentelemetry/semantic-conventions` to `^1.40.0`. Tests validate the new span attribute in each. |
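
The summary above names a different cached-token field for each provider. As a rough sketch of how those fields map onto one value, the helper below reads them from a plain usage object. The `Provider` labels and the `readCachedTokens` function are hypothetical and exist only for illustration; the field names (`input_tokens_details.cached_tokens`, `cached_tokens`, `cachedContentTokenCount`) are the ones listed in the table.

```typescript
// Hypothetical provider labels for this sketch only.
type Provider = "openai-responses" | "together" | "vertexai";

// Returns the cached-token count when the provider reported it as a number,
// otherwise undefined (meaning: do not set the span attribute).
function readCachedTokens(provider: Provider, usage: unknown): number | undefined {
  if (typeof usage !== "object" || usage === null) return undefined;
  const u = usage as Record<string, unknown>;
  let value: unknown = undefined;

  switch (provider) {
    case "openai-responses": {
      // Nested under usage.input_tokens_details.cached_tokens.
      const details = u.input_tokens_details;
      if (typeof details === "object" && details !== null) {
        value = (details as Record<string, unknown>).cached_tokens;
      }
      break;
    }
    case "together":
      // Flat field on result.usage.
      value = u.cached_tokens;
      break;
    case "vertexai":
      // Field on usageMetadata.
      value = u.cachedContentTokenCount;
      break;
  }
  return typeof value === "number" ? value : undefined;
}
```

Returning `undefined` rather than `0` for a missing field keeps "not reported" distinguishable from "reported as zero", which is the distinction the review comments further down raise.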

Sequence Diagram(s)

sequenceDiagram
  participant AIResponse as AI Provider Response
  participant Instrumentation as Instrumentation Layer
  participant Span as OpenTelemetry Span
  
  AIResponse->>Instrumentation: response with input_tokens, cache_read_input_tokens, cache_creation_input_tokens
  Instrumentation->>Instrumentation: Compute totalInputTokens = input_tokens + cache_read_input_tokens + cache_creation_input_tokens
  Instrumentation->>Span: Set gen_ai.usage.input_tokens = totalInputTokens
  Instrumentation->>Span: Set gen_ai.usage.total_tokens = totalInputTokens + output_tokens
  Instrumentation->>Span: Set gen_ai.usage.output_tokens = output_tokens
  alt cache_read_input_tokens present
    Instrumentation->>Span: Set gen_ai.usage.cache.read.input_tokens = cache_read_input_tokens
  end
  alt cache_creation_input_tokens present
    Instrumentation->>Span: Set gen_ai.usage.cache.creation.input_tokens = cache_creation_input_tokens
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

traceloop/openllmetry-js#839: Both PRs update GenAI cached-token usage reporting—main PR adjusts per-vendor span token accounting to fold cache_read_input_tokens/cache_creation_input_tokens into gen_ai.usage.input_tokens/total_tokens, while retrieved PR #839 adds/normalizes the same cached-token attributes via shared semantic conventions and traceloop-sdk providerMetadata transformation.
traceloop/openllmetry-js#1010: Both PRs modify packages/instrumentation-openai/src/instrumentation.ts's Responses span finalization (_endResponsesSpan) to set token-usage attributes on the span, so the changes overlap at the same code path.

Suggested reviewers

doronkopit5

Poem

🐇 Hop hop, the cache tokens flow,
Folded in so the totals grow!
Six packages patched with care so fine,
Each span now tracks what's cached in line.
No double-counts, just math precise —
The rabbit checked the sums twice! 🌟

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely describes the main change: including cache tokens in gen_ai.usage.input_tokens across the instrumentation packages, which aligns with the PR's core objective. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint install failed: dependency version conflict. Check your lock file or package.json.


coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/instrumentation-together/src/instrumentation.ts`:
- Around line 525-533: The cachedTokens variable is cast to number without type
validation, which could cause issues if the Together API returns a non-numeric
value. Add a type guard to verify that cachedTokens is actually a number before
calling span.setAttribute with ATTR_GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS. Check
if typeof cachedTokens === 'number' in addition to the existing truthiness
check, and only set the attribute if the value passes both validations.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1bc0adcd-18ed-4dc1-bbb4-6a1a91594454

📥 Commits

Reviewing files that changed from the base of the PR and between 28c4a7a and a6da699.

⛔ Files ignored due to path filters (1)

pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml

📒 Files selected for processing (15)

packages/instrumentation-anthropic/src/instrumentation.ts
packages/instrumentation-anthropic/test/instrumentation.test.ts
packages/instrumentation-bedrock/src/instrumentation.ts
packages/instrumentation-bedrock/tests/anthropic.test.ts
packages/instrumentation-bedrock/tests/cache-token-fold-in.test.ts
packages/instrumentation-langchain/src/callback_handler.ts
packages/instrumentation-langchain/test/cache-token-fold-in.test.ts
packages/instrumentation-openai/src/instrumentation.ts
packages/instrumentation-openai/test/instrumentation.test.ts
packages/instrumentation-together/package.json
packages/instrumentation-together/src/instrumentation.ts
packages/instrumentation-together/test/instrumentation.test.ts
packages/instrumentation-vertexai/package.json
packages/instrumentation-vertexai/src/vertexai-instrumentation.ts
packages/instrumentation-vertexai/tests/gemini.test.ts

coderabbitai · 2026-06-16T14:14:46Z

+        const cachedTokens = (
+          result.usage as unknown as Record<string, unknown>
+        ).cached_tokens;
+        if (cachedTokens) {
+          span.setAttribute(
+            ATTR_GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS,
+            cachedTokens as number,
+          );
+        }


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate type before casting to number.

The code casts cachedTokens to number without verifying it's actually a number. If the Together API returns a non-numeric value, this could cause downstream issues.

🛡️ Proposed fix to add type guard

 const cachedTokens = (
   result.usage as unknown as Record<string, unknown>
 ).cached_tokens;
-if (cachedTokens) {
+if (typeof cachedTokens === "number") {
   span.setAttribute(
     ATTR_GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS,
-    cachedTokens as number,
+    cachedTokens,
   );
 }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/instrumentation-together/src/instrumentation.ts` around lines 525 - 533, The cachedTokens variable is cast to number without type validation, which could cause issues if the Together API returns a non-numeric value. Add a type guard to verify that cachedTokens is actually a number before calling span.setAttribute with ATTR_GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS. Check if typeof cachedTokens === 'number' in addition to the existing truthiness check, and only set the attribute if the value passes both validations.
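
To make the difference between the two guard styles concrete, here is a small self-contained comparison. The function names `passesTruthyGuard` and `passesNumberGuard` are hypothetical stand-ins for the `if (cachedTokens)` and `if (typeof cachedTokens === "number")` conditions discussed above; this is a sketch of the language behavior, not code from the PR.

```typescript
// Stand-in for `if (cachedTokens)`.
function passesTruthyGuard(value: unknown): boolean {
  return Boolean(value);
}

// Stand-in for `if (typeof cachedTokens === "number")`.
function passesNumberGuard(value: unknown): value is number {
  return typeof value === "number";
}

// Print how each guard treats a handful of possible API values.
const samples: unknown[] = [42, 0, "42", undefined, null];
for (const sample of samples) {
  console.log(
    JSON.stringify(sample),
    "truthy guard:", passesTruthyGuard(sample),
    "number guard:", passesNumberGuard(sample),
  );
}
```

The two guards disagree in exactly two of these cases: the string `"42"` passes the truthiness check but is rejected by the type guard, while `0` is rejected by the truthiness check but accepted by the type guard. Whether a zero count should be emitted is a semantic choice; the type guard simply makes that choice explicit.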

max-deygin-traceloop

Looks alright. Please check whether CodeRabbit's comment makes sense,
and fix lint.

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

packages/instrumentation-langchain/src/callback_handler.ts (1)
356-373: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Guard tokenUsage fallback against llmOutput.usage to prevent attribute overwrite.

The !usageMetadata guard at line 356 correctly prevents tokenUsage from running when usage_metadata exists. However, it doesn't check whether llmOutput.usage (lines 334-352) was already used. When usageMetadata is absent and both llmOutput.usage and llmOutput.tokenUsage exist, both blocks will execute and lines 359, 362, and 367 will overwrite the attributes set by lines 337, 340, and 347-350.

In practice, llmOutput.usage and llmOutput.tokenUsage likely represent mutually exclusive provider formats and may never coexist, but the code should be defensive.
Add llmOutput.usage check to the tokenUsage guard
     // Also check for tokenUsage format (for compatibility).
     // Skip when usage_metadata already populated the values.
-    if (!usageMetadata && output.llmOutput?.tokenUsage) {
+    if (!usageMetadata && !output.llmOutput?.usage && output.llmOutput?.tokenUsage) {
       const usage = output.llmOutput.tokenUsage;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/instrumentation-langchain/src/callback_handler.ts` around lines 356
- 373, The guard condition for the tokenUsage fallback block checking
`!usageMetadata && output.llmOutput?.tokenUsage` does not account for whether
llmOutput.usage was already processed. If both llmOutput.usage and
llmOutput.tokenUsage exist, the tokenUsage block will execute and overwrite the
attributes (ATTR_GEN_AI_USAGE_INPUT_TOKENS, ATTR_GEN_AI_USAGE_OUTPUT_TOKENS, and
GEN_AI_USAGE_TOTAL_TOKENS) that were already set by the prior llmOutput.usage
block. Add an additional guard check for `!output.llmOutput.usage` to the if
condition so the tokenUsage fallback only runs when neither usageMetadata nor
llmOutput.usage have been processed.
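
One way to picture the precedence this comment asks for is a single function that returns as soon as one source has supplied the counts, so a later source can never overwrite an earlier one. The sketch below is hypothetical: `pickTokenCounts` and the simplified types are invented for illustration, and the field names inside `llmOutput.usage` and `llmOutput.tokenUsage` are assumptions, since this thread does not show them.

```typescript
// Simplified, hypothetical shapes; the real LangChain types are richer.
interface UsageMetadata {
  input_tokens: number;
  output_tokens: number;
  total_tokens: number;
}
interface LlmOutput {
  // Field names below are assumed for this sketch.
  usage?: { input_tokens: number; output_tokens: number };
  tokenUsage?: { promptTokens: number; completionTokens: number; totalTokens: number };
}
interface TokenCounts {
  input: number;
  output: number;
  total: number;
  source: "usage_metadata" | "llmOutput.usage" | "llmOutput.tokenUsage";
}

// Picks exactly one source, in priority order, via early returns.
function pickTokenCounts(
  usageMetadata: UsageMetadata | undefined,
  llmOutput: LlmOutput | undefined,
): TokenCounts | undefined {
  if (usageMetadata) {
    return {
      input: usageMetadata.input_tokens,
      output: usageMetadata.output_tokens,
      total: usageMetadata.total_tokens,
      source: "usage_metadata",
    };
  }
  if (llmOutput?.usage) {
    const { input_tokens, output_tokens } = llmOutput.usage;
    return {
      input: input_tokens,
      output: output_tokens,
      total: input_tokens + output_tokens,
      source: "llmOutput.usage",
    };
  }
  if (llmOutput?.tokenUsage) {
    const { promptTokens, completionTokens, totalTokens } = llmOutput.tokenUsage;
    return {
      input: promptTokens,
      output: completionTokens,
      total: totalTokens,
      source: "llmOutput.tokenUsage",
    };
  }
  return undefined;
}
```

With this structure the extra `!output.llmOutput?.usage` condition becomes implicit. The suggested one-line guard achieves the same effect with a smaller change to the existing code.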

🧹 Nitpick comments (1)

packages/instrumentation-openai/src/instrumentation.ts (1)
972-978: ⚡ Quick win

Use explicit type guard for consistency with surrounding code.

Lines 953-958 use typeof inputTokens === "number" and typeof outputTokens === "number" to guard attribute setting, ensuring attributes are set even when values are 0. Line 973 uses if (cachedTokens) which is falsy for 0, meaning the attribute won't be set when zero tokens are cached.

While omitting the attribute when there are zero cached tokens may be semantically acceptable ("cache not used"), it's inconsistent with how the function handles other token counts and could cause downstream aggregators to treat "absent" differently from "zero."
Align with the existing type-guard pattern
-        const cachedTokens = result.usage.input_tokens_details?.cached_tokens;
-        if (cachedTokens) {
+        const cachedTokens = result.usage.input_tokens_details?.cached_tokens;
+        if (typeof cachedTokens === "number") {
           span.setAttribute(
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/instrumentation-openai/src/instrumentation.ts` around lines 972 -
978, The code at line 973 uses a falsy check `if (cachedTokens)` to guard the
ATTR_GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS attribute setting, which is
inconsistent with the pattern used earlier in the function for inputTokens and
outputTokens (lines 953-958) that use explicit type guards like `typeof
inputTokens === "number"`. Replace the `if (cachedTokens)` condition with
`typeof cachedTokens === "number"` to match the existing type-guard pattern and
ensure the attribute is properly set even when the cached tokens value is 0.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@packages/instrumentation-langchain/src/callback_handler.ts`:
- Around line 356-373: The guard condition for the tokenUsage fallback block
checking `!usageMetadata && output.llmOutput?.tokenUsage` does not account for
whether llmOutput.usage was already processed. If both llmOutput.usage and
llmOutput.tokenUsage exist, the tokenUsage block will execute and overwrite the
attributes (ATTR_GEN_AI_USAGE_INPUT_TOKENS, ATTR_GEN_AI_USAGE_OUTPUT_TOKENS, and
GEN_AI_USAGE_TOTAL_TOKENS) that were already set by the prior llmOutput.usage
block. Add an additional guard check for `!output.llmOutput.usage` to the if
condition so the tokenUsage fallback only runs when neither usageMetadata nor
llmOutput.usage have been processed.

---

Nitpick comments:
In `@packages/instrumentation-openai/src/instrumentation.ts`:
- Around line 972-978: The code at line 973 uses a falsy check `if
(cachedTokens)` to guard the ATTR_GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS attribute
setting, which is inconsistent with the pattern used earlier in the function for
inputTokens and outputTokens (lines 953-958) that use explicit type guards like
`typeof inputTokens === "number"`. Replace the `if (cachedTokens)` condition
with `typeof cachedTokens === "number"` to match the existing type-guard pattern
and ensure the attribute is properly set even when the cached tokens value is 0.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 574594ea-d228-45dc-b03c-7e859e58e68a

📥 Commits

Reviewing files that changed from the base of the PR and between a6da699 and 29c0f99.

📒 Files selected for processing (8)

packages/instrumentation-bedrock/tests/anthropic.test.ts
packages/instrumentation-bedrock/tests/cache-token-fold-in.test.ts
packages/instrumentation-langchain/src/callback_handler.ts
packages/instrumentation-langchain/test/cache-token-fold-in.test.ts
packages/instrumentation-openai/src/instrumentation.ts
packages/instrumentation-openai/test/recordings/Test-OpenAI-instrumentation_1770406427/should-set-cache-read-input-tokens-in-span-for-responses-with-cached-tokens_3930499540/recording.har
packages/instrumentation-together/test/instrumentation.test.ts
packages/instrumentation-vertexai/tests/gemini.test.ts

✅ Files skipped from review due to trivial changes (1)

packages/instrumentation-openai/test/recordings/Test-OpenAI-instrumentation_1770406427/should-set-cache-read-input-tokens-in-span-for-responses-with-cached-tokens_3930499540/recording.har

🚧 Files skipped from review as they are similar to previous changes (5)

packages/instrumentation-together/test/instrumentation.test.ts
packages/instrumentation-bedrock/tests/anthropic.test.ts
packages/instrumentation-bedrock/tests/cache-token-fold-in.test.ts
packages/instrumentation-langchain/test/cache-token-fold-in.test.ts
packages/instrumentation-vertexai/tests/gemini.test.ts

dvirski added 5 commits June 15, 2026 16:06

set openai input cache tokens

11175da

set bedrock anthropic cache tokens

2a196f6

set together cache tokens

b6f0d89

set vertexai cache tokens

c495dc1

fix(instrumentations): include cache tokens in gen_ai.usage.input_tokens

a6da699

coderabbitai Bot reviewed Jun 16, 2026

max-deygin-traceloop approved these changes Jun 16, 2026

CR Comments

29c0f99

coderabbitai Bot reviewed Jun 17, 2026

dvirski wants to merge 6 commits into main from dr/fix(instrumentation)-include-cache-tokens-in-gen_ai.usage.input_tokens
