feat(chat): durable internal tool trace persistence#54

Merged
ABB65 merged 1 commit into main from feat/chat-tool-trace-persistence
May 18, 2026
Conversation

ABB65 commented on May 18, 2026

Summary

Before this PR the chat loop stored exactly two rows per POST — user prompt and final assistant message — collapsing every multi-iteration tool turn into a flat row pair. Intermediate assistant narration was dropped; tool_result blocks were never persisted at all. Resume reads couldn't reconstruct the Anthropic-protocol shape Claude saw on prior turns, so multi-iteration conversations effectively lost their tool history once the request ended.

This PR persists the full protocol-replay trace under a single turn_id per POST while keeping the user-facing transcript clean.

Schema (migration 009)

ALTER TABLE messages
  ADD content_blocks jsonb NULL,          -- structured Anthropic blocks
  ADD turn_id uuid NOT NULL,              -- groups all rows from one POST
  ADD turn_sequence smallint NOT NULL,    -- deterministic order in batch
  ADD iteration smallint NULL,            -- engine iteration counter
  ADD internal boolean NOT NULL DEFAULT false;

RLS rewrite (the security boundary):

SELECT  internal = false AND user owns conversation
INSERT  internal = false AND user owns conversation

Service-role queries bypass RLS and load the full trace. This is defense-in-depth — a stray client query OR a buggy route forgetting includeInternal: false cannot leak internal rows.

Persistence shape per POST

  • 1 seed user row — internal=false, iteration=NULL, turn_sequence=0
  • N assistant rows — internal=true for intermediate iterations, internal=false for the final one
  • 0..N tool_result rows — internal=true, role='user' (matches Anthropic protocol where tool_results are sent as user messages)
  • All share one turn_id; turn_sequence increments per row
  • Token columns land only on the final visible assistant row (Anthropic returns per-call totals, not per-iteration)
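
A minimal sketch of how the row list above could be composed from an iteration trace. The types, the `buildTraceRowsSketch` name, and the 0-based `iteration` numbering are assumptions for illustration; only the column names and the internal/visible rules come from this PR.

```typescript
// Hypothetical shapes: only the column names are taken from the PR text.
type Block = { type: string; [key: string]: unknown };

interface IterationTraceSketch {
  assistantBlocks: Block[];   // assistant output for one engine iteration
  toolResultBlocks: Block[];  // tool_result blocks fed back, may be empty
}

interface TraceRow {
  role: "user" | "assistant";
  content_blocks: Block[];
  turn_id: string;
  turn_sequence: number;
  iteration: number | null;
  internal: boolean;
}

function buildTraceRowsSketch(
  turnId: string,
  userBlocks: Block[],
  iterations: IterationTraceSketch[],
): TraceRow[] {
  const rows: TraceRow[] = [];
  // turn_sequence is simply the row's position within the batch.
  const push = (row: Omit<TraceRow, "turn_id" | "turn_sequence">): void => {
    rows.push({ ...row, turn_id: turnId, turn_sequence: rows.length });
  };

  // Seed user row: visible, no iteration, turn_sequence 0.
  push({ role: "user", content_blocks: userBlocks, iteration: null, internal: false });

  iterations.forEach((it, i) => {
    const isFinal = i === iterations.length - 1;
    // Only the final assistant row is part of the user-facing transcript.
    push({ role: "assistant", content_blocks: it.assistantBlocks, iteration: i, internal: !isFinal });
    if (it.toolResultBlocks.length > 0) {
      // tool_result rows are always internal and carry role 'user'.
      push({ role: "user", content_blocks: it.toolResultBlocks, iteration: i, internal: true });
    }
  });
  return rows;
}
```

Under these assumptions, a POST with one tool iteration followed by a final answer yields four rows: seed user, internal assistant, internal tool_result, visible assistant.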

Turn-safe history budget — the protocol-critical change

The old row-level cutoff could drop an assistant tool_use block while keeping its matching tool_result (or vice-versa); Anthropic rejects orphaned tool_use blocks and silently drifts on orphaned tool_result. The walker now:

  1. Groups rows by turn_id preserving DB order
  2. Walks groups newest → oldest summing per-group estimates
  3. Drops ENTIRE turns at the budget boundary — never half
  4. If the DB row limit truncated mid-turn (rare), drops the leading partial group

Legacy rows (no turn_id) fall back to single-row "turns" — protocol-safe by definition since they never carried tool blocks.
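
The four steps above can be sketched roughly as follows. This is not the code in `buildPromptMessages`; the row shape, the `estimatedTokens` field, and the `leadingTurnIsPartial` flag are hypothetical, and the sketch assumes rows of one turn are contiguous in DB order.

```typescript
interface HistoryRow {
  id: string;
  turn_id: string | null;   // null stands in for legacy rows in this sketch
  estimatedTokens: number;  // hypothetical per-row budget estimate
}

// Step 1: group contiguous rows by turn_id; null turn_id rows become single-row turns.
function groupByTurn(rows: HistoryRow[]): HistoryRow[][] {
  const groups: HistoryRow[][] = [];
  let lastTurn: string | null = null;
  for (const row of rows) {
    const current = groups[groups.length - 1];
    if (row.turn_id !== null && row.turn_id === lastTurn && current) {
      current.push(row);
    } else {
      groups.push([row]);
    }
    lastTurn = row.turn_id;
  }
  return groups;
}

function selectWithinBudget(
  rows: HistoryRow[],
  budget: number,
  leadingTurnIsPartial = false, // assumed signal that the row limit cut a turn in half
): HistoryRow[] {
  let groups = groupByTurn(rows);
  // Step 4: a truncated leading turn is dropped entirely.
  if (leadingTurnIsPartial) groups = groups.slice(1);

  // Steps 2 and 3: walk newest to oldest, keep whole turns until one no longer fits.
  const kept: HistoryRow[][] = [];
  let used = 0;
  for (let i = groups.length - 1; i >= 0; i--) {
    const group = groups[i]!;
    const cost = group.reduce((sum, r) => sum + r.estimatedTokens, 0);
    if (used + cost > budget) break; // drop this turn and everything older
    used += cost;
    kept.unshift(group);
  }

  const out: HistoryRow[] = [];
  for (const group of kept) out.push(...group);
  return out;
}
```

Because the budget check works on whole groups, a `tool_use` row and its matching `tool_result` row are either both kept or both dropped.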

Provider surface

  • MessageInsertInput exposes new columns + internal
  • insertMessages(rows[]) — single batched INSERT, one round-trip instead of N, atomic at the batch level
  • loadConversationMessages(..., { includeInternal }) — defaults false; resume paths set true
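
To illustrate the opt-in default, here is a small in-memory sketch of the `includeInternal` semantics. The real provider queries the database; `StoredMessage` and `filterForCaller` are hypothetical names used only for this example.

```typescript
interface StoredMessage {
  id: string;
  internal: boolean;
}

interface LoadOptions {
  includeInternal?: boolean; // omitted means false, as described above
}

// Callers that omit the option never see internal trace rows.
function filterForCaller(
  rows: StoredMessage[],
  { includeInternal = false }: LoadOptions = {},
): StoredMessage[] {
  return includeInternal ? rows : rows.filter((r) => !r.internal);
}
```

A transcript route would call `filterForCaller(rows)`, while a resume path would pass `{ includeInternal: true }` explicitly.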

Quota — UNCHANGED

agent_usage.message_count and api_message_usage.message_count still increment exactly once per POST via the existing reservation step. Multiple persisted rows do NOT bill the user multiple messages. Trace persistence is Contentrain-side observability/replay, not customer-facing metering.

Test plan

  • pnpm typecheck clean
  • pnpm lint — 0 errors on changed files
  • pnpm test: 622 passed (618 + 4 new)
    • db: trace row shape (seed user + assistant + tool_result under one turn_id), single batched insertMessages call, intermediate rows internal=true / final internal=false, cache token landing on final, Conversation API symmetric path
    • conversation-history: content_blocks priority over legacy columns; turn-grouped budget keeps whole multi-row turns together; older turns dropped whole when budget overflows
    • chat-route integration: resume passes includeInternal: true; saveChatResult receives iterations array instead of the old assistantText/assistantContent pair

Out of scope (separate follow-ups)

  • Backfill of pre-009 rows. Pre-launch system, no real data; legacy rows live as single-row turns under the new walker.
  • block_kind discriminator column. The discriminator already lives inside each block's type; partial or expression indexes can be added once a real query pattern emerges.
  • Per-iteration token breakdown. Provider returns per-call totals only.
  • UI rendering of tool_use chips. Data is there; UI can adapt at its own pace.

Diff stat

11 files changed, +684 / −137. Migration + 4 new unit tests + 1 new integration assertion.

Before this PR, the chat loop stored exactly two rows per POST: the
user prompt and the final assistant message (collapsed text + the
final iteration's tool_calls jsonb). Intermediate assistant turns
were dropped on the floor and tool_result blocks were never persisted
— the engine streamed them to Anthropic, fed them back through the
in-memory `config.messages` array, then forgot them. Resume reads
couldn't reconstruct the Anthropic protocol shape Claude had seen on
the prior turn, so multi-iteration conversations effectively lost
their tool history.

This PR persists the full Anthropic-protocol trace under a single
`turn_id` per POST while keeping the user-facing transcript clean.

Schema (migration 009):

  ALTER TABLE messages
    ADD content_blocks jsonb NULL,          -- structured Anthropic blocks
    ADD turn_id uuid NOT NULL,              -- groups all rows from one POST
    ADD turn_sequence smallint NOT NULL,    -- deterministic order in batch
    ADD iteration smallint NULL,            -- engine iteration counter
    ADD internal boolean NOT NULL DEFAULT false;

  RLS rewrite:
    SELECT  internal = false AND user owns conversation
    INSERT  internal = false AND user owns conversation

  Service-role queries bypass RLS and read/write the full trace —
  this is a defense-in-depth boundary so a stray client query OR a
  buggy route forgetting to pass `includeInternal: false` cannot
  leak internal trace rows. Two indexes: one ordering by
  (conversation, created_at, turn_sequence) for the resume path, one
  partial on `internal = false` for the public transcript hot path.

Persistence shape per POST:

  - 1 seed user row              internal=false, iteration=NULL, turn_sequence=0
  - N assistant rows             internal=true  for intermediate, false for final
  - 0..N tool_result rows        internal=true,  role='user'
  All share one `turn_id`; `turn_sequence` increments per row.

Token columns land only on the final visible assistant row —
Anthropic returns usage as a per-call total, not per-iteration.

Engine:
  - `runConversationLoop` accumulates an `IterationTrace[]` and
    surfaces it on the `done` event. The in-memory `config.messages`
    push for the next AI call is unchanged.

Persistence helpers (`saveChatResult` / `saveApiChatResult`):
  - Object-form args.
  - One `randomUUID()` per POST allocated as `turnId`.
  - `buildTraceRows` composes the row list deterministically and
    `db.insertMessages` writes them as a single batched INSERT — one
    round-trip instead of N, atomic at the batch level.

Provider surface:
  - `MessageInsertInput` exposes the new columns + `internal`.
  - `insertMessages(rows[])` is the batch path; `insertMessage` stays
    for the rare one-row sites.
  - `loadConversationMessages(..., { includeInternal })` — defaults
    to false; the chat / Conversation API resume paths set true.

Turn-safe history budget (`buildPromptMessages`):

  The single most protocol-critical change in this PR. The old
  row-level cutoff could drop an assistant `tool_use` while keeping
  its matching `tool_result` (or vice-versa), and Anthropic rejects
  that — orphaned tool_use blocks invalidate the request, orphaned
  tool_result blocks silently drift the conversation. The walker
  now:

    1. Groups rows by `turn_id` preserving DB order.
    2. Walks groups newest → oldest summing per-group estimates.
    3. Drops ENTIRE turns at the budget boundary, never half.
    4. If the DB row limit truncated mid-turn (rare), drops the
       leading partial group so a turn never starts with a
       tool_result missing its tool_use.

  Legacy rows without `turn_id` fall back to single-row "turns"
  through the helper's null-handling — protocol-safe by definition
  since the legacy path never persisted tool blocks.

Read priority in `extractContent`:
  content_blocks (post-009) → tool_calls (legacy) → content (text)
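
The read priority above could look roughly like this. It is a sketch rather than the actual `extractContent`; the row shape, the treatment of empty arrays as "absent", and the `extractContentSketch` name are assumptions.

```typescript
type ContentBlock = { type: string; [key: string]: unknown };

interface MessageRowSketch {
  content_blocks: ContentBlock[] | null; // post-009 structured blocks
  tool_calls: ContentBlock[] | null;     // legacy jsonb, shape assumed
  content: string | null;                // plain text fallback
}

function extractContentSketch(row: MessageRowSketch): ContentBlock[] | string {
  // 1. Prefer the structured blocks written since migration 009.
  if (row.content_blocks && row.content_blocks.length > 0) return row.content_blocks;
  // 2. Fall back to the legacy tool_calls column.
  if (row.tool_calls && row.tool_calls.length > 0) return row.tool_calls;
  // 3. Otherwise use the plain text content.
  return row.content ?? "";
}
```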

Public transcript routes (Studio `/messages.get`, EE `/history.get`)
use the provider default `includeInternal: false`; internal rows
stay hidden at the DB layer (RLS) AND at the provider layer.

Quota unchanged: `agent_usage.message_count` and
`api_message_usage.message_count` increment once per POST via the
existing reservation step. Multiple persisted rows do NOT bill the
user multiple messages — cache and trace persistence are
Contentrain-side observability, not customer-facing meters.

Tests:

  - db: trace shape (seed user + assistant + tool_result rows under
    one turn_id), single batched insertMessages call, intermediate
    rows internal=true / final internal=false, cache token landing,
    Conversation API symmetric path.
  - conversation-history: content_blocks priority over legacy
    columns; turn-grouped budget keeps whole multi-row turns
    together (assistant tool_use + matching tool_result); whole
    older turns dropped when budget overflows — never half.
  - chat-route integration: resume path explicitly passes
    includeInternal: true; saveChatResult receives iterations array
    instead of the old assistantText/assistantContent pair.

Out of scope:

  - Backfill of pre-009 rows. Pre-launch system, no real data.
    Legacy rows get distinct turn_id defaults via gen_random_uuid()
    and live as single-row turns under the new walker.
  - block_kind discriminator column. Discriminator lives inside
    each block's `type`; query patterns can grow expression indexes
    when there's a real need.
  - Per-iteration token breakdown. Provider returns per-call totals
    only.
  - UI rendering of tool_use chips. Data is there; UI can adapt at
    its own pace.

ABB65 merged commit 9369614 into main on May 18, 2026
1 check passed
ABB65 deleted the feat/chat-tool-trace-persistence branch on May 18, 2026 at 13:01