tooling(backfill): raise MAX_MESSAGES cap, surface truncation #37

Open
heavygee wants to merge 1 commit into main from tooling/backfill-truncation-warnings

heavygee (Owner) commented Jun 6, 2026

Summary

The old hardcoded MAX_MESSAGES = 2000 silently dropped the tail of any transcript with more than 2k records. Hit this concretely on a 2974-line chat (jessica-story founding chat): 974 lines lost with zero indication in either the script output or the wrapper's report.

Changes

  • Default cap raised to 50_000 (covers all real chats sampled; 30MB jsonl import is well within memory budget for a one-time per-session ingest).
  • Override via --max-messages <N> CLI arg on both backfill-agent-transcript.ts and attach-agent-chat.ts (forwarded), and via HAPI_BACKFILL_MAX_MESSAGES env.
  • BackfillResult now returns { rawTranscriptLines, maxMessagesApplied, truncated } so callers can detect drops without re-parsing.
  • backfill-agent-transcript.ts emits a console.warn line when truncating.
  • attach-agent-chat.ts surfaces a prominent BACKFILL TRUNCATED warning at the end of the report (after the JSON dump) so the operator sees it even if it would otherwise be buried mid-output.
  • New countTranscriptRecords() helper exported for callers that want the raw count without paying for full parsing.
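As a rough illustration of how the cap override and the truncation fields could fit together, here is a minimal sketch. It is not the code in this PR: `resolveMaxMessages`, `parsePositiveInt`, `countRecordsSketch`, and `describeTruncation` are hypothetical names, the precedence (CLI flag, then env var, then default) is an assumption, and the line counter is only a stand-in for `countTranscriptRecords()` that assumes one record per non-empty JSONL line.

```typescript
// Hypothetical sketch; only the field names, the env var name, and the
// 50_000 default come from the PR description.
const DEFAULT_MAX_MESSAGES = 50_000;

interface TruncationInfo {
  rawTranscriptLines: number;
  maxMessagesApplied: number;
  truncated: boolean;
}

// Parse a positive integer; return undefined for missing or invalid input.
function parsePositiveInt(value: string | undefined): number | undefined {
  if (value === undefined || !/^\d+$/.test(value.trim())) return undefined;
  const n = Number(value.trim());
  return Number.isSafeInteger(n) && n > 0 ? n : undefined;
}

// Assumed precedence: --max-messages, then HAPI_BACKFILL_MAX_MESSAGES, then default.
// The environment is passed in so the function stays pure and easy to test.
function resolveMaxMessages(
  cliValue: string | undefined,
  env: Record<string, string | undefined>,
): number {
  return (
    parsePositiveInt(cliValue) ??
    parsePositiveInt(env["HAPI_BACKFILL_MAX_MESSAGES"]) ??
    DEFAULT_MAX_MESSAGES
  );
}

// Stand-in for countTranscriptRecords(): counts non-empty lines without
// running JSON.parse on any of them.
function countRecordsSketch(jsonl: string): number {
  let count = 0;
  for (const line of jsonl.split("\n")) {
    if (line.trim() !== "") count += 1;
  }
  return count;
}

// Build the three fields a caller needs to detect a drop without re-parsing.
function describeTruncation(jsonl: string, maxMessages: number): TruncationInfo {
  const rawTranscriptLines = countRecordsSketch(jsonl);
  return {
    rawTranscriptLines,
    maxMessagesApplied: maxMessages,
    truncated: rawTranscriptLines > maxMessages,
  };
}
```

Under these assumptions, a 2974-line transcript checked against the old cap of 2000 yields `truncated: true` with `rawTranscriptLines - maxMessagesApplied` equal to 974, which matches the loss described in the summary.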

Why this matters for #34 (ACP migration)

A lossy backfill produces a HAPI session whose messages table has incomplete scrollback. If you then migrate that session to ACP via #34's transplant, the store.db content is unchanged (so the agent's brain is fine), but the scrollback in the HAPI hub/web UI is missing the tail, so reviewers can't audit what the agent did before migration. This fix makes backfill non-lossy by default for any transcript under the 50_000 cap and reports truncation explicitly above it, so post-migration audit is trustworthy.

Test plan

  • New tests: backfillSessionMessages flags truncated=true when transcript exceeds --max-messages cap, with correct rawTranscriptLines / maxMessagesApplied.
  • New tests: truncated=false on a single-line transcript.
  • Existing 8 tests still pass (10 total in suite).
  • Cross-reviewer: confirm the console.warn writes to stderr and doesn't break stdout-as-JSON parsing for any wrapper script that pipes through jq.
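For the stdout-as-JSON concern above, one way to keep the two streams apart is to route the report and the warning through separate sinks. The sketch below is an assumption about structure, not the wrapper's actual code: `emitReport`, `ReportSinks`, and the exact warning wording are hypothetical. In Node.js, `console.log` writes to stdout and `console.warn` writes to stderr, so passing those as the two sinks would leave the JSON on stdout untouched for `jq`.

```typescript
// Hypothetical sketch of a report emitter with separate output sinks.
interface TruncationInfo {
  rawTranscriptLines: number;
  maxMessagesApplied: number;
  truncated: boolean;
}

interface ReportSinks {
  out: (line: string) => void; // machine-readable JSON only (e.g. console.log)
  err: (line: string) => void; // human-facing warnings (e.g. console.warn)
}

// Emit the JSON report first, then the truncation banner last so it is the
// final thing the operator sees.
function emitReport(report: object, info: TruncationInfo, sinks: ReportSinks): void {
  sinks.out(JSON.stringify(report));
  if (info.truncated) {
    const dropped = info.rawTranscriptLines - info.maxMessagesApplied;
    sinks.err(
      `BACKFILL TRUNCATED: kept ${info.maxMessagesApplied} of ` +
        `${info.rawTranscriptLines} records (${dropped} dropped)`,
    );
  }
}
```

Injecting the sinks also makes the behaviour easy to assert in a unit test: capture both into arrays and check that the JSON sink received exactly one parseable line.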

Companion branch

tooling/legacy-chat-acp-alignment (#36) — adds the post-attach ACP migration hint and commits the operator-authored hapi-resurrect-session.sh. Together these two branches close the find→attach→backfill→resurrect→migrate loop for #34.

Made with Cursor

The old hardcoded MAX_MESSAGES=2000 silently dropped the tail of any
transcript with more than 2k records. Hit this concretely on a 2974-line
chat: 974 lines lost with zero indication in either the script output or
the wrapper script's report.

Changes:
- Default cap raised to 50_000 (covers all real chats sampled; 30MB jsonl
  is well within import-once memory budget).
- Override via --max-messages <N> CLI arg on both backfill-agent-transcript.ts
  and attach-agent-chat.ts (forwarded), and via HAPI_BACKFILL_MAX_MESSAGES env.
- BackfillResult now returns { rawTranscriptLines, maxMessagesApplied,
  truncated } so callers can detect drops without re-parsing.
- backfill-agent-transcript.ts emits a console.warn line when truncating.
- attach-agent-chat.ts surfaces a prominent "BACKFILL TRUNCATED" warning
  at the end of the report (after the JSON dump, before the prompt
  returns) so the operator sees it even if it would otherwise be buried
  mid-output.
- New countTranscriptRecords() helper kept exported for callers that want
  the raw count without paying for full parsing.

Test coverage:
- New test: backfillSessionMessages flags truncated=true when transcript
  exceeds --max-messages cap, with correct rawTranscriptLines/maxMessagesApplied.
- New test: truncated=false on a single-line transcript.
- Existing 8 tests still pass (10 total).

Co-authored-by: Cursor <cursoragent@cursor.com>