tooling(backfill): raise MAX_MESSAGES cap, surface truncation by heavygee · Pull Request #37 · heavygee/hapi

heavygee · 2026-06-06T22:10:59Z

Summary

The old hardcoded MAX_MESSAGES = 2000 silently dropped the tail of any transcript with more than 2k records. Hit this concretely on a 2974-line chat (jessica-story founding chat): 974 lines lost with zero indication in either the script output or the wrapper's report.

Changes

Default cap raised to 50_000 (covers all real chats sampled; 30MB jsonl import is well within memory budget for a one-time per-session ingest).
Override via --max-messages <N> CLI arg on both backfill-agent-transcript.ts and attach-agent-chat.ts (forwarded), and via HAPI_BACKFILL_MAX_MESSAGES env.
BackfillResult now returns { rawTranscriptLines, maxMessagesApplied, truncated } so callers can detect drops without re-parsing.
backfill-agent-transcript.ts emits a console.warn line when truncating.
attach-agent-chat.ts surfaces a prominent BACKFILL TRUNCATED warning at the end of the report (after the JSON dump) so the operator sees it even if it would otherwise be buried mid-output.
New countTranscriptRecords() helper exported for callers that want the raw count without paying for full parsing.

Why this matters for #34 (ACP migration)

A lossy backfill produces a HAPI session whose messages table has incomplete scrollback. If you then migrate that session to ACP via #34's transplant, the in-store store.db content is unchanged (so the agent's brain is fine), but the HAPI hub/web UI scrollback is missing the tail. Reviewers can't audit what the agent did before migration. This fix makes backfill non-lossy by default so post-migration audit is trustworthy.

Test plan

New tests: backfillSessionMessages flags truncated=true when transcript exceeds --max-messages cap, with correct rawTranscriptLines / maxMessagesApplied.
New tests: truncated=false on a single-line transcript.
Existing 8 tests still pass (10 total in suite).
Cross-reviewer: confirm the console.warn writes to stderr and doesn't break stdout-as-JSON parsing for any wrapper script that pipes through jq.

Companion branch

tooling/legacy-chat-acp-alignment (#36) — adds the post-attach ACP migration hint and commits the operator-authored hapi-resurrect-session.sh. Together these two branches close the find→attach→backfill→resurrect→migrate loop for #34.

Made with Cursor

The old hardcoded MAX_MESSAGES=2000 silently dropped the tail of any transcript with more than 2k records. Hit this concretely on a 2974-line chat: 974 lines lost with zero indication in either the script output or the wrapper script's report. Changes: - Default cap raised to 50_000 (covers all real chats sampled; 30MB jsonl is well within import-once memory budget). - Override via --max-messages <N> CLI arg on both backfill-agent-transcript.ts and attach-agent-chat.ts (forwarded), and via HAPI_BACKFILL_MAX_MESSAGES env. - BackfillResult now returns { rawTranscriptLines, maxMessagesApplied, truncated } so callers can detect drops without re-parsing. - backfill-agent-transcript.ts emits a console.warn line when truncating. - attach-agent-chat.ts surfaces a prominent "BACKFILL TRUNCATED" warning at the end of the report (after the JSON dump, before the prompt returns) so the operator sees it even if buried mid-output. - New countTranscriptRecords() helper kept exported for callers that want the raw count without paying for full parsing. Test coverage: - New test: backfillSessionMessages flags truncated=true when transcript exceeds --max-messages cap, with correct rawTranscriptLines/maxMessagesApplied. - New test: truncated=false on a single-line transcript. - Existing 8 tests still pass (10 total). Co-authored-by: Cursor <cursoragent@cursor.com>

chatgpt-codex-connector · 2026-06-06T22:11:05Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

tooling(backfill): raise MAX_MESSAGES cap, surface truncation#37

tooling(backfill): raise MAX_MESSAGES cap, surface truncation#37
heavygee wants to merge 1 commit into
mainfrom
tooling/backfill-truncation-warnings

heavygee commented Jun 6, 2026

Uh oh!

chatgpt-codex-connector Bot commented Jun 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

heavygee commented Jun 6, 2026

Summary

Changes

Why this matters for #34 (ACP migration)

Test plan

Companion branch

Uh oh!

chatgpt-codex-connector Bot commented Jun 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant