feat: auto-compact and retry on context window errors by TheArchitectit · Pull Request #4 · TheArchitectit/claw-code

TheArchitectit · 2026-04-23T19:39:50Z

Problem

When a conversation grows large enough to exceed the model's context window, the API returns a context_window_blocked error. Previously, this would fail the request and require the user to manually compact the session (or start over), interrupting the workflow.

Solution

This PR implements automatic session compaction with transparent retry:

Detects context_window_blocked errors from the API
Compacts the session automatically using the existing compaction logic
Retries the original request with the compacted context
Reports the compaction results (messages removed, tokens saved) back to the user

Flow

User request → API returns context_window_blocked
                    ↓
            Auto-compact session (remove old messages)
                    ↓
            Retry request with compacted context
                    ↓
            Report: "Removed N messages, completed request"
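The flow above can be sketched as a single function. This is a minimal illustration, not the code in conversation.rs: `Session`, `TurnError`, `Outcome`, `compact`, `KEEP_RECENT`, and `run_with_auto_compact` are all hypothetical names, and the compaction here simply drops the oldest messages.

```rust
/// Hypothetical stand-in for the real session type.
#[derive(Debug, Clone, PartialEq)]
struct Session {
    messages: Vec<String>,
}

/// Hypothetical error type; only the context window case is special.
#[derive(Debug, PartialEq)]
enum TurnError {
    ContextWindow,
    Other(String),
}

#[derive(Debug, PartialEq)]
struct Outcome {
    reply: String,
    /// Number of messages removed by auto-compaction, if it ran.
    removed: Option<usize>,
}

/// Placeholder value: how many recent messages the sketch keeps.
const KEEP_RECENT: usize = 5;

/// Placeholder compaction: keep only the most recent `keep` messages.
fn compact(session: &Session, keep: usize) -> (Session, usize) {
    let removed = session.messages.len().saturating_sub(keep);
    let compacted = Session {
        messages: session.messages[removed..].to_vec(),
    };
    (compacted, removed)
}

/// Run one turn; on a context window error, compact once and retry once.
fn run_with_auto_compact<F>(
    session: &mut Session,
    input: &str,
    mut send: F,
) -> Result<Outcome, TurnError>
where
    F: FnMut(&Session, &str) -> Result<String, TurnError>,
{
    match send(&*session, input) {
        Ok(reply) => Ok(Outcome { reply, removed: None }),
        Err(TurnError::ContextWindow) => {
            let (compacted, removed) = compact(&*session, KEEP_RECENT);
            // Single retry: a second failure is returned to the caller,
            // and the original session is left untouched.
            let reply = send(&compacted, input)?;
            *session = compacted;
            Ok(Outcome { reply, removed: Some(removed) })
        }
        Err(other) => Err(other),
    }
}
```

The session is only replaced after the retry succeeds, so a failed retry leaves the caller with the original, uncompacted state.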

Key Behaviors

Non-interactive: No user prompt required — the retry happens automatically
Transparent: The user sees a brief status message about the compaction, then the response
Safe: Uses the existing, well-tested compaction path; no new persistence logic
Single retry: Only attempts one auto-compact + retry to avoid loops

Testing

Verified with a session that exceeded the context limit: auto-compact removed 19 messages and the request then completed successfully
Verified compaction report is surfaced to user
Edge case: verify behavior when compaction itself fails (graceful fallback to error)
Edge case: verify behavior on consecutive context window errors

Files Changed

rust/crates/runtime/src/conversation.rs — auto-compact retry logic in the request path
rust/crates/api/src/error.rs — context_window_blocked error detection

Impact

Eliminates manual intervention for long-running sessions
Reduces friction for users working with large codebases or extended conversations
Backward compatible: no changes to compaction behavior itself

gemini-code-assist

Code Review

This pull request introduces an automatic retry mechanism when the model API returns a context window error by compacting the session and resubmitting the request. While the feature is useful, the current implementation has a critical logic error where the retry still uses the original uncompacted session, rendering the compaction ineffective. Feedback also highlights issues with the lifecycle management of the abort monitor, potential UI corruption from reusing a finished spinner, and the need to ensure retries only occur if messages were actually removed during compaction.

gemini-code-assist · 2026-04-23T19:42:25Z

+                        let (mut new_runtime, hook_abort_monitor) = self.prepare_turn_runtime(true)?;
+                        drop(hook_abort_monitor); // not needed for retry
+
+                        // Step 3: Run the turn again with the smaller session
+                        let mut rp = CliPermissionPrompter::new(self.permission_mode);
+                        match new_runtime.run_turn(input, Some(&mut rp)) {
+                            Ok(summary) => {
+                                // Success! Replace old runtime with the new compacted one
+                                self.replace_runtime(new_runtime)?;
+                                spinner.finish(
+                                    "✨ Done (after auto-compact)",
+                                    TerminalRenderer::new().color_theme(),
+                                    &mut stdout,
+                                )?;


This block contains a critical logic error and UI issues:

Logic Error: prepare_turn_runtime is hardcoded to use the session from self.runtime, which is still the original uncompacted session. Consequently, the retry will be performed with the same oversized context that caused the initial error, leading to a repeated failure. You must construct the runtime using result.compacted_session.

Monitor Lifecycle: hook_abort_monitor should not be dropped immediately; it must remain active during the retry to handle Ctrl-C interrupts for tool execution (hooks), and then be stopped explicitly after the turn completes.

UI State: Calling spinner.finish on the original spinner object is problematic because spinner.fail was already called at line 4288. Reusing a spinner that has reached a terminal state can corrupt terminal output. It is better to use a fresh spinner for the retry attempt.

// Step 2: Build a new runtime with the compacted session and retry
let hook_abort_signal = runtime::HookAbortSignal::new();
let mut new_runtime = build_runtime(
    result.compacted_session,
    &self.session.id,
    self.model.clone(),
    self.system_prompt.clone(),
    true,
    true,
    self.allowed_tools.clone(),
    self.permission_mode,
    None,
)?
.with_hook_abort_signal(hook_abort_signal.clone());
let hook_abort_monitor = HookAbortMonitor::spawn(hook_abort_signal);

// Step 3: Run the turn again with the smaller session
let mut rp = CliPermissionPrompter::new(self.permission_mode);
let mut retry_spinner = Spinner::new();
retry_spinner.tick("🦀 Retrying...", TerminalRenderer::new().color_theme(), &mut stdout)?;
let retry_result = new_runtime.run_turn(input, Some(&mut rp));
hook_abort_monitor.stop();
match retry_result {
    Ok(summary) => {
        // Success! Replace old runtime with the new compacted one
        self.replace_runtime(new_runtime)?;
        retry_spinner.finish(
            "✨ Done (after auto-compact)",
            TerminalRenderer::new().color_theme(),
            &mut stdout,
        )?;
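The monitor lifecycle point can be illustrated in isolation. The sketch below assumes a monitor that polls an interrupt flag on a background thread. `AbortSignal`, `AbortMonitor`, and `run_monitored` are hypothetical stand-ins and do not reflect how `HookAbortSignal` or `HookAbortMonitor` are actually implemented.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical abort flag shared with the running turn.
#[derive(Clone, Default)]
struct AbortSignal(Arc<AtomicBool>);

impl AbortSignal {
    fn abort(&self) {
        self.0.store(true, Ordering::SeqCst);
    }
    fn is_aborted(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

/// Hypothetical monitor: a background thread that watches an interrupt
/// source and trips the signal, until it is told to stop.
struct AbortMonitor {
    stop: Arc<AtomicBool>,
    handle: JoinHandle<()>,
}

impl AbortMonitor {
    fn spawn(signal: AbortSignal, interrupted: Arc<AtomicBool>) -> Self {
        let stop = Arc::new(AtomicBool::new(false));
        let stop_flag = Arc::clone(&stop);
        let handle = thread::spawn(move || {
            while !stop_flag.load(Ordering::SeqCst) {
                if interrupted.load(Ordering::SeqCst) {
                    signal.abort();
                    break;
                }
                thread::sleep(Duration::from_millis(5));
            }
        });
        Self { stop, handle }
    }

    /// Stop watching and wait for the background thread to exit.
    fn stop(self) {
        self.stop.store(true, Ordering::SeqCst);
        let _ = self.handle.join();
    }
}

/// Run a turn with the monitor alive for its whole duration, then stop it.
fn run_monitored<T>(
    interrupted: Arc<AtomicBool>,
    turn: impl FnOnce(&AbortSignal) -> T,
) -> T {
    let signal = AbortSignal::default();
    let monitor = AbortMonitor::spawn(signal.clone(), interrupted);
    let result = turn(&signal); // the monitor is still running here
    monitor.stop(); // explicit stop only after the turn completes
    result
}
```

Dropping the monitor before the turn runs, as in the original diff, would mean an interrupt arriving during the retry never reaches the signal.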

gemini-code-assist · 2026-04-23T19:42:25Z

+
+                    // Only proceed if compaction actually happened (messages were removed)
+                    // or there's still a session to work with
+                    if removed > 0 || result.compacted_session.messages.len() > 0 {


The condition removed > 0 || result.compacted_session.messages.len() > 0 is likely too permissive. If removed == 0, the session state remains identical to the one that just failed, meaning the retry will inevitably encounter the same context window error. It is more efficient to only attempt a retry if compaction actually removed messages to free up space.

Suggested change

- if removed > 0 || result.compacted_session.messages.len() > 0 {
+ if removed > 0 {
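To make the difference between the two guards concrete, here is a small sketch that evaluates both against the same compaction result. `CompactionResult` and both function names are hypothetical; only the two boolean conditions come from the review.

```rust
/// Hypothetical summary of a compaction pass.
struct CompactionResult {
    removed: usize,
    kept: usize,
}

/// Permissive guard from the diff: also true when nothing was removed,
/// as long as the session still has messages.
fn should_retry_permissive(r: &CompactionResult) -> bool {
    r.removed > 0 || r.kept > 0
}

/// Stricter guard from the suggestion: retry only if space was freed.
fn should_retry_strict(r: &CompactionResult) -> bool {
    r.removed > 0
}
```

With `removed: 0, kept: 24` the permissive guard still allows a retry against an unchanged session, while the strict guard does not.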

When the model API returns a context_window_blocked error (because the request exceeds the model's context window), the CLI now automatically:

1. Compacts the session (removes old messages to free up space)
2. Retries the original request with the compacted session
3. Reports results to the user

This eliminates the need for users to manually run /compact when they hit context limits; the recovery happens automatically.

## Technical Details

- Detection: Looks for 'context_window' or 'Context window' in the error message
- Uses runtime::compact_session() to aggressively compact (max_estimated_tokens=0)
- Creates a new runtime with the compacted session and retries the turn
- Reports compaction results and final status to the user

## Testing

Tested successfully with a request that exceeded the model's context:

- Auto-compact triggered: 'Messages removed 19, Messages kept 5'
- Successfully retried and completed after compaction
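The detection rule in this commit message amounts to a substring check. A minimal sketch follows; the function name `is_context_window_error` is an assumption, and only the two substrings come from the commit message.

```rust
/// Substring check described in the commit message: matches either
/// `context_window` or `Context window` anywhere in the error text.
/// The match is case-sensitive, so other capitalizations are not detected.
fn is_context_window_error(message: &str) -> bool {
    message.contains("context_window") || message.contains("Context window")
}
```

Matching on message text is simple but depends on provider wording. A typed error variant would likely be sturdier where the API exposes one.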

Some OpenAI-compatible providers (e.g., GLM-5) omit the `id` field in streaming and non-streaming responses. Adding #[serde(default)] allows the parser to accept these responses instead of failing with "missing field `id`". Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Adds scripts/install.sh that builds the release binary and links it to ~/.local/bin/claw. Run after code changes to update the CLI. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

When a provider returns HTML (e.g., error page, wrong endpoint) instead of JSON in an SSE stream, provide a clear error message instead of hanging or failing with a cryptic parse error. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

When a provider returns a JSON error (e.g., {"error":{"message":"..."}}) without SSE framing (no "data:" prefix), the SSE parser was silently ignoring it and hanging. Now detects and surfaces these errors. Also handles HTML responses that lack SSE framing. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
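One way to picture the two streaming fixes above is a classifier that inspects each raw line before SSE parsing. This is a heuristic sketch under assumed names (`StreamLine`, `classify_line`), not the parser in the repository; in particular the JSON check only looks for an `"error"` key by substring.

```rust
/// How a raw line from a streaming response body is interpreted.
#[derive(Debug, PartialEq)]
enum StreamLine<'a> {
    /// A normally framed SSE data payload (text after `data:`).
    Data(&'a str),
    /// A bare JSON body that mentions an `error` key, with no SSE framing.
    JsonError(&'a str),
    /// An HTML document, e.g. an error page or a wrong endpoint.
    Html,
    /// Anything else (blank lines, comments, other SSE fields).
    Other,
}

fn classify_line(line: &str) -> StreamLine<'_> {
    let trimmed = line.trim();
    if let Some(payload) = trimmed.strip_prefix("data:") {
        return StreamLine::Data(payload.trim_start());
    }
    let lower = trimmed.to_ascii_lowercase();
    if lower.starts_with("<!doctype html") || lower.starts_with("<html") {
        return StreamLine::Html;
    }
    if trimmed.starts_with('{') && trimmed.contains("\"error\"") {
        return StreamLine::JsonError(trimmed);
    }
    StreamLine::Other
}
```

A caller could turn `JsonError` and `Html` into immediate errors instead of waiting for a `data:` line that never arrives.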

Some providers (GLM, DeepSeek) emit reasoning tokens in `reasoning_content` or nested `thinking.content` fields instead of `content`. Added support for these fields so reasoning models work correctly. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The final streaming chunk from some providers contains only finish_reason and usage, with no delta field. Made it optional to prevent parse errors. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

When preserve_recent_messages == 0, raw_keep_from equals messages.len(), causing index out of bounds when accessing session.messages[k]. Added k >= session.messages.len() check to prevent panic. Reason: Compaction with preserve_recent_messages=0 triggered OOB access when checking for tool-use/tool-result pair preservation at boundary. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
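The out-of-bounds case can be reproduced in a few lines. In this sketch the `Message` enum, the `keep_from` function, and the pair-preservation rule are assumptions made for illustration; only the `k >= messages.len()` guard mirrors the fix described above.

```rust
/// Hypothetical message kinds; only the tool pairing matters here.
#[derive(Debug, Clone, PartialEq)]
enum Message {
    Text(String),
    ToolUse(u32),
    ToolResult(u32),
}

/// Index of the first message to keep. Moves the boundary back by one when
/// it would otherwise split a tool-use/tool-result pair.
fn keep_from(messages: &[Message], preserve_recent: usize) -> usize {
    let k = messages.len().saturating_sub(preserve_recent);
    // With preserve_recent == 0, k == messages.len(), so messages[k] would
    // panic. The bounds check runs first and returns early.
    if k == 0 || k >= messages.len() {
        return k;
    }
    match (&messages[k - 1], &messages[k]) {
        (Message::ToolUse(a), Message::ToolResult(b)) if a == b => k - 1,
        _ => k,
    }
}
```

Using `messages.get(k)` instead of indexing would be another way to avoid the panic, at the cost of handling an `Option` at the call site.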

gemini-code-assist Bot reviewed Apr 23, 2026

TheArchitectit and others added 8 commits May 10, 2026 21:26

chore: add install script for rebuild and link

f9743b6

fix: detect HTML responses in streaming path

403074b

fix: make delta field optional in ChunkChoice

d9db978

TheArchitectit force-pushed the feat/auto-compact-new branch from 6a37558 to a15c602 on May 10, 2026 21:26

TheArchitectit force-pushed the main branch from e15c9ab to d229a9b on June 10, 2026 21:47

docs: add bugfix and debug notes

378ef51

TheArchitectit wants to merge 9 commits into main from feat/auto-compact-new
