feat: auto-compact and retry on context window errors #4

Open
TheArchitectit wants to merge 9 commits into main from feat/auto-compact-new


@TheArchitectit TheArchitectit commented Apr 23, 2026

Owner

Problem

When a conversation grows large enough to exceed the model's context window, the API returns a context_window_blocked error. Previously, this failed the request and forced the user to compact the session manually (or start over), interrupting the workflow.

Solution

This PR implements automatic session compaction with transparent retry:

  1. Detects context_window_blocked errors from the API
  2. Compacts the session automatically using the existing compaction logic
  3. Retries the original request with the compacted context
  4. Reports the compaction results (messages removed, tokens saved) back to the user

Flow

User request → API returns context_window_blocked
                    ↓
            Auto-compact session (remove old messages)
                    ↓
            Retry request with compacted context
                    ↓
            Report: "Removed N messages, completed request"

Key Behaviors

  • Non-interactive: No user prompt required — the retry happens automatically
  • Transparent: The user sees a brief status message about the compaction, then the response
  • Safe: Uses the existing, well-tested compaction path; no new persistence logic
  • Single retry: Only attempts one auto-compact + retry to avoid loops
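
The single-retry behavior can be sketched in isolation as follows. This is a minimal illustration, not the code in conversation.rs: `TurnError`, `CompactionReport`, and `run_with_auto_compact` are hypothetical names, and the turn and the compaction step are passed in as closures so the control flow stands on its own.

```rust
/// Hypothetical error type; the real API error type is not shown in this PR.
#[derive(Debug, PartialEq)]
enum TurnError {
    ContextWindowBlocked,
    Other(String),
}

/// Hypothetical summary of what compaction did.
#[derive(Debug, PartialEq)]
struct CompactionReport {
    removed: usize,
    kept: usize,
}

/// Runs `run_turn` once. On a context window error it compacts and retries
/// exactly once; any other error is returned unchanged.
fn run_with_auto_compact<T>(
    mut run_turn: impl FnMut() -> Result<T, TurnError>,
    mut compact: impl FnMut() -> Result<CompactionReport, TurnError>,
) -> Result<(T, Option<CompactionReport>), TurnError> {
    match run_turn() {
        // No compaction was needed, so there is nothing to report.
        Ok(value) => Ok((value, None)),
        Err(TurnError::ContextWindowBlocked) => {
            // If compaction itself fails, fall back to surfacing that error.
            let report = compact()?;
            // Single retry: a second context window error is returned as-is,
            // so the function cannot loop.
            let value = run_turn()?;
            Ok((value, Some(report)))
        }
        Err(other) => Err(other),
    }
}
```

Because the retry is a plain second call rather than a loop, consecutive context window errors end the request after two attempts.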

Testing

  • Verified with a session that exceeded context limit — removed 19 messages and completed the request successfully after auto-compact
  • Verified compaction report is surfaced to user
  • Edge case: verify behavior when compaction itself fails (graceful fallback to error)
  • Edge case: verify behavior on consecutive context window errors

Files Changed

  • rust/crates/runtime/src/conversation.rs: auto-compact retry logic in the request path
  • rust/crates/api/src/error.rs: context_window_blocked error detection

Impact

  • Eliminates manual intervention for long-running sessions
  • Reduces friction for users working with large codebases or extended conversations
  • Backward compatible: no changes to compaction behavior itself

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces an automatic retry mechanism when the model API returns a context window error by compacting the session and resubmitting the request. While the feature is useful, the current implementation has a critical logic error where the retry still uses the original uncompacted session, rendering the compaction ineffective. Feedback also highlights issues with the lifecycle management of the abort monitor, potential UI corruption from reusing a finished spinner, and the need to ensure retries only occur if messages were actually removed during compaction.

Comment on lines +4339 to +4352
let (mut new_runtime, hook_abort_monitor) = self.prepare_turn_runtime(true)?;
drop(hook_abort_monitor); // not needed for retry

// Step 3: Run the turn again with the smaller session
let mut rp = CliPermissionPrompter::new(self.permission_mode);
match new_runtime.run_turn(input, Some(&mut rp)) {
    Ok(summary) => {
        // Success! Replace old runtime with the new compacted one
        self.replace_runtime(new_runtime)?;
        spinner.finish(
            "✨ Done (after auto-compact)",
            TerminalRenderer::new().color_theme(),
            &mut stdout,
        )?;


critical

This block contains a critical logic error and UI issues:

  1. Logic Error: prepare_turn_runtime is hardcoded to use the session from self.runtime, which is still the original uncompacted session. Consequently, the retry will be performed with the same oversized context that caused the initial error, leading to a repeated failure. You must construct the runtime using result.compacted_session.
  2. Monitor Lifecycle: hook_abort_monitor should not be dropped immediately; it must remain active during the retry to handle Ctrl-C interrupts for tool execution (hooks), and then be stopped explicitly after the turn completes.
  3. UI State: Calling spinner.finish on the original spinner object is problematic because spinner.fail was already called at line 4288. Reusing a spinner that has reached a terminal state can corrupt terminal output. It is better to use a fresh spinner for the retry attempt.
                        // Step 2: Build a new runtime with the compacted session and retry
                        let hook_abort_signal = runtime::HookAbortSignal::new();
                        let mut new_runtime = build_runtime(
                            result.compacted_session,
                            &self.session.id,
                            self.model.clone(),
                            self.system_prompt.clone(),
                            true,
                            true,
                            self.allowed_tools.clone(),
                            self.permission_mode,
                            None,
                        )?
                        .with_hook_abort_signal(hook_abort_signal.clone());
                        let hook_abort_monitor = HookAbortMonitor::spawn(hook_abort_signal);
                        
                        // Step 3: Run the turn again with the smaller session
                        let mut rp = CliPermissionPrompter::new(self.permission_mode);
                        let mut retry_spinner = Spinner::new();
                        retry_spinner.tick("🦀 Retrying...", TerminalRenderer::new().color_theme(), &mut stdout)?;
                        let retry_result = new_runtime.run_turn(input, Some(&mut rp));
                        hook_abort_monitor.stop();

                        match retry_result {
                            Ok(summary) => {
                                // Success! Replace old runtime with the new compacted one
                                self.replace_runtime(new_runtime)?;
                                retry_spinner.finish(
                                    "✨ Done (after auto-compact)",
                                    TerminalRenderer::new().color_theme(),
                                    &mut stdout,
                                )?;


// Only proceed if compaction actually happened (messages were removed)
// or there's still a session to work with
if removed > 0 || result.compacted_session.messages.len() > 0 {


medium

The condition removed > 0 || result.compacted_session.messages.len() > 0 is likely too permissive. If removed == 0, the session state remains identical to the one that just failed, meaning the retry will inevitably encounter the same context window error. It is more efficient to only attempt a retry if compaction actually removed messages to free up space.

Suggested change
- if removed > 0 || result.compacted_session.messages.len() > 0 {
+ if removed > 0 {
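
To make the difference between the two conditions concrete, here is a small sketch that evaluates both against a compaction that removed nothing. `CompactionOutcome` and its fields are hypothetical stand-ins for the values named in the review (`removed` and `result.compacted_session.messages.len()`), not the actual types in the PR.

```rust
/// Hypothetical stand-in for the values the guard inspects.
struct CompactionOutcome {
    /// Number of messages compaction removed.
    removed: usize,
    /// Number of messages left in the compacted session.
    remaining: usize,
}

/// The condition as written in the PR: retries whenever any session remains.
fn should_retry_permissive(outcome: &CompactionOutcome) -> bool {
    outcome.removed > 0 || outcome.remaining > 0
}

/// The condition suggested in the review: retries only if space was freed.
fn should_retry_strict(outcome: &CompactionOutcome) -> bool {
    outcome.removed > 0
}
```

For an outcome with `removed: 0` and `remaining: 24`, the permissive check still retries with an unchanged session, while the strict check skips the retry.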

TheArchitectit and others added 8 commits May 10, 2026 21:26
When the model API returns a context_window_blocked error (because the request
exceeds the model's context window), the CLI now automatically:

1. Compacts the session (removes old messages to free up space)
2. Retries the original request with the compacted session
3. Reports results to the user

This eliminates the need for users to manually run /compact when they
hit context limits - the recovery happens automatically.

## Technical Details

- Detection: Looks for 'context_window' or 'Context window' in error message
- Uses runtime::compact_session() to aggressively compact (max_estimated_tokens=0)
- Creates new runtime with compacted session and retries the turn
- Reports compaction results and final status to user
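
The substring-based detection described above could look roughly like this. The function name is hypothetical; only the two search strings come from the commit message.

```rust
/// Returns true if an error message looks like a context window error.
/// The match is a case-sensitive substring check on the two spellings
/// named in the commit message.
fn looks_like_context_window_error(message: &str) -> bool {
    message.contains("context_window") || message.contains("Context window")
}
```

Note that a check like this is case-sensitive, so a provider that reports "context window exceeded" in lowercase with a space would not match either pattern.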

## Testing

Tested successfully with a request that exceeded model's context:
- Auto-compact triggered: 'Messages removed 19, Messages kept 5'
- Successfully retried and completed after compaction

Some OpenAI-compatible providers (e.g., GLM-5) omit the `id` field in
streaming and non-streaming responses. Adding #[serde(default)] allows
the parser to accept these responses instead of failing with
"missing field `id`".

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Adds scripts/install.sh that builds the release binary and links it
to ~/.local/bin/claw. Run after code changes to update the CLI.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

When a provider returns HTML (e.g., error page, wrong endpoint) instead
of JSON in an SSE stream, provide a clear error message instead of
hanging or failing with a cryptic parse error.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

When a provider returns a JSON error (e.g., {"error":{"message":"..."}})
without SSE framing (no "data:" prefix), the SSE parser was silently
ignoring it and hanging. Now detects and surfaces these errors.

Also handles HTML responses that lack SSE framing.
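
One way to picture this detection is a per-line classifier that runs before normal SSE parsing. The sketch below is an assumption about the shape of such a check, not the parser in this PR: `StreamLine` and `classify_line` are hypothetical names, and the JSON test is a plain substring heuristic rather than a real JSON parse.

```rust
/// Hypothetical classification of one line read from a streaming response.
#[derive(Debug, PartialEq)]
enum StreamLine<'a> {
    /// A normally framed SSE data line; holds the payload after "data:".
    Data(&'a str),
    /// A bare JSON object that appears to carry an "error" key.
    JsonError(&'a str),
    /// An HTML document, e.g. an error page from a wrong endpoint.
    Html,
    /// Anything else (blank lines, SSE comments, other fields).
    Ignored,
}

fn classify_line(line: &str) -> StreamLine<'_> {
    let trimmed = line.trim();
    if let Some(payload) = trimmed.strip_prefix("data:") {
        return StreamLine::Data(payload.trim_start());
    }
    // Compare case-insensitively so "<!DOCTYPE html>" and "<html>" both match.
    let lower = trimmed.to_ascii_lowercase();
    if lower.starts_with("<!doctype") || lower.starts_with("<html") {
        return StreamLine::Html;
    }
    // Heuristic only: an unframed object mentioning "error" is surfaced
    // to the caller instead of being silently skipped.
    if trimmed.starts_with('{') && trimmed.contains("\"error\"") {
        return StreamLine::JsonError(trimmed);
    }
    StreamLine::Ignored
}
```

A caller would turn `JsonError` and `Html` into an immediate error instead of waiting for more stream data.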

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Some providers (GLM, DeepSeek) emit reasoning tokens in `reasoning_content`
or nested `thinking.content` fields instead of `content`. Added support
for these fields so reasoning models work correctly.
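
As a rough illustration of the field fallback, the sketch below uses plain structs in place of the deserialized chunk. `Delta`, `Thinking`, `reasoning_text`, and `visible_text` are hypothetical; only the field names `content`, `reasoning_content`, and `thinking.content` come from the commit message, and the lookup order is an assumption.

```rust
/// Hypothetical nested object for providers that use `thinking.content`.
#[derive(Default)]
struct Thinking {
    content: Option<String>,
}

/// Hypothetical, already-parsed streaming delta.
#[derive(Default)]
struct Delta {
    content: Option<String>,
    reasoning_content: Option<String>,
    thinking: Option<Thinking>,
}

/// Returns reasoning text from `reasoning_content`, falling back to
/// `thinking.content` when the former is absent.
fn reasoning_text(delta: &Delta) -> Option<&str> {
    delta
        .reasoning_content
        .as_deref()
        .or_else(|| delta.thinking.as_ref().and_then(|t| t.content.as_deref()))
}

/// Returns the ordinary visible answer text, if any.
fn visible_text(delta: &Delta) -> Option<&str> {
    delta.content.as_deref()
}
```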

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The final streaming chunk from some providers contains only finish_reason
and usage, with no delta field. Made it optional to prevent parse errors.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

When preserve_recent_messages == 0, raw_keep_from equals messages.len(),
causing index out of bounds when accessing session.messages[k].

Added k >= session.messages.len() check to prevent panic.

Reason: Compaction with preserve_recent_messages=0 triggered OOB access
when checking for tool-use/tool-result pair preservation at boundary.
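
The boundary case can be reproduced with a small sketch. The types and function names here are hypothetical, and the sketch uses `slice::get` in place of the explicit `k >= session.messages.len()` check the commit describes; both avoid indexing past the end of the message list.

```rust
/// Hypothetical message type; only the flag needed for the check is modeled.
struct Message {
    is_tool_result: bool,
}

/// Index of the first message to keep. With `preserve_recent == 0`
/// this equals `message_count`, i.e. one past the last valid index.
fn keep_from_index(message_count: usize, preserve_recent: usize) -> usize {
    message_count.saturating_sub(preserve_recent)
}

/// Checks whether the message at boundary index `k` is a tool result.
/// An out-of-range `k` yields `false` instead of panicking, which is what
/// direct indexing with `messages[k]` would do.
fn boundary_is_tool_result(messages: &[Message], k: usize) -> bool {
    messages.get(k).map_or(false, |m| m.is_tool_result)
}
```

With three messages and nothing preserved, the boundary index is 3, and the lookup returns `false` rather than triggering an out-of-bounds panic.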

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>