fix(sandbox): bound ACP sandbox lifetime + self-heal vanished attach #153
Merged
Persistent-sandbox unification (per-workspace boxes) created sandboxes with auto_stop but no auto_delete, so Daytona stopped → archived → kept them forever. Leaked session-owned boxes (teardown missed on a gateway restart) and abandoned workspace boxes both piled up as archived sandboxes; archived boxes take ~3 min to wake, so once enough accumulate the whole Daytona account crawls and Claude Code sessions hang on cold start.

- Set auto_delete_interval on creation (both ACP + code-exec create paths). Scratch (session-owned) boxes hold nothing durable, so they are reclaimed in 1 day (before they even archive); persistent workspace/code-exec boxes hold the user's files, so they get a 14-day grace. The "continuously stopped" clock spans the archived period, so archived boxes do get reclaimed. Both intervals are env-tunable; a non-positive value clamps to "disabled" (Daytona reads 0 as "delete immediately on stop", a data-loss footgun).
- Self-heal in _provision_once: a box Daytona auto-deletes still looks "owned" to verify_sandbox_owner (it checks Convex, not Daytona), so the next session would attach a ghost and error. On DaytonaNotFoundError for an attach, drop the stale Convex link and, for a workspace-unification box, create a fresh persistent one and relink. An explicit, user-chosen harness sandbox can't be fabricated, so that error surfaces (link still cleared).
- Tests: interval selection + clamp, and the missing-attach heal decision.
DIodide added a commit that referenced this pull request on Jun 23, 2026
Release notes for the 1.0.0 major release covering the full unreleased span since v0.2.1 (PRs #81–#153): live session following, rewind/fork, chat + harness sharing & collaboration, Skill Packs, per-workspace agent sandboxes, workspace credentials, per-credential usage, Claude Code config, and reliability/integrity hardening. Devops/infra setup (Redis Streams prod provisioning, CI, deploy plumbing) intentionally excluded — user-facing changes only.
Why
The per-workspace sandbox unification (#141) creates persistent Daytona boxes with auto_stop_interval but no auto_delete_interval, so Daytona stops → archives → keeps them forever. Two leak sources pile up as archived sandboxes:

- leaked session-owned boxes (teardown missed on a gateway restart)
- abandoned workspace boxes

Archived boxes take ~3 minutes to wake (filesystem restored from object storage), so once enough accumulate the whole Daytona account crawls: Claude Code sessions "start indefinitely" and sandboxes won't open. This is exactly what took staging down (82 sandboxes, 60 of them archived). The backlog was cleaned up manually; this PR makes it self-correcting.
What
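1. Bound lifetime at the source: set auto_delete_interval on both create paths (provision_agent_sandbox + DaytonaService.create_sandbox):

   - Scratch (session-owned) boxes hold nothing durable: reclaimed after 1 day, before they even archive.
   - Persistent workspace/code-exec boxes hold the user's files: 14-day grace.

   Both intervals are env-tunable. Non-positive values clamp to "disabled", because Daytona reads 0 as delete-immediately-on-stop (a data-loss footgun, not "off"). ephemeral=False on both sites avoids the SDK validator that would force the interval to 0.

2. Self-heal a vanished attach: a box Daytona auto-deletes still looks "owned" to verify_sandbox_owner (which checks the Convex record, not Daytona), so the next session would attach a ghost and hard-error. On DaytonaNotFoundError during an attach, _provision_once now:

   - drops the stale Convex link (sandboxes:removeByDaytonaId), then
   - for a workspace-unification box, creates a fresh persistent one and relinks; an explicit, user-chosen harness sandbox can't be fabricated, so the error surfaces (link still cleared).

   This mirrors the existing revive-path heal (which already handles a vanished owned box) and closes the gap on the initial-provision/attach path.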
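
For reviewers who want the heal decision in isolation, here is a minimal sketch of the rule described above. It is not the code in this PR: HealDecision, decide_missing_attach, and both parameter names are hypothetical, and the real _recover_from_missing_attach may take different inputs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HealDecision:
    """Hypothetical result type, not taken from the PR."""
    unlink: bool    # drop the stale Convex link
    recreate: bool  # create a fresh persistent box and relink it


def decide_missing_attach(attached_id: str,
                          explicit_harness_id: Optional[str]) -> HealDecision:
    """Decide what to do once Daytona reports the attached box is gone."""
    # A box counts as explicit only when the user-chosen harness sandbox id
    # matches the id we tried to attach. A mismatch is treated as a
    # workspace box, as in the test description.
    is_explicit = (explicit_harness_id is not None
                   and explicit_harness_id == attached_id)
    # The stale link is cleared in every case. Only workspace boxes are
    # recreated, because a user-chosen sandbox cannot be fabricated.
    return HealDecision(unlink=True, recreate=not is_explicit)
```

When recreate is False the caller would re-raise the not-found error after unlinking, so the user sees that their chosen sandbox no longer exists.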
Tests
tests/test_sandbox_lifecycle.py:

- _auto_delete_minutes interval selection (persist vs scratch, scratch < persistent, non-positive → disabled clamp).
- _recover_from_missing_attach decision (workspace box → recover + unlink; explicit harness sandbox → surface but still unlink; id mismatch → treated as workspace).
- _unlink_dead_sandbox calls removeByDaytonaId and swallows ConvexMutationError.

Full fastapi suite: 370 passed, ruff clean.
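
The best-effort unlink exercised by the last test can be sketched as follows. Everything here is a stand-in: FakeConvex, its mutation method, the "daytonaId" argument key, and the boolean return value are assumptions for illustration, and ConvexMutationError is redefined locally rather than imported from the project.

```python
class ConvexMutationError(Exception):
    """Local stand-in for the project's exception of the same name."""


class FakeConvex:
    """Hypothetical test double that records mutation calls."""

    def __init__(self, fail: bool = False) -> None:
        self.fail = fail
        self.calls: list[tuple[str, dict]] = []

    def mutation(self, name: str, args: dict) -> None:
        self.calls.append((name, args))
        if self.fail:
            raise ConvexMutationError("mutation rejected")


def unlink_dead_sandbox(convex: FakeConvex, daytona_id: str) -> bool:
    """Remove the stale link; never let a cleanup failure break provisioning."""
    try:
        # The "daytonaId" key is an assumed argument name.
        convex.mutation("sandboxes:removeByDaytonaId", {"daytonaId": daytona_id})
        return True
    except ConvexMutationError:
        # Swallowed on purpose: the heal path should continue even if the
        # record could not be removed.
        return False
```

Catching only ConvexMutationError keeps unrelated failures visible instead of hiding them behind the cleanup.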
Notes
DaytonaService.create_sandbox (Manage Sandboxes / code-exec): a user box untouched for 14 continuous days is now reclaimed by Daytona. Capped at 20/user and env-tunable; raise acp_persistent_sandbox_auto_delete_minutes (or set ≤0 to disable) if a longer-lived box is desired.
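
To make the tuning rule concrete, here is a rough sketch of interval selection with the non-positive clamp. It assumes the defaults stated in this PR (1 day for scratch, 14 days for persistent) and assumes a negative value is what disables auto-delete. The scratch setting name, the DISABLED sentinel, and the function signature are hypothetical; only acp_persistent_sandbox_auto_delete_minutes appears in the PR.

```python
from typing import Mapping

SCRATCH_DEFAULT_MINUTES = 24 * 60          # 1 day
PERSISTENT_DEFAULT_MINUTES = 14 * 24 * 60  # 14 days
DISABLED = -1  # assumed "never auto-delete" value; 0 would delete on stop

PERSISTENT_KEY = "acp_persistent_sandbox_auto_delete_minutes"
SCRATCH_KEY = "acp_scratch_sandbox_auto_delete_minutes"  # hypothetical name


def auto_delete_minutes(persistent: bool, settings: Mapping[str, int]) -> int:
    """Pick the auto-delete interval for a new box, in minutes."""
    if persistent:
        minutes = int(settings.get(PERSISTENT_KEY, PERSISTENT_DEFAULT_MINUTES))
    else:
        minutes = int(settings.get(SCRATCH_KEY, SCRATCH_DEFAULT_MINUTES))
    # Never pass 0 through: it would delete the box as soon as it stops.
    return minutes if minutes > 0 else DISABLED
```

For example, auto_delete_minutes(True, {PERSISTENT_KEY: 0}) yields the disabled value rather than 0, so an operator who sets the variable to 0 expecting "off" does not trigger deletion on stop.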