Observed
During a release-gate Box E2E run, the in-box claude agent launched a
second pip install bonfire-ai while the first was still resolving. The two
concurrent pip dependency-resolver subprocesses ran for 12–14 min, thrashing
the shared pip cache. Both finished on their own before the 1800s timeout, so
the run still produced a PASS verdict, but this is a latent flake.
Risk
A future fire could have both resolvers still backtracking when the 30-min
timeout fires, which would TERM claude (exit 124) and burn the full
budget. The verdict logic wouldn't fail on it (exit code isn't a failure
reason), but it wastes an entire run.
Owner / fix
Prompt-template layer — tests/e2e/prompts/runner-prompt.md (baked into the
box image). Options: pin bonfire-ai or use a constraints file to cut
resolver backtracking, or instruct the agent not to launch a second
pip install while one is in flight.
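
As a complement to the prompt-level options above, the "one install at a time" rule could also be enforced mechanically. The following is a minimal sketch, not something the box image is known to contain: it assumes a POSIX box (for fcntl), and the lock path, run_serialized, and pip_install are hypothetical names introduced for illustration. The optional constraints argument uses pip's -c flag, which corresponds to the constraints-file option.

```python
import fcntl
import subprocess
import sys

# Hypothetical lock location; any path writable by the agent would do.
LOCK_PATH = "/tmp/pip-install.lock"


def run_serialized(cmd, lock_path=LOCK_PATH):
    """Run cmd while holding an exclusive advisory lock.

    A second caller using the same lock_path blocks until the first
    command has exited, so two resolvers never run concurrently.
    Returns the command's exit code.
    """
    with open(lock_path, "w") as lock_file:
        # Blocks here if another process already holds the lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            return subprocess.run(cmd, check=False).returncode
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def pip_install(*packages, constraints=None):
    """Serialized pip install; constraints is an optional constraints-file path."""
    cmd = [sys.executable, "-m", "pip", "install"]
    if constraints is not None:
        cmd += ["-c", constraints]  # limits the versions the resolver may pick
    cmd += list(packages)
    return run_serialized(cmd)
```

With this in place, a call such as pip_install("bonfire-ai") issued while another is in flight would wait for the first to finish rather than start a second resolver. The lock is advisory, so it only helps if every install goes through the wrapper.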
Severity: low, non-blocking. Surfaced 2026-05-14, Box E2E Cycle 2.