
e2e box: agent launches concurrent pip install resolvers — latent timeout flake #77

@Antawari


Observed

During a release-gate Box E2E run, the in-box claude agent launched a
second pip install bonfire-ai while the first was still resolving. The two
concurrent pip dependency-resolver subprocesses ran for 12–14 min, thrashing
the shared pip cache. Both finished before the 1800 s timeout, so the run
still produced a PASS verdict, but this is a latent flake.

Risk

A future fire could have both resolvers still backtracking when the 30-min
(1800 s) timeout expires, which would TERM claude (exit 124) and burn the
full budget. The verdict logic wouldn't fail on it (a non-zero exit code is
not a failure reason), but the entire run would be wasted.

Owner / fix

Prompt-template layer: tests/e2e/prompts/runner-prompt.md (baked into the
box image). Options: pin bonfire-ai or use a constraints file to cut
resolver backtracking, or instruct the agent not to start a second
pip install while one is already in flight.
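A minimal sketch of how the constraints-file and "one install at a time" options could be combined, assuming installs go through a small Python helper on a Unix box. Every install takes an exclusive fcntl.flock on a shared lock file, so a second install waits for the first instead of starting a competing resolver, and an optional constraints file is passed with pip's --constraint flag. The lock path, the constraints file, and the helper names (build_pip_cmd, pip_install_lock, serialized_install) are hypothetical and not part of the existing harness.

```python
import fcntl
import subprocess
import sys
from contextlib import contextmanager
from pathlib import Path
from typing import Callable, Iterator, List, Optional

# Hypothetical lock file; it must live on a filesystem every installer in the box sees.
LOCK_PATH = Path("/tmp/pip-install.lock")


def build_pip_cmd(package: str, constraints: Optional[Path] = None) -> List[str]:
    """Build a `python -m pip install` command line, optionally with a constraints file."""
    cmd = [sys.executable, "-m", "pip", "install", package]
    if constraints is not None:
        # --constraint restricts which versions the resolver may pick,
        # which can reduce backtracking.
        cmd += ["--constraint", str(constraints)]
    return cmd


@contextmanager
def pip_install_lock(lock_path: Path = LOCK_PATH) -> Iterator[None]:
    """Hold an exclusive advisory lock for the duration of one install (Unix only)."""
    # Append mode creates the file if missing without truncating it.
    with open(lock_path, "a") as fh:
        # Blocks until any other holder (e.g. an in-flight install) releases the lock.
        fcntl.flock(fh, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)


def serialized_install(
    package: str,
    constraints: Optional[Path] = None,
    runner: Callable = subprocess.run,
    lock_path: Path = LOCK_PATH,
) -> int:
    """Run one pip install while holding the lock and return pip's exit code."""
    cmd = build_pip_cmd(package, constraints)
    with pip_install_lock(lock_path):
        return runner(cmd, check=False).returncode
```

Usage would look like serialized_install("bonfire-ai", Path("constraints.txt")), with the constraints file holding the pinned versions. Because flock is advisory, this only serializes installs that go through the wrapper; a raw pip install launched directly by the agent bypasses it, so the prompt-level instruction would still be needed.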

Severity: low, non-blocking. Surfaced 2026-05-14, Box E2E Cycle 2.
