
fix: don't make a recoverable connection drop look like a crash #482

Open
JulienEllie wants to merge 2 commits into mpfaffenberger:main from JulienEllie:bugfix/retry-httpx-network-errors

@JulienEllie

Contributor

What

A connection dropped mid-stream (httpx.ReadError) escaped the streaming
retry classifier and bubbled all the way up to the interactive REPL, which
dumped a ~60-line traceback. The loop technically survived, but the wall of
stack frames made a transient VPN/WiFi blip look like a hard crash.

Why it happened

_RETRYABLE_EXCEPTIONS enumerated individual transport error types:

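import httpcore
import httpx

# Before this PR: individual transport error types, listed one by one.
_RETRYABLE_EXCEPTIONS = (
    httpx.RemoteProtocolError,
    httpx.ReadTimeout,
    httpcore.RemoteProtocolError,
)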

httpx.ReadError — the error you get when a socket dies mid-response — wasn't
in the list, so should_retry_streaming() returned False and the error
propagated instead of being retried. Classic whack-a-mole: one missing mole.
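
To make the miss concrete, here is a minimal sketch of the failure mode. It uses stand-in classes that mirror the relevant slice of httpx's exception tree so it runs without httpx installed; `ENUMERATED` and `should_retry_enumerated` are hypothetical names for illustration, not the project's actual code.

```python
# Stand-in classes mirroring a slice of httpx's exception hierarchy.
# These are assumptions for illustration, not imports from httpx.
class TransportError(Exception): pass
class NetworkError(TransportError): pass
class ReadError(NetworkError): pass
class TimeoutException(TransportError): pass
class ReadTimeout(TimeoutException): pass
class ProtocolError(TransportError): pass
class RemoteProtocolError(ProtocolError): pass

# Same shape as the old tuple: leaf types listed one by one.
ENUMERATED = (RemoteProtocolError, ReadTimeout)


def should_retry_enumerated(exc: BaseException) -> bool:
    """isinstance matches only the listed classes and their subclasses."""
    return isinstance(exc, ENUMERATED)


print(should_retry_enumerated(ReadTimeout("slow read")))   # True
print(should_retry_enumerated(ReadError("socket died")))   # False
```

ReadError shares only the TransportError ancestor with the listed types, so an isinstance check against the leaves never sees it.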

The fix

  1. Broaden the retry net to umbrella base classes. httpx.NetworkError
    is the parent of ReadError, WriteError, ConnectError, and
    CloseError; httpx.TimeoutException covers every *Timeout variant.
    Listing the bases (plus the httpcore twins) means all transient
    transport failures get the existing 3x backoff retry — no more chasing
    subclasses one funeral at a time.

  2. Stop scaring users with raw tracebacks for environment hiccups. New
    _render_turn_exception() helper: transient/connection errors get a
    friendly one-liner ("connection blipped, re-run your prompt, your session
    is intact"); genuine bugs still get the full, debuggable traceback. It
    reuses the same retry classifier so the retry path and the render path
    can never drift out of sync (DRY).
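
The two steps above could look roughly like the sketch below. It again uses stand-in exception classes rather than httpx itself, and the names `RETRYABLE`, `is_transient`, and `render_turn_exception`, the string return type, and the exact message wording are all assumptions; the real helper in this PR may differ. Keeping a protocol-error entry next to the base classes is also an assumption.

```python
import traceback

# Stand-ins mirroring a slice of httpx's exception tree (assumed, not imported).
class TransportError(Exception): pass
class NetworkError(TransportError): pass
class ReadError(NetworkError): pass
class ConnectError(NetworkError): pass
class TimeoutException(TransportError): pass
class ReadTimeout(TimeoutException): pass
class ProtocolError(TransportError): pass
class RemoteProtocolError(ProtocolError): pass

# Base classes instead of leaves: every subclass is covered automatically.
RETRYABLE = (NetworkError, TimeoutException, RemoteProtocolError)


def is_transient(exc: BaseException) -> bool:
    """Single classifier shared by the retry path and the render path."""
    return isinstance(exc, RETRYABLE)


def render_turn_exception(exc: BaseException) -> str:
    """Friendly one-liner for transient errors, full traceback otherwise."""
    if is_transient(exc):
        return (
            f"Connection blipped ({type(exc).__name__}). "
            "Re-run your prompt; your session is intact."
        )
    return "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))


print(render_turn_exception(ReadError("socket died")))
try:
    raise ValueError("boom")
except ValueError as err:
    print(render_turn_exception(err))  # full traceback, still debuggable
```

Because both paths call the same `is_transient` function, adding a new retryable type changes the retry behavior and the rendered message together.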

Tests

  • Regression coverage in test_streaming_retry.py for ReadError,
    ConnectError, and httpcore.ReadError in the classifier.
  • New test_cli_runner_turn_exception.py pins the friendly-message vs.
    full-traceback behavior of _render_turn_exception.
  • 43 new/related tests pass; existing cli_runner suites (45) stay green.

Behavior change

  • Before: dropped socket → traceback dump that looks like a crash.
  • After: dropped socket → silently retried 3x; if it still can't recover,
    a single friendly line, REPL keeps going.
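
For readers unfamiliar with the retry path, the "after" behavior can be sketched as a small loop. This is not the project's retry code: `call_with_retry`, `TransientError`, the 0.5s base delay, and the reading of "3x" as three retries after the first attempt are all assumptions made for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Stand-in for anything the retry classifier treats as retryable."""


def call_with_retry(
    fn: Callable[[], T],
    *,
    max_retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying transient failures with exponential backoff."""
    attempt = 0
    while True:
        try:
            return fn()
        except TransientError:
            if attempt >= max_retries:
                raise  # out of retries: the caller renders the one-liner
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s
            attempt += 1


# Usage: a call that fails twice, then succeeds on the third attempt.
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("socket died")
    return "ok"


delays = []
result = call_with_retry(flaky, sleep=delays.append)  # record instead of sleeping
print(result, delays)  # ok [0.5, 1.0]
```

Passing `sleep` as a parameter keeps the sketch testable without real waits.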

Julien Ellie added 2 commits June 16, 2026 12:22
httpx.ReadError (and its NetworkError siblings) escaped the streaming
retry classifier, so a socket dropped mid-stream -- a transient VPN/WiFi
blip -- re-raised all the way to the REPL, which dumped a 60-line
traceback that looked like a hard crash.

- Broaden _RETRYABLE_EXCEPTIONS to the umbrella base classes
  (httpx/httpcore NetworkError + TimeoutException) so ALL transient
  transport failures get auto-retried, not just the handful we'd
  enumerated. No more whack-a-mole per subclass.
- Add _render_turn_exception(): the REPL now shows a friendly one-liner
  for transient/connection errors and reserves the full traceback for
  genuine bugs. Reuses the retry classifier so the two stay in lock-step.

Tests: regression coverage for ReadError/ConnectError/httpcore.ReadError
in the retry classifier, plus _render_turn_exception friendly-vs-traceback
behavior.
@JulienEllie changed the title from "fix: never let a dropped connection crash the REPL" to "fix: don't make a recoverable connection drop look like a crash" on Jun 16, 2026

1 participant