
requests.Session has no timeout — large uploads can hang indefinitely on half-open TCP connections #239

@bjornars


Problem

api_ref.py creates a requests.Session with no timeout configured:

self._session = requests.Session()

No timeout= argument is ever passed to any .post(), .get(), .put(), or .delete() call, and requests applies no timeout by default. As a result, the connect, send, and receive phases can all wait indefinitely.
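One way to confirm that default is to inspect the signature of Session.request, which .get(), .post(), .put() and .delete() all delegate to. A minimal check, assuming only that requests is installed:

```python
import inspect

import requests

# Session.get/.post/.put/.delete all call Session.request, so its default applies to every call.
timeout_default = inspect.signature(requests.Session.request).parameters["timeout"].default

# None means no timeout is applied, so every phase of the request may block indefinitely.
print(timeout_default)  # None
```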

The indefinite-hang scenario

When uploading a large payload (e.g. a big CSV to plots/:plot/data):

  1. socket.sendall(large_body) starts pushing the body into the kernel TCP send buffer
  2. A network interruption occurs mid-send — NAT table expiry, middlebox reboot, etc.
  3. The TCP connection goes half-open: the client kernel believes the connection is alive, but the path to the server is dead
  4. sendall is actively trying to flush queued data — it is not idle
  5. TCP keepalive does not fire on non-idle connections (keepalive only probes when there is no pending data to send)
  6. sendall blocks forever — no socket timeout, no keepalive rescue, no exception

In production this has shown up as hangs lasting several days. The server is not involved: it sends its response (e.g. a 413 for an oversized payload) and closes the connection within milliseconds. The hang lives entirely in the client thread.
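The blocking behaviour in steps 4 to 6 can be sketched locally with the standard library. A truly half-open path is hard to reproduce on one machine, so this sketch assumes a stand-in: a socketpair whose receiving end is never read. Once the kernel buffers fill, sendall blocks just as it does on the dead path, and only a socket timeout turns the hang into an exception. The helper name sendall_with_timeout is hypothetical.

```python
import socket
import time


def sendall_with_timeout(payload_size, timeout):
    """Push payload_size bytes into a socket whose peer never reads.

    Returns (timed_out, elapsed_seconds).
    """
    # Stand-in for the dead path: the receiver exists but never consumes data,
    # so once the kernel buffers are full, sendall cannot make progress.
    sender, receiver = socket.socketpair()
    payload = b"\0" * payload_size
    try:
        sender.settimeout(timeout)  # without this line, sendall below would block forever
        start = time.monotonic()
        try:
            sender.sendall(payload)
        except socket.timeout:
            return True, time.monotonic() - start
        return False, time.monotonic() - start
    finally:
        sender.close()
        receiver.close()


timed_out, elapsed = sendall_with_timeout(payload_size=16 * 1024 * 1024, timeout=0.5)
print(timed_out, round(elapsed, 2))
```

Since Python 3.5 a socket timeout bounds the total duration of a sendall call rather than each underlying send, so a timeout applied at this level also caps how long a legitimate but slow upload of one large body may take.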

Fix

Set a default timeout on the session in NovemAPI.__init__:

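import functools  # at module top, alongside import requests

self._session = requests.Session()
# (connect_timeout, read_timeout) in seconds, applied to every request on this session;
# a timeout= passed explicitly to an individual call still overrides it
self._session.request = functools.partial(self._session.request, timeout=(10, 120))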
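An alternative to assigning a functools.partial over the bound request method is a small Session subclass that fills in a default only when the caller has not passed one. This is a sketch, not the project's code; TimeoutSession and DEFAULT_TIMEOUT are hypothetical names, and the (10, 120) values are the ones proposed above.

```python
import requests

# Hypothetical default, mirroring the (connect, read) values proposed in this issue.
DEFAULT_TIMEOUT = (10, 120)


class TimeoutSession(requests.Session):
    """A requests.Session that applies a default timeout to every request."""

    def __init__(self, timeout=DEFAULT_TIMEOUT):
        super().__init__()
        self.default_timeout = timeout

    def request(self, method, url, **kwargs):
        # .get()/.post()/.put()/.delete() all route through request(), so this covers them.
        # setdefault keeps any timeout the caller passes explicitly.
        kwargs.setdefault("timeout", self.default_timeout)
        return super().request(method, url, **kwargs)


session = TimeoutSession()
```

Because request() is overridden as an ordinary method rather than shadowed by an instance attribute, the default is visible on the class and per instance. The override forwards only keyword arguments after url, which is how Session.get, .post, .put and .delete call it.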

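The connect timeout (10 s) guards against TCP blackholes, where packets are silently dropped and no RST ever arrives. The read timeout (120 s) guards against servers that accept the connection but never respond; in requests it bounds the wait between bytes received from the server, not the total duration of the request.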
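The read-timeout case can be exercised without any network access. This sketch assumes requests is installed and uses a hypothetical probe helper. A local listener lets the kernel complete the TCP handshake through its backlog but never calls accept() or sends a byte, which matches the "accepts but never responds" server described above.

```python
import socket

import requests


def probe(url, timeout):
    """Issue one GET and report which kind of timeout, if any, fired."""
    with requests.Session() as session:
        session.trust_env = False  # ignore proxy environment variables; talk to the URL directly
        try:
            session.get(url, timeout=timeout)
        except requests.exceptions.ConnectTimeout:
            return "connect timeout"
        except requests.exceptions.ReadTimeout:
            return "read timeout"
    return "ok"


# A listener that never accepts: the handshake completes, then nothing is ever sent back.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]
try:
    result = probe(f"http://127.0.0.1:{port}/", timeout=(2, 0.5))
finally:
    listener.close()
print(result)
```

With timeout=None instead of (2, 0.5), the same get() call would block indefinitely, because the listener is only closed after probe returns.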

For the sendall/half-open case specifically, the socket-level send timeout needs to be set via a custom HTTPAdapter (urllib3's socket_options), but the session-level timeout above eliminates the most common cases.
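One way to act on the socket_options suggestion is sketched below. It assumes Linux, where the TCP_USER_TIMEOUT socket option (exposed by Python as socket.TCP_USER_TIMEOUT) limits how long transmitted data may remain unacknowledged before the kernel aborts the connection with ETIMEDOUT. That covers the non-idle, half-open sendall case that keepalive cannot. SocketOptionsAdapter, half_open_socket_options and the 120-second value are hypothetical choices, not part of the project.

```python
import socket

import requests
from requests.adapters import HTTPAdapter
from urllib3.connection import HTTPConnection

# Hypothetical limit: abort if sent data stays unacknowledged for 120 s (value is in milliseconds).
TCP_USER_TIMEOUT_MS = 120_000


def half_open_socket_options():
    # Start from urllib3's defaults (TCP_NODELAY) so they are not lost.
    options = list(HTTPConnection.default_socket_options)
    user_timeout = getattr(socket, "TCP_USER_TIMEOUT", None)  # only defined on Linux
    if user_timeout is not None:
        options.append((socket.IPPROTO_TCP, user_timeout, TCP_USER_TIMEOUT_MS))
    return options


class SocketOptionsAdapter(HTTPAdapter):
    """HTTPAdapter that hands extra socket options to urllib3's connection pools."""

    def __init__(self, socket_options, **kwargs):
        # Must be set before super().__init__(), which calls init_poolmanager().
        self.socket_options = socket_options
        super().__init__(**kwargs)

    def init_poolmanager(self, connections, maxsize, block=False, **pool_kwargs):
        pool_kwargs["socket_options"] = self.socket_options
        super().init_poolmanager(connections, maxsize, block=block, **pool_kwargs)


session = requests.Session()
adapter = SocketOptionsAdapter(half_open_socket_options())
session.mount("http://", adapter)
session.mount("https://", adapter)
```

The adapter complements the (connect, read) timeout rather than replacing it, so a session would normally use both. This sketch only configures the direct (non-proxy) pool manager, and on platforms without TCP_USER_TIMEOUT it falls back to urllib3's default options.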

Related

  • The write / api_write methods also silently swallow non-404/403 errors (e.g. HTTP 413) instead of raising — a 2 MB upload that is rejected prints to stdout and returns normally, giving the caller no indication the write failed. That is a separate issue but compounds the problem: the caller cannot distinguish a successful write from a silently dropped one.
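For the related issue, a sketch of the alternative behaviour: let requests raise on any 4xx/5xx status via Response.raise_for_status(), so a rejected upload such as a 413 surfaces as requests.exceptions.HTTPError instead of a printed message. checked_write is a hypothetical helper, not the project's write or api_write.

```python
import requests


def checked_write(session, url, payload, timeout=(10, 120)):
    """POST payload and raise on any HTTP error status instead of printing and returning."""
    response = session.post(url, data=payload, timeout=timeout)
    # Raises requests.exceptions.HTTPError for 4xx/5xx responses, e.g. 413 Payload Too Large.
    response.raise_for_status()
    return response
```

A caller can then catch HTTPError and inspect err.response.status_code to decide whether to split the upload, retry, or give up.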
