Problem
api_ref.py creates a requests.Session with no timeout configured:
self._session = requests.Session()
No timeout= is ever passed to any .post(), .get(), .put(), or .delete() call. This means the connect, send, and receive phases all have an infinite timeout.
The indefinite-hang scenario
When uploading a large payload (e.g. a big CSV to plots/:plot/data):
- socket.sendall(large_body) starts pushing the body into the kernel TCP send buffer
- A network interruption occurs mid-send (NAT table expiry, middlebox reboot, etc.)
- The TCP connection goes half-open: the client kernel believes the connection is alive, but the path to the server is dead
- sendall is actively trying to flush queued data; it is not idle
- TCP keepalive does not fire on non-idle connections (keepalive only probes when there is no pending data to send)
- sendall blocks forever: no socket timeout, no keepalive rescue, no exception
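The blocking behaviour in the last step can be reproduced locally without a broken network. The sketch below is an illustration only: it uses socket.socketpair() with a peer that never reads, which stalls sendall once the send buffer fills, much as a dead path does; it is not a real half-open TCP connection. The helper name, timeout, and payload size are assumptions made for the example.

```python
import socket


def sendall_times_out(timeout_s: float = 0.5, payload_mb: int = 16) -> bool:
    """Return True if sendall() to a peer that never reads raises a timeout."""
    sender, receiver = socket.socketpair()
    try:
        # With a timeout set, a stalled sendall() raises instead of blocking
        sender.settimeout(timeout_s)
        # Payload far larger than any default socket buffer, so the send stalls
        payload = b"x" * (payload_mb * 1024 * 1024)
        try:
            sender.sendall(payload)
        except socket.timeout:
            return True
        return False
    finally:
        sender.close()
        receiver.close()
```

Without the settimeout() call the socket stays in blocking mode and the same sendall() would not return, which is the situation described above.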
This has been observed as multi-day hangs in production. The server side is not involved — it sends its response (e.g. a 413 for an oversized payload) and closes the connection in milliseconds. The hang lives entirely in the client thread.
Fix
Set a default timeout on the session in NovemAPI.__init__:
import functools  # at module level
self._session = requests.Session()
# (connect_timeout, read_timeout) in seconds, applied to every request
# that does not pass its own timeout=
self._session.request = functools.partial(self._session.request, timeout=(10, 120))
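The connect timeout guards against TCP blackholes, where packets are dropped and no RST comes back. The read timeout guards against servers that accept the connection but never respond; it limits each wait for data from the server, not the total duration of the request.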
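An alternative to patching session.request with functools.partial is a small Session subclass. This is a sketch, not the project's code: the class name TimeoutSession and the attribute default_timeout are hypothetical, and the (10, 120) values are copied from the fix above. It relies only on requests.Session.request accepting timeout= as a keyword argument.

```python
import requests


class TimeoutSession(requests.Session):
    """Session that applies a default timeout unless the caller passes one."""

    def __init__(self, timeout=(10, 120)):
        super().__init__()
        # (connect_timeout, read_timeout) in seconds; hypothetical attribute
        self.default_timeout = timeout

    def request(self, method, url, **kwargs):
        # Only fill in the default; an explicit timeout= (even None) is kept
        kwargs.setdefault("timeout", self.default_timeout)
        return super().request(method, url, **kwargs)
```

Because .get(), .post(), .put(), and .delete() all go through request(), the default reaches every call, and a caller that needs a longer upload window can still pass timeout= explicitly.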
For the sendall/half-open case specifically, the socket-level send timeout needs to be set via a custom HTTPAdapter (urllib3's socket_options), but the session-level timeout above eliminates the most common cases.
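One possible way to bound a stalled send at the socket level on Linux is the TCP_USER_TIMEOUT option, which limits how long transmitted data may stay unacknowledged before the kernel closes the connection with an error. The sketch below is an assumption about how this could be wired in, not the project's implementation: the adapter and helper names are hypothetical, the 120000 ms value is illustrative, and the helper skips the option on platforms where Python's socket module does not expose it.

```python
import socket

from requests.adapters import HTTPAdapter


def build_socket_options(user_timeout_ms: int = 120_000):
    """Socket options for new connections (hypothetical helper)."""
    # Keep urllib3's usual default of disabling Nagle's algorithm
    options = [(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)]
    # TCP_USER_TIMEOUT is Linux-specific; skip it where it is unavailable
    user_timeout = getattr(socket, "TCP_USER_TIMEOUT", None)
    if user_timeout is not None:
        options.append((socket.IPPROTO_TCP, user_timeout, user_timeout_ms))
    return options


class SocketOptionsAdapter(HTTPAdapter):
    """HTTPAdapter that hands custom socket options to urllib3 (sketch)."""

    def __init__(self, socket_options=None, **kwargs):
        # Must be set before super().__init__(), which calls init_poolmanager
        self.socket_options = socket_options or build_socket_options()
        super().__init__(**kwargs)

    def init_poolmanager(self, *args, **kwargs):
        # Forwarded to urllib3's PoolManager and on to each new connection
        kwargs["socket_options"] = self.socket_options
        super().init_poolmanager(*args, **kwargs)
```

The adapter would be attached with self._session.mount("https://", SocketOptionsAdapter()) and likewise for "http://". Connections made through a proxy use a separate pool manager and are not covered by this sketch.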
Related
- The write / api_write methods also silently swallow non-404/403 errors (e.g. HTTP 413) instead of raising: a 2 MB upload that is rejected prints to stdout and returns normally, giving the caller no indication the write failed. That is a separate issue but compounds the problem: the caller cannot distinguish a successful write from a silently dropped one.
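A caller-visible failure could look like the following. This is a hypothetical sketch: checked_write and its parameters are not the project's write / api_write, whose code is not shown here. It only illustrates surfacing every HTTP error status through Response.raise_for_status() instead of printing it.

```python
import requests


def checked_write(session, url, payload, timeout=(10, 120)):
    """POST payload and raise requests.HTTPError on any 4xx/5xx status."""
    response = session.post(url, data=payload, timeout=timeout)
    # raise_for_status() raises for 4xx and 5xx responses (including 413)
    # and does nothing for successful ones
    response.raise_for_status()
    return response
```

With this shape a rejected upload reaches the caller as an exception carrying the response, so a failed write can no longer be mistaken for a successful one.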