Stop leaking JetStream consumers on reopen; tighten reconnect detection (2.0.3)#28

Merged
majkelx merged 1 commit into master from fix/reopen-consumer-leak-and-faster-reconnect-detection on May 2, 2026

majkelx commented May 2, 2026


Bug

`MsgReader._reopen()` calls `pull_subscription.unsubscribe()` on the old subscription, which detaches the local handle but does not delete the server-side JetStream consumer. Each NATS reconnect therefore leaves an orphan consumer on the stream: one per reopen, accumulating indefinitely until the process restarts (or the stream's consumer cap is hit).

Symptom in the field: monotonically growing `nats consumer ls <stream>` output, with consumer creation timestamps lining up with NATS reconnect events.

The teardown sequence in `_close_pull_subscription()` already shows the correct pattern: `consumer_info() → unsubscribe() → delete_consumer()`. `_reopen()` was just missing the `delete_consumer()` step.
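The teardown order above can be sketched as follows. This is a minimal illustration, not the code from `msg_reader.py`: the helper name `drop_old_consumer` and the `FakeBroker` stand-in are hypothetical, and the handles are assumed to expose `consumer_info()`, `unsubscribe()` and `delete_consumer(stream, consumer)` as coroutines.

```python
import asyncio
import logging
from types import SimpleNamespace

log = logging.getLogger("reopen_sketch")


async def drop_old_consumer(js, pull_subscription):
    """Best-effort teardown: consumer_info() -> unsubscribe() -> delete_consumer().

    Each step logs and ignores its own failure. Returns True only if the
    server-side consumer was deleted.
    """
    info = None
    try:
        # Capture the consumer name first; after unsubscribe() it may be gone.
        info = await pull_subscription.consumer_info()
    except Exception as exc:
        log.warning("consumer_info failed: %s", exc)
    try:
        await pull_subscription.unsubscribe()
    except Exception as exc:
        log.warning("unsubscribe failed: %s", exc)
    if info is None:
        return False  # without a name there is nothing we can delete
    try:
        await js.delete_consumer(info.stream_name, info.name)
        return True
    except Exception as exc:
        log.warning("delete_consumer failed: %s", exc)
        return False


class FakeBroker:
    """Hypothetical in-memory stand-in for both the subscription and js."""

    def __init__(self, fail_on=()):
        self.calls = []
        self.fail_on = set(fail_on)

    def _step(self, name):
        self.calls.append(name)
        if name in self.fail_on:
            raise RuntimeError(f"{name} failed")

    async def consumer_info(self):
        self._step("consumer_info")
        return SimpleNamespace(stream_name="demo_stream", name="demo_consumer")

    async def unsubscribe(self):
        self._step("unsubscribe")

    async def delete_consumer(self, stream, consumer):
        self._step("delete_consumer")
```

Running `asyncio.run(drop_old_consumer(b, b))` with `b = FakeBroker()` records the three calls in order. A failing `unsubscribe()` does not prevent the delete attempt, which is the resilience property the fix preserves.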

Fix

Two small changes in `serverish/messenger/msg_reader.py`:

  1. `_reopen()` now deletes the old consumer (mirrors the `_close_pull_subscription()` teardown order). Capture `consumer_info()` before unsubscribing so we still know the consumer name; unsubscribe; then `delete_consumer()`. Failures at any step are logged and ignored; the new subscription is created regardless, matching the previous resilience.

  2. Tighten the reconnect-detection window in the non-nowait wait loop. The outer loop only checks `_reconnect_needed` between fetch cycles, so a low-rate subject can sit on a dead pull subscription for up to one full cycle after a NATS reconnect (callbacks stay silent until the flag is detected). Reduce `blocking_interval` from 10s to 2s; bump `max_wait_cycles` from 10 to 50 so the total 100s wait window is unchanged.
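The timing effect of the second change can be checked with a small model. This is only a sketch under a simplifying assumption: the reconnect flag is inspected exactly at cycle boundaries, and a fetch on a silent subject blocks for the full `blocking_interval`. The function name `detect_reconnect` is hypothetical and does not appear in the module.

```python
def detect_reconnect(reconnect_at, blocking_interval, max_wait_cycles):
    """Return the time (seconds) at which the wait loop notices a reconnect.

    The flag set at `reconnect_at` is only seen at a cycle boundary, that is
    at a multiple of `blocking_interval`. Returns None if the reconnect falls
    outside the total wait window.
    """
    for cycle in range(max_wait_cycles + 1):
        now = cycle * blocking_interval
        if now >= reconnect_at:
            return now
    return None


# Reconnect happens 0.5s into the wait.
old = detect_reconnect(0.5, blocking_interval=10, max_wait_cycles=10)
new = detect_reconnect(0.5, blocking_interval=2, max_wait_cycles=50)
print(old - 0.5, new - 0.5)  # detection lag with old and new settings
```

In this model the lag drops from 9.5s to 1.5s for a reconnect that lands just after a cycle starts, while both settings cover the same 100s window (10 × 10s and 50 × 2s).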

Compat

  • Pull-consumer behaviour is unchanged for all callers; only the orphan-accumulation side-effect goes away.
  • The 2s/50-cycle change keeps total wait identical (100s). High-rate subjects are unaffected (they fetch faster than 2s anyway). Low-rate subjects observe up to ~8s faster recovery after reconnect.

Test plan

  • Run a long-lived subscriber, kill+restart NATS several times, then run `nats consumer ls <stream>`. With 2.0.2, the count grows by one per reconnect; with 2.0.3, the count stays flat.
  • Confirm a low-rate subscriber (≪1 msg/min) starts receiving messages within a few seconds of NATS reconnect, not tens of seconds later.
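The first check could also be scripted instead of read off the CLI. The sketch below assumes the JetStream context exposes a `consumers_info(stream)` coroutine returning one entry per consumer; `count_consumers`, `leaked`, `sample` and `FakeJetStream` are hypothetical names used only for illustration, and the fake models the leak rather than talking to a real server.

```python
import asyncio


async def count_consumers(js, stream):
    """Return how many server-side consumers currently exist on `stream`."""
    return len(await js.consumers_info(stream))


def leaked(counts):
    """Consumers gained between the first and the last sample."""
    return counts[-1] - counts[0]


class FakeJetStream:
    """Stand-in that gains `leak_per_reopen` orphan consumers per reopen."""

    def __init__(self, leak_per_reopen):
        self.consumers = ["active"]
        self.leak_per_reopen = leak_per_reopen

    def reopen(self):
        self.consumers += ["orphan"] * self.leak_per_reopen

    async def consumers_info(self, stream):
        return list(self.consumers)


async def sample(js, reconnects):
    """Count consumers before the first reconnect and after each one."""
    counts = [await count_consumers(js, "demo_stream")]
    for _ in range(reconnects):
        js.reopen()
        counts.append(await count_consumers(js, "demo_stream"))
    return counts
```

With `FakeJetStream(1)` (modelling 2.0.2) three reconnects give `leaked(...) == 3`; with `FakeJetStream(0)` (modelling 2.0.3) the result is 0. Against a real server, `js` would be the live JetStream context and `reopen()` would be replaced by an actual NATS restart.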

…connect detection

`MsgReader._reopen()` only called `unsubscribe()` on the old subscription,
which detaches the local handle but does NOT delete the server-side
JetStream consumer. Each NATS reconnect therefore left an orphan consumer
on the stream; one orphan accumulates per reopen, indefinitely, until the
process restarts (or the stream's consumer cap is hit). Symptom in the
field: monotonically growing `nats consumer ls <stream>` output with
creation timestamps lining up with NATS reconnect events.

Fix mirrors the teardown order already used by `_close_pull_subscription`:
capture `consumer_info()` before unsubscribing (so we still know the
consumer name), unsubscribe, then `delete_consumer()`. Failures at any
step are logged and ignored - the new subscription is created regardless,
matching the previous resilience behaviour.

Also reduces the in-fetch blocking_interval (10s → 2s) inside the
non-nowait wait loop. The outer loop only checks `_reconnect_needed`
between fetch cycles, so a low-rate subject can sit on a dead pull
subscription for up to one cycle after a reconnect. Reduces the
worst-case detection lag from ~10s to ~2s; total wait window is
unchanged at 100s (50 cycles × 2s).
majkelx merged commit d7da3a8 into master May 2, 2026
2 checks passed
majkelx deleted the fix/reopen-consumer-leak-and-faster-reconnect-detection branch May 2, 2026 00:40
majkelx restored the fix/reopen-consumer-leak-and-faster-reconnect-detection branch May 2, 2026 00:42
majkelx deleted the fix/reopen-consumer-leak-and-faster-reconnect-detection branch May 2, 2026 00:42