fix(server): report idle sessions as not-running on reconnect#402
fix(server): report idle sessions as not-running on reconnect#402dimakis wants to merge 1 commit into
Conversation
handleReconnect conflated "query loop alive" with "agent actively processing a turn". isActive() returns true for sessions whose query loop is alive but idle between turns (waiting in `for await`). This caused the reconnected message to report running=true, making the frontend queue the first user message behind a 5-second fallback timer instead of sending it immediately — the "two messages to get a response after reattach" bug. Fix: check lastSpeaker from EventStore. When lastSpeaker=assistant the agent completed its last turn and is idle, so report running=false. When lastSpeaker=user the agent hasn't answered yet, so running=true. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
dimakis
left a comment
There was a problem hiding this comment.
Centaur Review
Found 3 issue(s) (1 warning).
server/ws-handler-v2.ts
Core fix is correct and well-tested, but the same idle-session running-state logic is missing from handleSwitchSession, which sends the same running field to the same client queue mechanism — leaving the 5-second delay bug reachable via the switch_session path.
- 🟡 regressions: handleSwitchSession (line 429) computes
runningwithout the new lastSpeaker idle check. Both reconnected and session_switched drive SET_RUNNING in the client store, which controls whether sendMessage queues behind a 5-second fallback timer. An idle session (lastSpeaker=assistant) will be reported as running=false on reconnect but running=true on switch_session, causing the same queuing delay this PR fixes — just via a different code path.[fixable]
server/__tests__/ws-handler-v2.test.ts
Core fix is correct and well-tested, but the same idle-session running-state logic is missing from handleSwitchSession, which sends the same running field to the same client queue mechanism — leaving the 5-second delay bug reachable via the switch_session path.
- 🔵 missing_tests (L1148): The new idle-session tests don't verify that reattachChat is NOT called when running is downgraded to false by the lastSpeaker check. The zombie test (line 1117) explicitly asserts
expect(reattachChat).not.toHaveBeenCalled(), but the new idle tests skip this. Since the code at line 284 gates reattach onrunning, andrunningis now false for idle sessions, it would be good to confirm the reattach is correctly skipped — especially since the session is still alive (unlike zombies where it's removed).[fixable] - 🔵 missing_tests (L1148): No test covers the case where storeMeta exists, isActive is true, but lastSpeaker is null (brand-new session that hasn't had any turns yet). The implementation handles it correctly (null !== 'assistant' → running stays true), but an explicit test would document this edge case and guard against future regressions.
[fixable]
| expect(reattachChat).not.toHaveBeenCalled(); | ||
| }); | ||
|
|
||
| it('reports running=false when session is active but idle (lastSpeaker=assistant)', () => { |
There was a problem hiding this comment.
🔵 missing_tests: The new idle-session tests don't verify that reattachChat is NOT called when running is downgraded to false by the lastSpeaker check. The zombie test (line 1117) explicitly asserts expect(reattachChat).not.toHaveBeenCalled(), but the new idle tests skip this. Since the code at line 284 gates reattach on running, and running is now false for idle sessions, it would be good to confirm the reattach is correctly skipped — especially since the session is still alive (unlike zombies where it's removed). [fixable]
| expect(reattachChat).not.toHaveBeenCalled(); | ||
| }); | ||
|
|
||
| it('reports running=false when session is active but idle (lastSpeaker=assistant)', () => { |
There was a problem hiding this comment.
🔵 missing_tests: No test covers the case where storeMeta exists, isActive is true, but lastSpeaker is null (brand-new session that hasn't had any turns yet). The implementation handles it correctly (null !== 'assistant' → running stays true), but an explicit test would document this edge case and guard against future regressions. [fixable]
Summary
handleReconnectusedisActive()(session exists in registry) to determinerunningstate, but this returnstruefor sessions whose query loop is alive but idle between turns. The frontend interpretsrunning=trueas "agent is busy" and queues messages behind a 5-second fallback timer instead of sending immediately — causing the persistent "two messages needed after reattach" bug.isActive/storeMeta.isActivechecks, also checklastSpeakerfrom EventStore.lastSpeaker=assistantmeans the agent completed its turn (idle) →running=false.lastSpeaker=usermeans the agent hasn't answered yet →running=true.Context
This bug has been the target of 13+ PRs (#320, #348, #353, #360, #362, #367, #368, #371, #372, #381, #384, #386, #392) fixing symptoms at different layers (SSE race conditions, iOS WS stale readyState, idempotency, state machine infra). None addressed the semantic mismatch: "query loop alive" ≠ "agent actively processing." The 5-second
PENDING_SEND_TIMEOUT_MStimer instore.ts:355is literally a band-aid for this — its comment says "stale running state after reconnect."Test plan
🤖 Generated with Claude Code