fix(transport): prevent SSE replay storms and silent message loss#396

Open

dimakis wants to merge 9 commits into

mainfrom

fix/sse-delivery-regression

dimakis commented Jun 25, 2026

Owner

Summary

Fixes the SSE transport regression causing 8+ minute message delivery delays and silent message drops after the SSE migration (#381).

Root cause chain:

ProcessTransport.interrupt() throws unhandled → server crashes → loses all in-memory state
Clients reconnect with stale cursors → periodic sync replays thousands of events at 50/5s
isSessionActive only checked a boolean, not session STATE → SUSPENDED/DETACHED sessions replayed indefinitely
Failed send POSTs (404 from stale connectionId) were silently dropped → user messages lost

Three fixes:

Periodic sync replay storm prevention (connection-registry.ts):
- Replace isSessionActive with shouldSync — checks session STATE (only ACTIVE/STARTING/CREATED sync)
- Add MAX_REPLAY_GAP (200 events) — skip ahead when cursor is far behind head instead of replaying 6K+ events
- Add getHeadSeq to EventStore for gap detection
POST retry for failed message sends (sse-connection.ts):
- Re-queue send POSTs that get HTTP errors or network errors
- Messages flush on next successful reconnect instead of being silently dropped
Guard ProcessTransport.interrupt() crash (chat.ts):
- Wrap queryInstance.interrupt() in try/catch to prevent server process death

Test plan

76 tests passing (connection-registry + sse-connection)
Type check clean
Centaur review (2 rounds)
Manual test: send messages during agent tool execution, verify delivery
Manual test: restart server, verify no replay storm in logs

🤖 Generated with Claude Code


          fix(transport): prevent SSE replay storms and silent message loss

ebe2388

Three fixes for the SSE transport regression causing 8+ minute message
delivery delays and silent message drops:

1. Periodic sync replay storm prevention (connection-registry):
   - Replace isSessionActive with shouldSync that checks session STATE
     (ACTIVE/STARTING/CREATED only), not just the isActive boolean.
     SUSPENDED/DETACHED/ENDED sessions no longer trigger sync.
   - Add MAX_REPLAY_GAP (200 events) cap — when cursor is far behind
     head (e.g. after server restart), skip ahead instead of replaying
     thousands of events at 50/5s. Client lazy-loads history via API.
   - Add getHeadSeq to EventStore for gap detection.

2. POST retry for failed message sends (sse-connection):
   - Re-queue send POSTs that get HTTP errors (e.g. 404 from stale
     connectionId after reconnect) or network errors. Messages flush
     on next successful reconnect instead of being silently dropped.
   - Only send endpoints retry — stop/interrupt are fire-and-forget.

3. Guard ProcessTransport.interrupt() crash (chat.ts):
   - Wrap queryInstance.interrupt() in try/catch to prevent unhandled
     errors from killing the server process. This was the trigger that
     caused server restart → in-memory state loss → replay storms.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

dimakis commented

View reviewed changes

dimakis left a comment

Owner Author

Centaur Review

Found 5 issue(s) (2 warning).

`packages/client/src/sse-connection.ts`

Solid fix for a real production issue — replay storm prevention and retry-on-failure are well-designed. Main concern is the missing MAX_PENDING_SENDS cap on the re-queue path in doPost, and EventStore.getHeadSeq lacks direct unit tests against SQLite.

🟡 bugs (L340): Re-queue in doPost bypasses the MAX_PENDING_SENDS cap enforced in send() (line 97). If multiple flushPendingSends rounds fail repeatedly, pendingSends can grow beyond 100 without the shift() eviction. Consider applying the same cap here. [fixable]

`packages/protocol/src/event-store.ts`

Solid fix for a real production issue — replay storm prevention and retry-on-failure are well-designed. Main concern is the missing MAX_PENDING_SENDS cap on the re-queue path in doPost, and EventStore.getHeadSeq lacks direct unit tests against SQLite.

🟡 missing_tests (L617): EventStore.getHeadSeq() has no unit tests. It's exercised indirectly via connection-registry tests (which mock it), but the actual SQLite query (MAX(seq)) is never tested against a real database — e.g., no events returns 0, single event returns its seq, multiple events returns the max. [fixable]

`server/index.ts`

Solid fix for a real production issue — replay storm prevention and retry-on-failure are well-designed. Main concern is the missing MAX_PENDING_SENDS cap on the re-queue path in doPost, and EventStore.getHeadSeq lacks direct unit tests against SQLite.

🔵 unsafe_assumptions (L105): shouldSync returns false for null state (unknown session ID), which is correct but implicit. If a session is watched but hasn't been registered in the EventStore yet (race at startup), periodic sync silently skips it. This is probably fine since events wouldn't exist yet either, but worth noting.

`server/chat.ts`

Solid fix for a real production issue — replay storm prevention and retry-on-failure are well-designed. Main concern is the missing MAX_PENDING_SENDS cap on the re-queue path in doPost, and EventStore.getHeadSeq lacks direct unit tests against SQLite.

🔵 missing_tests (L1269): The new try/catch around queryInstance.interrupt() has no test verifying that the server survives a ProcessTransport crash. The existing interruptChat tests in send-to-chat.test.ts don't cover the case where interrupt() throws. [fixable]

`packages/client/src/tests/sse-connection.test.ts`

Solid fix for a real production issue — replay storm prevention and retry-on-failure are well-designed. Main concern is the missing MAX_PENDING_SENDS cap on the re-queue path in doPost, and EventStore.getHeadSeq lacks direct unit tests against SQLite.

🔵 style (L876): The callCount tracking variable in the first two retry tests shadows mockFetch.mock.calls.length. Using mockFetch.mockImplementationOnce() for the first call and a default for subsequent calls would be cleaner and avoid the manual counter. [fixable]

packages/client/src/sse-connection.ts Outdated

+                      // Message delivery failed (e.g. 404 from stale connectionId).
+                      // Re-queue so it flushes on the next successful reconnect.
+                      console.warn(`[SseConnection] POST /${endpoint} returned ${res.status}, re-queuing`);
+                      this.pendingSends.push({ endpoint, body });

dimakis Jun 25, 2026

Owner Author

🟡 bugs: Re-queue in doPost bypasses the MAX_PENDING_SENDS cap enforced in send() (line 97). If multiple flushPendingSends rounds fail repeatedly, pendingSends can grow beyond 100 without the shift() eviction. Consider applying the same cap here. [fixable]

packages/protocol/src/event-store.ts

                   return (row?.state as SessionState) ?? null;
                 }
+                /** Return the highest seq number for a session, or 0 if no events exist. */

dimakis Jun 25, 2026

Owner Author

🟡 missing_tests: EventStore.getHeadSeq() has no unit tests. It's exercised indirectly via connection-registry tests (which mock it), but the actual SQLite query (MAX(seq)) is never tested against a real database — e.g., no events returns 0, single event returns its seq, multiple events returns the max. [fixable]

server/index.ts

+              // shouldSync skips ENDED/SUSPENDED/DETACHED sessions to prevent replay storms.
+              // getHeadSeq caps replay depth so post-crash reconnects don't flood clients.
               connRegistry.setEventStore({
                 getEventsAfter: (sessionId, afterSeq, limit) =>

dimakis Jun 25, 2026

Owner Author

🔵 unsafe_assumptions: shouldSync returns false for null state (unknown session ID), which is correct but implicit. If a session is watched but hasn't been registered in the EventStore yet (race at startup), periodic sync silently skips it. This is probably fine since events wouldn't exist yet either, but worth noting.

server/chat.ts

                     await Promise.allSettled(stops);
                   }
-                  await session.queryInstance.interrupt();
+                  try {

dimakis Jun 25, 2026

Owner Author

🔵 missing_tests: The new try/catch around queryInstance.interrupt() has no test verifying that the server survives a ProcessTransport crash. The existing interruptChat tests in send-to-chat.test.ts don't cover the case where interrupt() throws. [fixable]

packages/client/src/__tests__/sse-connection.test.ts

                   expect(mockFetch).not.toHaveBeenCalled();
                 });
+                // ─── POST retry on failure ───────────────────────────────────────────────

dimakis Jun 25, 2026

Owner Author

🔵 style: The callCount tracking variable in the first two retry tests shadows mockFetch.mock.calls.length. Using mockFetch.mockImplementationOnce() for the first call and a default for subsequent calls would be cleaner and avoid the manual counter. [fixable]


          fix: address Centaur review findings on SSE delivery PR

76608d5

- Add MAX_PENDING_SENDS cap to re-queue path in doPost via enqueuePending()
  helper, preventing unbounded queue growth on repeated flush failures
- Add EventStore.getHeadSeq unit tests against real SQLite (0 events, single
  event, max of multiple events)
- Add interrupt() crash survival test — verifies ProcessTransport error
  doesn't propagate and message still reaches inputQueue
- Use mockImplementationOnce in retry tests instead of manual call counters

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

dimakis commented Jun 25, 2026

Owner Author

Centaur Review

Found 5 issue(s) (1 warning).

`packages/client/src/sse-connection.ts`

Solid defensive fix addressing real crash and replay storm scenarios. The core logic (try/catch on interrupt, replay depth cap, POST retry for sends) is correct and well-tested. Main gap: no client-visible signal when events are skipped by MAX_REPLAY_GAP, and the POST retry doesn't distinguish transient vs permanent HTTP errors.

🔵 style (L97): The send() method (lines 97-100) duplicates the queue-with-cap logic that the new enqueuePending() helper (lines 362-366) encapsulates. Consider calling this.enqueuePending(endpoint, msg) here to eliminate the duplication. [fixable]
🔵 bugs (L336): The re-queue triggers on any non-ok response for send POSTs, including 4xx client errors like 400 (bad request) or 422 (validation error). These are not transient and will fail identically on retry, wasting a queue slot each reconnect cycle. Consider only re-queuing on 5xx or network errors, not client errors that indicate a permanent problem. [fixable]

`packages/harness/src/connection-registry.ts`

Solid defensive fix addressing real crash and replay storm scenarios. The core logic (try/catch on interrupt, replay depth cap, POST retry for sends) is correct and well-tested. Main gap: no client-visible signal when events are skipped by MAX_REPLAY_GAP, and the POST retry doesn't distinguish transient vs permanent HTTP errors.

🟡 unsafe_assumptions (L242): When the cursor is skipped ahead via MAX_REPLAY_GAP, the client silently loses events with no notification. The comment references /api/sessions/:id/events for lazy-loading, but no gap-detection signal (e.g. a synthetic events_skipped message with the skipped range) is sent to the client. The client has no way to know events were dropped unless it independently tracks sequence continuity. Consider emitting a gap marker so the client can request backfill. [fixable]

`packages/client/src/tests/sse-connection.test.ts`

Solid defensive fix addressing real crash and replay storm scenarios. The core logic (try/catch on interrupt, replay depth cap, POST retry for sends) is correct and well-tested. Main gap: no client-visible signal when events are skipped by MAX_REPLAY_GAP, and the POST retry doesn't distinguish transient vs permanent HTTP errors.

🔵 missing_tests: No test covers the scenario where a re-queued send fails again on the next flush (repeated failure across multiple reconnects). While the snapshot-based flush prevents infinite loops within a single flush, it would be good to verify that a message survives multiple reconnect cycles until delivery succeeds. [fixable]

`server/index.ts`

Solid defensive fix addressing real crash and replay storm scenarios. The core logic (try/catch on interrupt, replay depth cap, POST retry for sends) is correct and well-tested. Main gap: no client-visible signal when events are skipped by MAX_REPLAY_GAP, and the POST retry doesn't distinguish transient vs permanent HTTP errors.

🔵 unsafe_assumptions (L108): getSessionState() returns null for sessions not yet in the EventStore. The shouldSync callback correctly returns false in that case (since null !== 'ACTIVE'), which means a watched session that hasn't been persisted yet will be skipped by periodic sync. This is likely fine since events can't exist before the session row, but the implicit null→false mapping deserves a comment for clarity. [fixable]

dimakis commented Jun 25, 2026

Owner Author

Centaur Review

Found 4 issue(s) (1 warning).

`packages/client/src/sse-connection.ts`

Solid fix for SSE replay storms and message loss. The core mechanisms (replay gap capping, send re-queuing, interrupt crash guard) are correct and well-tested. Main concern: doPost re-queues on all HTTP errors including non-retryable 4xx, which could cause unnecessary retry churn.

🟡 unsafe_assumptions (L336): !res.ok re-queues on ALL non-2xx responses, including non-transient errors like 400 (invalid body from Zod validation). A permanently invalid message will cycle through the queue on every reconnect since it gets re-queued after each failure. Consider restricting retries to transient errors: (res.status === 404 || res.status >= 500). [fixable]
🔵 style (L97): The send() method (lines 97-100) has inline queue logic (shift() + push()) that duplicates the new enqueuePending() method. Consider calling this.enqueuePending(endpoint, msg) here to keep the cap logic in one place. [fixable]

`packages/client/src/tests/sse-connection.test.ts`

Solid fix for SSE replay storms and message loss. The core mechanisms (replay gap capping, send re-queuing, interrupt crash guard) are correct and well-tested. Main concern: doPost re-queues on all HTTP errors including non-retryable 4xx, which could cause unnecessary retry churn.

🔵 missing_tests: No test for the edge case where a re-queued send fails again on the next flush (verifying it doesn't cause an infinite retry loop within one flush cycle). The atomic capture in flushPendingSends prevents this, but a test would document the invariant. [fixable]

`packages/harness/src/connection-registry.ts`

Solid fix for SSE replay storms and message loss. The core mechanisms (replay gap capping, send re-queuing, interrupt crash guard) are correct and well-tested. Main concern: doPost re-queues on all HTTP errors including non-retryable 4xx, which could cause unnecessary retry churn.

🔵 regressions (L228): The isSessionActive → shouldSync rename in EventStoreAdapter is a breaking change for any external consumers of this interface. Confirmed no references remain in the codebase, but if any downstream code implements EventStoreAdapter outside this repo, it will silently stop filtering sessions (since shouldSync is optional). Low risk given this is an internal package.

dimakis commented

View reviewed changes

dimakis left a comment

Owner Author

Centaur Review

Found 6 issue(s) (3 warning).

`packages/client/src/sse-connection.ts`

Solid defensive fix addressing real failure modes (replay storms, interrupt crashes, silent message loss). The main concern is that client-side send re-queuing doesn't distinguish transient from permanent POST failures and lacks an immediate retry path when the SSE stream stays connected. The MAX_REPLAY_GAP references client lazy-loading that doesn't exist yet.

🟡 bugs (L336): Re-queues on ALL non-OK statuses including permanent errors (400, 422). A malformed request will be retried on every reconnect indefinitely, evicted only when the queue fills to MAX_PENDING_SENDS. Consider excluding 4xx client errors (except 404/408/429) from re-queue. [fixable]
🟡 bugs (L340): When doPost fails for a 'send' during normal connected operation (not during reconnect), the message is re-queued to pendingSends but only flushed on the next SSE reconnect. If the POST failure was transient (e.g. 500) but the SSE stream stays alive, the re-queued message sits in the queue indefinitely. Consider triggering a reconnect cycle when a send POST fails, or implementing an immediate retry with backoff. [fixable]
🔵 style (L339): The console.warn interpolation POST /${endpoint} will always be POST /send since the re-queue is guarded by endpoint === 'send'. The template is misleadingly generic. [fixable]

`packages/harness/src/connection-registry.ts`

Solid defensive fix addressing real failure modes (replay storms, interrupt crashes, silent message loss). The main concern is that client-side send re-queuing doesn't distinguish transient from permanent POST failures and lacks an immediate retry path when the SSE stream stays connected. The MAX_REPLAY_GAP references client lazy-loading that doesn't exist yet.

🟡 unsafe_assumptions (L55): MAX_REPLAY_GAP comment says 'Client can lazy-load older events via the /api/sessions/:id/events API', but the @mitzo/client package has no lazy-loading implementation. Events skipped by the gap cap are silently lost to the client. Either implement client-side lazy-loading or document this as a known trade-off.

`server/index.ts`

Solid defensive fix addressing real failure modes (replay storms, interrupt crashes, silent message loss). The main concern is that client-side send re-queuing doesn't distinguish transient from permanent POST failures and lacks an immediate retry path when the SSE stream stays connected. The MAX_REPLAY_GAP references client lazy-loading that doesn't exist yet.

🔵 missing_tests (L108): shouldSync returns false when getSessionState returns null (unknown session). This is correct behavior but the edge case is untested — the connection-registry tests mock shouldSync directly rather than exercising the actual null-state path wired in index.ts. [fixable]

`packages/client/src/tests/sse-connection.test.ts`

Solid defensive fix addressing real failure modes (replay storms, interrupt crashes, silent message loss). The main concern is that client-side send re-queuing doesn't distinguish transient from permanent POST failures and lacks an immediate retry path when the SSE stream stays connected. The MAX_REPLAY_GAP references client lazy-loading that doesn't exist yet.

🔵 missing_tests (L875): No test for the scenario where doPost fails during flushPendingSends (reconnect flush). If a re-queued send fails again during flush, it should be re-queued once more. This interaction between flushPendingSends and enqueuePending is the most complex path and deserves coverage. [fixable]

packages/client/src/sse-connection.ts Outdated

                       },
                       body: JSON.stringify(body),
                     });
+                    if (!res.ok && endpoint === 'send') {

dimakis Jun 25, 2026

Owner Author

🟡 bugs: Re-queues on ALL non-OK statuses including permanent errors (400, 422). A malformed request will be retried on every reconnect indefinitely, evicted only when the queue fills to MAX_PENDING_SENDS. Consider excluding 4xx client errors (except 404/408/429) from re-queue. [fixable]

packages/client/src/sse-connection.ts Outdated

+                      // Message delivery failed (e.g. 404 from stale connectionId).
+                      // Re-queue so it flushes on the next successful reconnect.
+                      console.warn(`[SseConnection] POST /${endpoint} returned ${res.status}, re-queuing`);
+                      this.enqueuePending(endpoint, body);

dimakis Jun 25, 2026

Owner Author

🟡 bugs: When doPost fails for a 'send' during normal connected operation (not during reconnect), the message is re-queued to pendingSends but only flushed on the next SSE reconnect. If the POST failure was transient (e.g. 500) but the SSE stream stays alive, the re-queued message sits in the queue indefinitely. Consider triggering a reconnect cycle when a send POST fails, or implementing an immediate retry with backoff. [fixable]

packages/client/src/sse-connection.ts

+                    if (!res.ok && endpoint === 'send') {
+                      // Message delivery failed (e.g. 404 from stale connectionId).
+                      // Re-queue so it flushes on the next successful reconnect.
+                      console.warn(`[SseConnection] POST /${endpoint} returned ${res.status}, re-queuing`);

dimakis Jun 25, 2026

Owner Author

🔵 style: The console.warn interpolation POST /${endpoint} will always be POST /send since the re-queue is guarded by endpoint === 'send'. The template is misleadingly generic. [fixable]

packages/harness/src/connection-registry.ts

               const SYNC_BATCH_LIMIT = 50;
+              // Maximum gap between cursor and head before we skip ahead.
+              // Prevents replay storms after server restart or long disconnects.
+              // Client can lazy-load older events via the /api/sessions/:id/events API.

dimakis Jun 25, 2026

Owner Author

🟡 unsafe_assumptions: MAX_REPLAY_GAP comment says 'Client can lazy-load older events via the /api/sessions/:id/events API', but the @mitzo/client package has no lazy-loading implementation. Events skipped by the gap cap are silently lost to the client. Either implement client-side lazy-loading or document this as a known trade-off.

server/index.ts

                   eventStore.getEventsAfter(sessionId, afterSeq, limit),
-                isSessionActive: (sessionId) => eventStore.getSession(sessionId)?.isActive ?? false,
+                shouldSync: (sessionId) => {
+                  const state = eventStore.getSessionState(sessionId);

dimakis Jun 25, 2026

Owner Author

🔵 missing_tests: shouldSync returns false when getSessionState returns null (unknown session). This is correct behavior but the edge case is untested — the connection-registry tests mock shouldSync directly rather than exercising the actual null-state path wired in index.ts. [fixable]

packages/client/src/__tests__/sse-connection.test.ts


		expect(mockFetch).not.toHaveBeenCalled();
		});

dimakis Jun 25, 2026

Owner Author

🔵 missing_tests: No test for the scenario where doPost fails during flushPendingSends (reconnect flush). If a re-queued send fails again during flush, it should be re-queued once more. This interaction between flushPendingSends and enqueuePending is the most complex path and deserves coverage. [fixable]


          fix: address second Centaur review — transient-only retries, dedup en…

da1e1fc

…queuePending

- Restrict POST retries to transient errors (404 + 5xx), skip 400/4xx
  to avoid re-queuing permanently invalid messages
- Use enqueuePending() in send() to consolidate cap logic in one place
- Add tests: 400 not retried, re-queued sends don't infinite loop in
  one flush cycle (atomic capture invariant)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

dimakis commented Jun 25, 2026

Owner Author

Centaur Review

Found 5 issue(s) (1 warning).

`packages/client/src/sse-connection.ts`

Solid reliability PR — the three fixes (POST retry, replay-cap, interrupt guard) are well-tested and correctly scoped. Main concern is re-queued sends lacking a retry budget, plus CLOSING sessions being excluded from periodic sync despite still producing events.

🔵 style (L97): send() (lines 97-100) duplicates the queue-cap logic that the new enqueuePending() method was introduced to encapsulate. Replace the inline shift()/push() in send() with this.enqueuePending(endpoint, msg) to avoid drift between the two copies. [fixable]
🔵 unsafe_assumptions (L337): Re-queued sends have no retry budget — a message the server permanently rejects (e.g. expired session, malformed payload) will cycle through doPost → enqueuePending → flushPendingSends → doPost on every reconnect indefinitely. Consider adding a retryCount or TTL to the pending entry and dropping it after a reasonable limit. [fixable]

`packages/client/src/tests/sse-connection.test.ts`

Solid reliability PR — the three fixes (POST retry, replay-cap, interrupt guard) are well-tested and correctly scoped. Main concern is re-queued sends lacking a retry budget, plus CLOSING sessions being excluded from periodic sync despite still producing events.

🔵 missing_tests: No test verifies what happens when a re-queued send fails again during flush (the second doPost also returns an error). Worth a test to document the infinite-retry behavior or to validate a future retry-count cap. [fixable]

`server/index.ts`

Solid reliability PR — the three fixes (POST retry, replay-cap, interrupt guard) are well-tested and correctly scoped. Main concern is re-queued sends lacking a retry budget, plus CLOSING sessions being excluded from periodic sync despite still producing events.

🟡 regressions (L110): shouldSync excludes CLOSING sessions, but CLOSING means the agent is still running (generating its closeout summary/commit). If an observer's transport briefly drops during that window, periodic sync won't retry those closing events. Consider including state === 'CLOSING' in the truthy return, since CLOSING is short-lived and event-producing. [fixable]

`packages/harness/tests/connection-registry.test.ts`

Solid reliability PR — the three fixes (POST retry, replay-cap, interrupt guard) are well-tested and correctly scoped. Main concern is re-queued sends lacking a retry budget, plus CLOSING sessions being excluded from periodic sync despite still producing events.

🔵 missing_tests: The replay-cap tests cover the getHeadSeq path, but there is no test that verifies backward compatibility when getHeadSeq is absent from the adapter (unlike the existing test for missing shouldSync). Adding a test where getHeadSeq is undefined and the cursor is far behind head would confirm no regression for callers that don't implement it. [fixable]


          fix: address third Centaur review — retry budget, CLOSING sync, compa…

8a90958

…t test

- Add MAX_SEND_RETRIES (3) budget to pending sends — messages dropped
  after 3 failed attempts instead of cycling indefinitely
- Include CLOSING state in shouldSync — agent still produces events
  during closeout (summary, commits), observers need delivery retries
- Add backward compat test for missing getHeadSeq in EventStoreAdapter
- Update comment to reflect CLOSING inclusion

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

dimakis commented Jun 25, 2026

Owner Author

Centaur Review

Found 2 issue(s).

`packages/client/src/sse-connection.ts`

Clean, well-tested PR. The retry/re-queue mechanism is correctly bounded (atomic capture prevents infinite loops, MAX_PENDING_SENDS caps the queue, only transient errors on 'send' endpoints are retried). The replay-gap cap, shouldSync filter, and interruptChat crash guard are all correctly implemented with comprehensive tests. Two minor suggestions on retry completeness and message ordering.

🔵 unsafe_assumptions (L333): 429 (Too Many Requests) is a standard transient HTTP error but is not retried here — only 404 and >= 500 trigger re-queue. Unlikely for a self-hosted server, but if the backend ever sits behind a rate-limiting proxy (nginx, Cloudflare), user sends would be silently dropped. Consider res.status === 404 || res.status === 429 || res.status >= 500, or at minimum document the omission as deliberate. [fixable]
🔵 style (L354): flushPendingSends fires doPost calls without await, so multiple re-queued messages flush concurrently with no ordering guarantee. Pre-existing behavior, but now that failed sends are re-queued and retried, message reordering after transient errors becomes a realistic scenario (e.g., messages A,B,C fail, get re-queued, then flush concurrently on reconnect). If message ordering matters for the chat protocol, consider sequential awaiting or adding a sequence field the server can use to reorder. [fixable]

dimakis commented

View reviewed changes

dimakis left a comment

Owner Author

Centaur Review

Found 3 issue(s).

`packages/harness/tests/connection-registry.test.ts`

Well-structured fix addressing real failure modes (replay storms, silent message loss, ProcessTransport crashes). The retry logic is correctly bounded with atomic queue capture, the MAX_REPLAY_GAP cap is sound, and test coverage is thorough. Three minor suggestions, no bugs or critical issues.

🔵 missing_tests (L535): The 'caps replay depth' test is missing registry.stopPeriodicSync() at the end, unlike the adjacent 'does not cap replay' test. Not a functional bug (beforeEach creates a fresh registry and afterEach restores real timers), but inconsistent cleanup within the same describe block. [fixable]

`server/index.ts`

Well-structured fix addressing real failure modes (replay storms, silent message loss, ProcessTransport crashes). The retry logic is correctly bounded with atomic queue capture, the MAX_REPLAY_GAP cap is sound, and test coverage is thorough. Three minor suggestions, no bugs or critical issues.

🔵 regressions (L108): shouldSync excludes DETACHED sessions from periodic sync, whereas the old isSessionActive (which checked the is_active boolean column) would have included them — DETACHED sessions have is_active = true. This is likely intentional to prevent replay storms, but it means a secondary watcher (e.g. dashboard) could miss events from a DETACHED session if the primary broadcast failed. Worth documenting the tradeoff.

`packages/client/src/sse-connection.ts`

Well-structured fix addressing real failure modes (replay storms, silent message loss, ProcessTransport crashes). The retry logic is correctly bounded with atomic queue capture, the MAX_REPLAY_GAP cap is sound, and test coverage is thorough. Three minor suggestions, no bugs or critical issues.

🔵 unsafe_assumptions (L333): HTTP 429 (Too Many Requests) is a transient error but won't trigger re-queue — only 404 and >=500 do. If the server ever rate-limits, user messages would be silently dropped. Consider res.status === 404 || res.status === 429 || res.status >= 500. [fixable]

packages/harness/__tests__/connection-registry.test.ts

+                    // Should have fetched from 9800 (head - 200), not from 0
+                    const calls = (store.getEventsAfter as ReturnType<typeof vi.fn>).mock.calls;
+                    expect(calls[0][1]).toBe(9800); // afterSeq should be head - MAX_REPLAY_GAP
+                  });

dimakis Jun 25, 2026

Owner Author

🔵 missing_tests: The 'caps replay depth' test is missing registry.stopPeriodicSync() at the end, unlike the adjacent 'does not cap replay' test. Not a functional bug (beforeEach creates a fresh registry and afterEach restores real timers), but inconsistent cleanup within the same describe block. [fixable]

server/index.ts

                   eventStore.getEventsAfter(sessionId, afterSeq, limit),
-                isSessionActive: (sessionId) => eventStore.getSession(sessionId)?.isActive ?? false,
+                shouldSync: (sessionId) => {
+                  const state = eventStore.getSessionState(sessionId);

dimakis Jun 25, 2026

Owner Author

🔵 regressions: shouldSync excludes DETACHED sessions from periodic sync, whereas the old isSessionActive (which checked the is_active boolean column) would have included them — DETACHED sessions have is_active = true. This is likely intentional to prevent replay storms, but it means a secondary watcher (e.g. dashboard) could miss events from a DETACHED session if the primary broadcast failed. Worth documenting the tradeoff.

packages/client/src/sse-connection.ts Outdated

                       },
                       body: JSON.stringify(body),
                     });
+                    if (!res.ok && endpoint === 'send' && (res.status === 404 || res.status >= 500)) {

dimakis Jun 25, 2026

Owner Author

🔵 unsafe_assumptions: HTTP 429 (Too Many Requests) is a transient error but won't trigger re-queue — only 404 and >=500 do. If the server ever rate-limits, user messages would be silently dropped. Consider res.status === 404 || res.status === 429 || res.status >= 500. [fixable]

dimakis commented

View reviewed changes

dimakis left a comment

Owner Author

Centaur Review

Found 2 issue(s) (1 warning).

`packages/client/src/tests/sse-connection.test.ts`

Well-structured fix with strong test coverage for retry semantics and replay storm prevention; only missing a test for the MAX_SEND_RETRIES exhaustion path.

🟡 missing_tests: No test for MAX_SEND_RETRIES budget exhaustion. The enqueuePending method drops messages after 3 retries with a console.warn, but no test validates this path. A test should send a message, simulate 3 consecutive reconnect+flush cycles with persistent 404s, and assert the message is no longer in the pending queue on the 4th cycle. Without this, the constant could be changed or the guard removed without any test failing. [fixable]

`packages/harness/tests/connection-registry.test.ts`

Well-structured fix with strong test coverage for retry semantics and replay storm prevention; only missing a test for the MAX_SEND_RETRIES exhaustion path.

🔵 style (L535): Missing registry.stopPeriodicSync() call at the end of 'caps replay depth when cursor is far behind head' test. The sibling tests ('does not cap replay...', 'replays from cursor...') all call stopPeriodicSync. Not a correctness issue since afterEach calls vi.useRealTimers() which discards fake timers, but inconsistent with the pattern in adjacent tests. [fixable]

packages/harness/__tests__/connection-registry.test.ts

+                    // Should have fetched from 9800 (head - 200), not from 0
+                    const calls = (store.getEventsAfter as ReturnType<typeof vi.fn>).mock.calls;
+                    expect(calls[0][1]).toBe(9800); // afterSeq should be head - MAX_REPLAY_GAP
+                  });

dimakis Jun 25, 2026

Owner Author

🔵 style: Missing registry.stopPeriodicSync() call at the end of 'caps replay depth when cursor is far behind head' test. The sibling tests ('does not cap replay...', 'replays from cursor...') all call stopPeriodicSync. Not a correctness issue since afterEach calls vi.useRealTimers() which discards fake timers, but inconsistent with the pattern in adjacent tests. [fixable]


          fix: add 429 retry, sequential flush ordering for SSE sends

197b23f

Address fourth Centaur review: retry 429 (rate limited) as transient,
flush re-queued sends sequentially to preserve message ordering.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

dimakis commented

View reviewed changes

dimakis left a comment

Owner Author

Centaur Review

Found 2 issue(s) (1 warning).

`packages/client/src/tests/sse-connection.test.ts`

Solid defensive PR — retry budgets, replay gap capping, and interrupt crash guards are correctly implemented with good test coverage. The only meaningful gap is a missing test for the MAX_SEND_RETRIES drop path.

🟡 missing_tests: No test exercises the MAX_SEND_RETRIES (3) drop behavior. The constant and the if (retries >= MAX_SEND_RETRIES) guard in enqueuePending have zero coverage. A test should send a message, force it to fail across 3 reconnect cycles, and verify it is dropped on the 4th flush attempt (with the console.warn). Without this, a future change to the constant or the >= check could silently regress. [fixable]

`packages/harness/tests/connection-registry.test.ts`

Solid defensive PR — retry budgets, replay gap capping, and interrupt crash guards are correctly implemented with good test coverage. The only meaningful gap is a missing test for the MAX_SEND_RETRIES drop path.

🔵 style (L535): The 'caps replay depth when cursor is far behind head' test does not call registry.stopPeriodicSync() at the end, unlike its sibling tests ('does not cap replay…' at line 556, 'replays from cursor…' at line 584). Inconsistent cleanup — harmless under fake timers but could leak intervals if test isolation changes. [fixable]

packages/harness/__tests__/connection-registry.test.ts

+                    // Should have fetched from 9800 (head - 200), not from 0
+                    const calls = (store.getEventsAfter as ReturnType<typeof vi.fn>).mock.calls;
+                    expect(calls[0][1]).toBe(9800); // afterSeq should be head - MAX_REPLAY_GAP
+                  });

dimakis Jun 25, 2026

Owner Author

🔵 style: The 'caps replay depth when cursor is far behind head' test does not call registry.stopPeriodicSync() at the end, unlike its sibling tests ('does not cap replay…' at line 556, 'replays from cursor…' at line 584). Inconsistent cleanup — harmless under fake timers but could leak intervals if test isolation changes. [fixable]

dimakis commented Jun 25, 2026

Owner Author

Centaur Review

Found 4 issue(s) (1 warning).

`packages/client/src/sse-connection.ts`

Solid fix for SSE delivery reliability. The main concern is flushPendingSends() being fire-and-forget after its async conversion — the _open event fires before queued messages finish flushing, creating a message ordering race in the reconnect path. The retry logic, replay capping, and interruptChat hardening are well-designed and thoroughly tested.

🟡 bugs (L303): flushPendingSends() is now async (sequential await doPost() loop), but both call sites (line 233 and 303) fire-and-forget it. In doReconnectPost (async function), this.listener?.({ type: '_open' }) fires immediately after the unawaited flush, allowing the app to call send() while queued messages are still being flushed — violating the message ordering this PR intends to guarantee. Adding await before this.flushPendingSends() at line 303 would fix the race in the reconnect path. [fixable]
🔵 unsafe_assumptions (L346): Retry logic is gated on endpoint === 'send', so a failed permission_response POST is silently dropped. If the server is briefly unreachable when the user approves a permission, the agent stays blocked indefinitely with no recovery path. Consider extending transient-error retry to permission_response (endpoint 'permission') as well — it has the same user-visible impact as a lost send. [fixable]

`packages/harness/src/connection-registry.ts`

Solid fix for SSE delivery reliability. The main concern is flushPendingSends() being fire-and-forget after its async conversion — the _open event fires before queued messages finish flushing, creating a message ordering race in the reconnect path. The retry logic, replay capping, and interruptChat hardening are well-designed and thoroughly tested.

🔵 unsafe_assumptions (L242): When MAX_REPLAY_GAP capping triggers, the server logs a warning but the client receives no indication that events were skipped. The client may render an incomplete event stream (e.g. tool output without the preceding tool_use). The comment references /api/sessions/:id/events for backfill, but nothing triggers it. Consider sending a synthetic gap event to the transport so the client can request a full restore. [fixable]

`packages/client/src/tests/sse-connection.test.ts`

Solid fix for SSE delivery reliability. The main concern is flushPendingSends() being fire-and-forget after its async conversion — the _open event fires before queued messages finish flushing, creating a message ordering race in the reconnect path. The retry logic, replay capping, and interruptChat hardening are well-designed and thoroughly tested.

🔵 style (L886): All seven new retry tests follow the same pattern: spy console.warn, create mockFetch, connect, emit welcome, send, reconnect, assert. Consider extracting a helper (e.g. setupRetryTest(fetchImpl)) to reduce the ~30 lines of boilerplate per test to ~5, improving readability and making it easier to add future retry scenarios. [fixable]


          fix: await flush before _open, retry permission POSTs, add 429

a23e4a7

Address fifth Centaur review: await flushPendingSends before firing
_open to prevent ordering race, extend transient retry to permission
endpoint (lost permission_response blocks agent indefinitely).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

dimakis commented Jun 25, 2026

Owner Author

Centaur Review

Found 5 issue(s) (2 warning).

`packages/client/src/sse-connection.ts`

Solid defensive fix — the retry/replay-cap logic is correct and well-tested, but the async flushPendingSends is called fire-and-forget in two places, which undermines the sequential ordering guarantee under rapid reconnects. The MAX_SEND_RETRIES drop path lacks test coverage.

🟡 bugs (L233): flushPendingSends() is now async but is called without await at line 233 (welcome handler, non-reconnect path) and line 303 (doReconnectPost success). The atomic capture (line 367-368) prevents data loss, but if a rapid reconnect fires a second welcome before the first flush completes, both flushes run concurrently — potentially sending messages out of order, violating the sequential ordering guarantee this PR intends to provide. Consider adding a flushing guard or awaiting the result. [fixable]
🔵 unsafe_assumptions (L98): When send() calls doPost() directly (connected path, line 98) and the POST fails transiently, the message is re-queued into pendingSends. But nothing triggers flushPendingSends() until the next reconnect/welcome. If the connection remains open (no reconnect), the re-queued message is stranded. This is a latent delivery gap — a periodic flush timer or explicit retry on next send could close it. [fixable]

`packages/client/src/tests/sse-connection.test.ts`

Solid defensive fix — the retry/replay-cap logic is correct and well-tested, but the async flushPendingSends is called fire-and-forget in two places, which undermines the sequential ordering guarantee under rapid reconnects. The MAX_SEND_RETRIES drop path lacks test coverage.

🟡 missing_tests: No test verifies the MAX_SEND_RETRIES=3 cap: that a message is dropped (not re-queued) after 3 failed attempts. The existing 'infinite retry' test only covers a single flush cycle. A test should trigger 3 reconnect+flush cycles with persistent failures and assert the message is dropped with the expected console.warn. [fixable]

`packages/harness/src/connection-registry.ts`

Solid defensive fix — the retry/replay-cap logic is correct and well-tested, but the async flushPendingSends is called fire-and-forget in two places, which undermines the sequential ordering guarantee under rapid reconnects. The MAX_SEND_RETRIES drop path lacks test coverage.

🔵 style (L56): MAX_REPLAY_GAP=200 is a magic number with significant behavioral impact (clients silently lose events outside this window). Consider documenting how clients recover the gap — the comment mentions /api/sessions/:id/events but the client code has no such recovery mechanism visible in the diff. If the client can't lazy-load, this is a silent data loss path.

`server/index.ts`

Solid defensive fix — the retry/replay-cap logic is correct and well-tested, but the async flushPendingSends is called fire-and-forget in two places, which undermines the sequential ordering guarantee under rapid reconnects. The MAX_SEND_RETRIES drop path lacks test coverage.

🔵 regressions (L106): The old isSessionActive used session.isActive (a boolean). The new shouldSync calls getSessionState() which can return null for sessions that were never inserted into the state machine (e.g. legacy rows). null won't match any of the allowed states, so shouldSync returns false — those sessions silently stop receiving periodic sync. The old code returned false for missing sessions too (via ?? false), so this is likely fine, but worth verifying there are no active sessions without a state row.

dimakis commented

View reviewed changes

dimakis left a comment

Owner Author

Centaur Review

Found 6 issue(s) (2 warning).

`packages/client/src/sse-connection.ts`

Solid defensive PR. The core mechanisms (replay gap cap, retry budget, sequential flush, interrupt try/catch) are well-designed. Main concern: _connected = true before flush completes on the non-reconnect welcome path creates a message ordering race — the reconnect path already does this correctly, so applying the same pattern would close the gap.

🟡 bugs (L232): _connected is set to true BEFORE flushPendingSends() completes (line 232, fire-and-forget via void). If external code calls send() before the flush finishes, the new message bypasses the pending queue via doPost() directly (line 97-98), violating the ordering guarantee the comment on line 234 promises. The reconnect path in doReconnectPost handles this correctly by awaiting flush before setting _connected (line 307-309). Fix: move this._connected = true into the .then() callback, or await the flush (matching the doReconnectPost pattern). [fixable]
🟡 bugs (L356): When doPost is called directly from send() (connected path, line 98) and fails with a transient error (404/429/5xx), it re-queues via enqueuePending. However, pendingSends are only flushed during reconnect (flushPendingSends is only called from welcome handlers). If the SSE stream stays healthy but the POST endpoint has a transient issue, the re-queued message sits indefinitely until the next reconnect — which may never happen. Consider draining pendingSends on a timer or after a short delay, not only on reconnect. [fixable]
🔵 unsafe_assumptions (L233): No guard against concurrent flushes. If checkAndReconnect(true) is called while a flushPendingSends() from a prior welcome is still awaiting its doPost promises, a second flushPendingSends() can run concurrently with a different _connectionId. Both will issue POSTs in parallel, potentially duplicating sends. Low risk in practice (the browser event loop makes rapid double-welcome unlikely), but a flushInProgress flag or chained promise would make it safe by construction. [fixable]

`packages/client/src/tests/sse-connection.test.ts`

Solid defensive PR. The core mechanisms (replay gap cap, retry budget, sequential flush, interrupt try/catch) are well-designed. Main concern: _connected = true before flush completes on the non-reconnect welcome path creates a message ordering race — the reconnect path already does this correctly, so applying the same pattern would close the gap.

🔵 missing_tests: No test verifies that send() called while a flush is in-progress preserves ordering relative to the queued messages. The _open ordering test (line 1072) checks listener event order but not interleaved send() calls. Add a test: queue messages, trigger welcome, call send() before flush completes, verify the queued messages are delivered before the new one. [fixable]

`packages/harness/src/connection-registry.ts`

Solid defensive PR. The core mechanisms (replay gap cap, retry budget, sequential flush, interrupt try/catch) are well-designed. Main concern: _connected = true before flush completes on the non-reconnect welcome path creates a message ordering race — the reconnect path already does this correctly, so applying the same pattern would close the gap.

🔵 style (L55): MAX_REPLAY_GAP = 200 is a magic number. The comment says clients can lazy-load older events via /api/sessions/:id/events, but the client code doesn't implement automatic lazy-loading after a gap skip — the skipped events are silently dropped from the client's perspective. Consider documenting this trade-off more explicitly (e.g., // IMPORTANT: skipped events are NOT auto-fetched by the client). [fixable]
🔵 regressions (L44): The isSessionActive → shouldSync rename is a breaking change to the EventStoreAdapter interface. It's optional and the only consumer is server/index.ts (confirmed), so no runtime breakage. But any out-of-tree consumers (tests, forks) referencing isSessionActive will silently stop filtering because the old property is simply ignored. This is acceptable since the interface is internal, but worth noting.

packages/client/src/sse-connection.ts

@@ @@ -226,8 +230,14 @@ export class SseConnection implements ChatConnection { @@
                       this.doReconnectPost(welcomeConnectionId, welcomeEs);
                     } else {
                       this._connected = true;

dimakis Jun 25, 2026

Owner Author

🟡 bugs: _connected is set to true BEFORE flushPendingSends() completes (line 232, fire-and-forget via void). If external code calls send() before the flush finishes, the new message bypasses the pending queue via doPost() directly (line 97-98), violating the ordering guarantee the comment on line 234 promises. The reconnect path in doReconnectPost handles this correctly by awaiting flush before setting _connected (line 307-309). Fix: move this._connected = true into the .then() callback, or await the flush (matching the doReconnectPost pattern). [fixable]

packages/client/src/sse-connection.ts

+                      // 5xx = server error). Re-queue so it flushes on the next successful
+                      // reconnect. Non-transient errors (400 = invalid body) are not retried.
+                      console.warn(`[SseConnection] POST /${endpoint} returned ${res.status}, re-queuing`);
+                      this.enqueuePending(endpoint, body, retries + 1);

dimakis Jun 25, 2026

Owner Author

🟡 bugs: When doPost is called directly from send() (connected path, line 98) and fails with a transient error (404/429/5xx), it re-queues via enqueuePending. However, pendingSends are only flushed during reconnect (flushPendingSends is only called from welcome handlers). If the SSE stream stays healthy but the POST endpoint has a transient issue, the re-queued message sits indefinitely until the next reconnect — which may never happen. Consider draining pendingSends on a timer or after a short delay, not only on reconnect. [fixable]

packages/client/src/sse-connection.ts Outdated

                       this._connected = true;
-                      this.flushPendingSends();
-                      this.listener?.({ type: '_open' });
+                      if (this.pendingSends.length > 0) {

dimakis Jun 25, 2026

Owner Author

🔵 unsafe_assumptions: No guard against concurrent flushes. If checkAndReconnect(true) is called while a flushPendingSends() from a prior welcome is still awaiting its doPost promises, a second flushPendingSends() can run concurrently with a different _connectionId. Both will issue POSTs in parallel, potentially duplicating sends. Low risk in practice (the browser event loop makes rapid double-welcome unlikely), but a flushInProgress flag or chained promise would make it safe by construction. [fixable]

packages/harness/src/connection-registry.ts

               const SYNC_BATCH_LIMIT = 50;
+              // Maximum gap between cursor and head before we skip ahead.
+              // Prevents replay storms after server restart or long disconnects.
+              // Client can lazy-load older events via the /api/sessions/:id/events API.

dimakis Jun 25, 2026

Owner Author

🔵 style: MAX_REPLAY_GAP = 200 is a magic number. The comment says clients can lazy-load older events via /api/sessions/:id/events, but the client code doesn't implement automatic lazy-loading after a gap skip — the skipped events are silently dropped from the client's perspective. Consider documenting this trade-off more explicitly (e.g., // IMPORTANT: skipped events are NOT auto-fetched by the client). [fixable]

packages/harness/src/connection-registry.ts

+                /** Check if a session should receive periodic sync delivery.
+                 *  When provided, periodic sync skips sessions where this returns false
+                 *  (e.g. ENDED, SUSPENDED, DETACHED). */
+                shouldSync?(sessionId: string): boolean;

dimakis Jun 25, 2026

Owner Author

🔵 regressions: The isSessionActive → shouldSync rename is a breaking change to the EventStoreAdapter interface. It's optional and the only consumer is server/index.ts (confirmed), so no runtime breakage. But any out-of-tree consumers (tests, forks) referencing isSessionActive will silently stop filtering because the old property is simply ignored. This is acceptable since the interface is internal, but worth noting.


          fix: add flush guard against concurrent flushes, retry exhaustion test

cea5b45

Address sixth Centaur review: add _flushing guard to prevent concurrent
flushPendingSends when rapid reconnects fire overlapping welcomes,
add test proving MAX_SEND_RETRIES=3 drops messages after exhaustion.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

dimakis commented Jun 25, 2026

Owner Author

Centaur Review

Found 6 issue(s) (2 warning).

`packages/client/src/sse-connection.ts`

Solid defensive hardening with good test coverage. Main concern is a race condition where the _flushing guard can cause _open to fire before flush actually completes during rapid reconnects, and the silent event skip from MAX_REPLAY_GAP with no client notification.

🟡 bugs (L236): _open can fire before flush actually completes. If checkAndReconnect is called while a previous flush is in progress, the new flushPendingSends() returns immediately (guarded by _flushing), and .then() fires _open right away — violating the ordering invariant in the comment. The doReconnectPost path (line 310) has the same issue via await. Consider: after _flushing guard returns, re-check pending sends and loop, or use a mutex/queue pattern instead of a boolean flag. [fixable]
🔵 bugs (L99): When _connected is true, send() calls doPost() fire-and-forget. If doPost hits a transient 404 (stale connectionId), it re-queues the message, but no flush is triggered — the message sits in pendingSends until the SSE stream errors and reconnects. This is safe if the SSE stream is guaranteed to break shortly after the POST 404, but if the SSE stream stays alive with a valid connectionId while the POST uses a stale one (shouldn't happen, same field), the message would be stuck. Not a bug in practice since both use this._connectionId, but worth noting.
🔵 style (L236): void this.flushPendingSends().then(...) has no .catch(). While flushPendingSends shouldn't reject (doPost catches all errors), a defensive .catch(() => this.listener?.({ type: '_open' })) would ensure _open always fires even if an unforeseen error occurs. [fixable]

`packages/harness/src/connection-registry.ts`

Solid defensive hardening with good test coverage. Main concern is a race condition where the _flushing guard can cause _open to fire before flush actually completes during rapid reconnects, and the silent event skip from MAX_REPLAY_GAP with no client notification.

🟡 unsafe_assumptions (L240): When MAX_REPLAY_GAP caps the cursor, events between the old cursor and (head - 200) are silently dropped. The client has no signal that events were skipped — it will see a seq gap but has no protocol-level indication to trigger lazy-loading via /api/sessions/:id/events. Consider emitting a synthetic events_skipped message with the skipped range so the client can react. [fixable]
🔵 regressions (L41): Renaming isSessionActive to shouldSync on EventStoreAdapter is a breaking API change for any external consumers. The interface is exported. Confirmed no in-repo callers use the old name, but if any external code implements this adapter, it would silently stop filtering (the method is optional). Low risk given this is an internal project, but worth noting in the PR description.

`packages/client/src/tests/sse-connection.test.ts`

Solid defensive hardening with good test coverage. Main concern is a race condition where the _flushing guard can cause _open to fire before flush actually completes during rapid reconnects, and the silent event skip from MAX_REPLAY_GAP with no client notification.

🔵 missing_tests: No test covers the scenario where flushPendingSends is called while _flushing is true (concurrent flush). This is the race condition from the first finding — a test showing that _open is deferred correctly (or documenting that it isn't) would be valuable. [fixable]


          fix: replace _flushing boolean with promise to prevent concurrent flu…

f5e36f3

…sh race

Address seventh Centaur review: concurrent flushPendingSends calls now
wait for the in-progress flush instead of skipping, ensuring _open
fires only after all queued sends complete. Added .catch() safety net
and concurrent flush race test.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

dimakis commented Jun 25, 2026

Owner Author

Centaur Review

Found 5 issue(s) (3 warning).

`server/index.ts`

Solid defensive PR that prevents replay storms and adds retry resilience. The main concerns are a behavioral change in DETACHED session sync (intentional but worth verifying observer impact) and a potential message-stranding edge case when concurrent welcomes overlap with partial flush failures.

🟡 regressions (L113): Behavioral change from old isSessionActive check: the old code used the isActive boolean field which remained true during DETACHED state (only detachChat() sets state to DETACHED — it does not flip isActive to false). The new shouldSync excludes DETACHED. During the DETACHED window (up to 10 min TTL), observers watching via ConnectionRegistry won't receive periodic sync catch-ups for missed broadcast events. Once the session transitions to CLOSING (included in shouldSync), sync resumes and catches up. This appears intentional per the PR description, but it is a behavioral change worth calling out — the old code synced DETACHED sessions, the new code does not.
🔵 missing_tests (L108): No test covers the case where eventStore.getSessionState() returns null (session not yet in EventStore). The code handles it correctly — null doesn't match any state string, so shouldSync returns false — but a test would document this edge case, especially since a watched session could briefly exist in ConnectionRegistry before being persisted to EventStore. [fixable]

`packages/client/src/sse-connection.ts`

Solid defensive PR that prevents replay storms and adds retry resilience. The main concerns are a behavioral change in DETACHED session sync (intentional but worth verifying observer impact) and a potential message-stranding edge case when concurrent welcomes overlap with partial flush failures.

🟡 unsafe_assumptions (L99): When send() fires doPost() fire-and-forget while connected (line 99, no await), a transient failure (500/429/404) re-queues the message for the next reconnect. If subsequent send() calls succeed before any reconnect, the server receives messages out of order (newer messages arrive before the re-queued older one). Before this PR, the same scenario silently dropped the message, so this is strictly an improvement — but the new retry mechanism introduces an ordering invariant that isn't preserved. Consider whether send() should serialize through a promise chain to guarantee ordering, or document that order is best-effort. [fixable]
🟡 bugs (L377): When a second welcome arrives during an in-flight flush, flushPendingSends() returns the existing _flushPromise (line 377). If the in-flight flush has failures that re-queue messages into the now-empty pendingSends, those re-queued messages are never flushed — _open fires after the promise resolves but no second flush is triggered for the newly queued messages. They wait until the next reconnect cycle. In the concurrent-welcome test, the flush succeeds so this path isn't exercised. The window is small (only during concurrent welcomes with partial flush failures) but could strand messages. [fixable]
🔵 style (L31): MAX_SEND_RETRIES = 3 combined with the retries >= MAX_SEND_RETRIES check in enqueuePending means 3 total send attempts (retries 0, 1, 2), not 3 retries after the original. The log message dropping /${endpoint} after ${retries} retries prints '3' when the counter is 3 (the drop threshold), but only 2 actual retries occurred. Consider renaming to MAX_SEND_ATTEMPTS or adjusting the log to say after ${retries - 1} retries for clarity. [fixable]


          fix: drain re-queued sends after flush, rename MAX_SEND_ATTEMPTS

adcbf5d

Address eighth Centaur review: flush loop re-checks pendingSends after
each batch so failures re-queued mid-flush aren't stranded until next
reconnect. Rename MAX_SEND_RETRIES → MAX_SEND_ATTEMPTS for clarity.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

dimakis commented

View reviewed changes

dimakis left a comment

Owner Author

Centaur Review

LGTM — no issues found.

dimakis commented

View reviewed changes

dimakis left a comment

Owner Author

Centaur Review

Found 4 issue(s) (2 warning).

`packages/client/src/sse-connection.ts`

The while loop in flushPendingSends defeats the cross-reconnect retry strategy by exhausting all MAX_SEND_ATTEMPTS in a single flush cycle; otherwise solid defensive changes with good test coverage for the happy paths.

🟡 bugs (L383): The while (this.pendingSends.length > 0) loop in flushPendingSends causes re-queued failures to be retried immediately within the same flush cycle, burning all MAX_SEND_ATTEMPTS (3) in rapid succession rather than spreading retries across reconnects. When doPost fails (e.g. 404 from stale connectionId) and calls enqueuePending, the item is added to this.pendingSends. The inner for loop finishes, the while condition re-evaluates to true, and the item is retried immediately — still using the same stale _connectionId. For 404 errors this always fails; for 429 it retries without backoff. The doPost comment on line 356 says 'Re-queue so it flushes on the next successful reconnect' but the while loop contradicts that intent. Fix: remove the while loop and do a single pass — re-queued items will naturally flush on the next reconnect with a fresh connectionId. [fixable]

`packages/client/src/tests/sse-connection.test.ts`

The while loop in flushPendingSends defeats the cross-reconnect retry strategy by exhausting all MAX_SEND_ATTEMPTS in a single flush cycle; otherwise solid defensive changes with good test coverage for the happy paths.

🟡 missing_tests (L1023): The test 're-queued sends do not cause infinite retry within one flush cycle' does not exercise flushPendingSends at all — it calls conn.send() on an already-connected instance, which goes through doPost directly (fire-and-forget from send()), never triggering a flush. The test passes because only 1 direct send occurs, but the comment 'flushPendingSends atomically captures the queue before iterating' describes behavior that is never exercised. A test that queues a persistently-failing message offline, reconnects (triggering flush), and verifies it is NOT retried more than once within that single flush would actually cover the stated invariant. [fixable]
🔵 style (L1050): The test 'drops message after MAX_SEND_ATTEMPTS (3) exhausted across reconnects' has comments claiming retries are spread across 3 separate reconnect cycles (conn-1, conn-2, conn-3). In reality, due to the while loop in flushPendingSends, all 3 attempts are exhausted during the conn-2 flush — the conn-3 and conn-4 reconnects are no-ops. The test passes because its assertions happen to be correct regardless, but the step-by-step comments are misleading about the actual retry timing. [fixable]

`server/index.ts`

The while loop in flushPendingSends defeats the cross-reconnect retry strategy by exhausting all MAX_SEND_ATTEMPTS in a single flush cycle; otherwise solid defensive changes with good test coverage for the happy paths.

🔵 regressions (L107): The old isSessionActive checked eventStore.getSession(sessionId)?.isActive, which returned true for DETACHED sessions (agent still running, client disconnected). The new shouldSync excludes DETACHED. During DETACHED state the agent is still producing events (per the state machine: DETACHED can transition to ACTIVE, CLOSING, or ENDED). If a second client connection (e.g. another browser tab) is watching a DETACHED session via the ConnectionRegistry, it will stop receiving periodic sync delivery. This may be intentional, but the behavioral change from the old code is worth verifying. [fixable]

packages/client/src/sse-connection.ts

+                    try {
+                      // Loop: re-check after each batch in case doPost failures re-queued
+                      // new items during the flush (e.g. concurrent welcome + partial failure).
+                      while (this.pendingSends.length > 0) {

dimakis Jun 25, 2026

Owner Author

🟡 bugs: The while (this.pendingSends.length > 0) loop in flushPendingSends causes re-queued failures to be retried immediately within the same flush cycle, burning all MAX_SEND_ATTEMPTS (3) in rapid succession rather than spreading retries across reconnects. When doPost fails (e.g. 404 from stale connectionId) and calls enqueuePending, the item is added to this.pendingSends. The inner for loop finishes, the while condition re-evaluates to true, and the item is retried immediately — still using the same stale _connectionId. For 404 errors this always fails; for 429 it retries without backoff. The doPost comment on line 356 says 'Re-queue so it flushes on the next successful reconnect' but the while loop contradicts that intent. Fix: remove the while loop and do a single pass — re-queued items will naturally flush on the next reconnect with a fresh connectionId. [fixable]

packages/client/src/__tests__/sse-connection.test.ts

+                  const warnSpy = vi.spyOn(console, 'warn').mockImplementation(() => {});
+                  let sendCallCount = 0;
+                  const mockFetch = vi.fn().mockImplementation((url: string) => {
+                    if (typeof url === 'string' && url.includes('/send')) {

dimakis Jun 25, 2026

Owner Author

🟡 missing_tests: The test 're-queued sends do not cause infinite retry within one flush cycle' does not exercise flushPendingSends at all — it calls conn.send() on an already-connected instance, which goes through doPost directly (fire-and-forget from send()), never triggering a flush. The test passes because only 1 direct send occurs, but the comment 'flushPendingSends atomically captures the queue before iterating' describes behavior that is never exercised. A test that queues a persistently-failing message offline, reconnects (triggering flush), and verifies it is NOT retried more than once within that single flush would actually cover the stated invariant. [fixable]

packages/client/src/__tests__/sse-connection.test.ts

+                    if (typeof url === 'string' && url.includes('/send')) {
+                      return Promise.resolve({ ok: false, status: 500 });
+                    }
+                    return Promise.resolve({ ok: true });

dimakis Jun 25, 2026

Owner Author

🔵 style: The test 'drops message after MAX_SEND_ATTEMPTS (3) exhausted across reconnects' has comments claiming retries are spread across 3 separate reconnect cycles (conn-1, conn-2, conn-3). In reality, due to the while loop in flushPendingSends, all 3 attempts are exhausted during the conn-2 flush — the conn-3 and conn-4 reconnects are no-ops. The test passes because its assertions happen to be correct regardless, but the step-by-step comments are misleading about the actual retry timing. [fixable]

server/index.ts

                 getEventsAfter: (sessionId, afterSeq, limit) =>
                   eventStore.getEventsAfter(sessionId, afterSeq, limit),
-                isSessionActive: (sessionId) => eventStore.getSession(sessionId)?.isActive ?? false,
+                shouldSync: (sessionId) => {

dimakis Jun 25, 2026

Owner Author

🔵 regressions: The old isSessionActive checked eventStore.getSession(sessionId)?.isActive, which returned true for DETACHED sessions (agent still running, client disconnected). The new shouldSync excludes DETACHED. During DETACHED state the agent is still producing events (per the state machine: DETACHED can transition to ACTIVE, CLOSING, or ENDED). If a second client connection (e.g. another browser tab) is watching a DETACHED session via the ConnectionRegistry, it will stop receiving periodic sync delivery. This may be intentional, but the behavioral change from the old code is worth verifying. [fixable]

dimakis commented

View reviewed changes

dimakis left a comment

Owner Author

Centaur Review

Found 4 issue(s) (2 warning).

`packages/client/src/sse-connection.ts`

Solid defensive improvements (retry logic, replay gap cap, interrupt crash guard), but the while loop in flushPendingSends eagerly exhausts all retry attempts within a single flush cycle rather than spreading them across reconnects — undermining the retry design for 429/5xx errors. Two tests pass but don't validate the behavior their names describe.

🟡 bugs (L383): The while (this.pendingSends.length > 0) loop in flushPendingSends causes failed-and-re-queued messages to be immediately retried within the same flush cycle, exhausting all MAX_SEND_ATTEMPTS in one go rather than spreading retries across reconnects. This is counterproductive for 429 (rate-limited) responses — immediate retry worsens rate limiting. For 500s it's also wasteful since the same server state likely persists. The retry mechanism would be more effective if the while loop were removed (just process one captured batch per flush), deferring re-queued items to the next reconnect. [fixable]
🔵 style (L236): The .catch(() => {}).then(...) chain on flushPendingSends() swallows all errors before emitting _open. This is fine defensively, but the empty catch with no logging makes flush failures invisible on the initial-connect path. Consider logging a warning in the catch, matching the error visibility of the doReconnectPost path which propagates flush errors to its own try/catch and calls scheduleReconnect(). [fixable]

`packages/client/src/tests/sse-connection.test.ts`

Solid defensive improvements (retry logic, replay gap cap, interrupt crash guard), but the while loop in flushPendingSends eagerly exhausts all retry attempts within a single flush cycle rather than spreading them across reconnects — undermining the retry design for 429/5xx errors. Two tests pass but don't validate the behavior their names describe.

🔵 missing_tests (L1019): The test 'does not cause infinite retry within one flush cycle' never exercises the flush path — send() goes through doPost() directly (connection is alive) and the re-queued message sits in pendingSends. No flushPendingSends while loop is invoked. The test proves that a direct doPost failure doesn't trigger a flush, but doesn't verify the while loop's behavior during an actual flush cycle. [fixable]
🟡 missing_tests (L1044): Test 'drops message after MAX_SEND_ATTEMPTS (3) exhausted across reconnects' is misleading. Because seqBySession is empty (no session messages emitted), every welcome takes the non-reconnect path and calls flushPendingSends(). The while loop inside flush exhausts all remaining retries in conn-2's flush cycle alone (attempt 2 fails → re-queues with retries=2 → while loop retries → attempt 3 fails → retries=3 → dropped). Conn-3 and conn-4 are no-ops. The test passes but doesn't validate the 'across reconnects' behavior its name and comments describe. [fixable]

packages/client/src/sse-connection.ts

+                    try {
+                      // Loop: re-check after each batch in case doPost failures re-queued
+                      // new items during the flush (e.g. concurrent welcome + partial failure).
+                      while (this.pendingSends.length > 0) {

dimakis Jun 25, 2026

Owner Author

🟡 bugs: The while (this.pendingSends.length > 0) loop in flushPendingSends causes failed-and-re-queued messages to be immediately retried within the same flush cycle, exhausting all MAX_SEND_ATTEMPTS in one go rather than spreading retries across reconnects. This is counterproductive for 429 (rate-limited) responses — immediate retry worsens rate limiting. For 500s it's also wasteful since the same server state likely persists. The retry mechanism would be more effective if the while loop were removed (just process one captured batch per flush), deferring re-queued items to the next reconnect. [fixable]

packages/client/src/sse-connection.ts

-                      this.listener?.({ type: '_open' });
+                      if (this.pendingSends.length > 0 || this._flushPromise) {
+                        // Flush must complete before _open fires to preserve message ordering.
+                        void this.flushPendingSends()

dimakis Jun 25, 2026

Owner Author

🔵 style: The .catch(() => {}).then(...) chain on flushPendingSends() swallows all errors before emitting _open. This is fine defensively, but the empty catch with no logging makes flush failures invisible on the initial-connect path. Consider logging a warning in the catch, matching the error visibility of the doReconnectPost path which propagates flush errors to its own try/catch and calls scheduleReconnect(). [fixable]

packages/client/src/__tests__/sse-connection.test.ts

+                  warnSpy.mockRestore();
+                });
+                it('re-queued sends do not cause infinite retry within one flush cycle', async () => {

dimakis Jun 25, 2026

Owner Author

🔵 missing_tests: The test 'does not cause infinite retry within one flush cycle' never exercises the flush path — send() goes through doPost() directly (connection is alive) and the re-queued message sits in pendingSends. No flushPendingSends while loop is invoked. The test proves that a direct doPost failure doesn't trigger a flush, but doesn't verify the while loop's behavior during an actual flush cycle. [fixable]

packages/client/src/__tests__/sse-connection.test.ts

+                  warnSpy.mockRestore();
+                });
+                it('drops message after MAX_SEND_ATTEMPTS (3) exhausted across reconnects', async () => {

dimakis Jun 25, 2026

Owner Author

🟡 missing_tests: Test 'drops message after MAX_SEND_ATTEMPTS (3) exhausted across reconnects' is misleading. Because seqBySession is empty (no session messages emitted), every welcome takes the non-reconnect path and calls flushPendingSends(). The while loop inside flush exhausts all remaining retries in conn-2's flush cycle alone (attempt 2 fails → re-queues with retries=2 → while loop retries → attempt 3 fails → retries=3 → dropped). Conn-3 and conn-4 are no-ops. The test passes but doesn't validate the 'across reconnects' behavior its name and comments describe. [fixable]

dimakis commented

View reviewed changes

dimakis left a comment

Owner Author

Centaur Review

Found 5 issue(s) (2 warning).

`packages/client/src/sse-connection.ts`

Solid defensive improvements for SSE delivery reliability. The main concern is a while-loop in flushPendingSends that burns through all retry attempts within a single flush cycle rather than distributing them across reconnects as comments and tests describe — particularly counterproductive for 429 rate limiting.

🟡 bugs (L383): The while (this.pendingSends.length > 0) loop in flushPendingSends will exhaust all remaining retries within a single flush cycle, not across reconnects. When doPost fails at line 359 it calls enqueuePending which pushes to this.pendingSends (reset to [] at line 385). The while loop re-checks and immediately retries. With MAX_SEND_ATTEMPTS=3, a fresh message gets 3 rapid-fire attempts in one flush, then is dropped — contradicting the doPost comment at line 356 ("Re-queue so it flushes on the next successful reconnect"). For 429 rate-limiting this is counterproductive: the client burns through all retries instantly instead of backing off. Consider breaking out of the while loop after the inner for-loop to defer re-queued items to the next reconnect, or at minimum adjust the comment. [fixable]
🔵 unsafe_assumptions (L354): 429 (rate limited) is treated the same as 404 and 5xx for retry purposes, but 429 typically includes a Retry-After header indicating how long to wait. Immediately re-queuing (and with the while loop, immediately retrying) defeats the rate limit signal. Consider either reading Retry-After or at least not re-queuing 429s for immediate retry within the same flush cycle. [fixable]
🔵 style (L356): Comment says "Re-queue so it flushes on the next successful reconnect" but with the while loop in flushPendingSends, re-queued items are retried immediately within the same flush cycle. Either the comment or the behavior should be updated to match the other. [fixable]

`packages/client/src/tests/sse-connection.test.ts`

Solid defensive improvements for SSE delivery reliability. The main concern is a while-loop in flushPendingSends that burns through all retry attempts within a single flush cycle rather than distributing them across reconnects as comments and tests describe — particularly counterproductive for 429 rate limiting.

🟡 missing_tests (L1019): The test 're-queued sends do not cause infinite retry within one flush cycle' sends via conn.send() while connected (direct doPost, not the flush path). The re-queued message just sits in pendingSends — the while loop in flushPendingSends is never exercised. A test that queues a message while disconnected and then reconnects (triggering flushPendingSends) would reveal the while-loop retry burn-through behavior. Similarly, the 'drops message after MAX_SEND_ATTEMPTS' test comments describe retries spread across reconnects 1–3, but all retries actually happen during reconnect 1's flush cycle — the test passes by accident because it only checks final state. [fixable]

`packages/harness/src/connection-registry.ts`

Solid defensive improvements for SSE delivery reliability. The main concern is a while-loop in flushPendingSends that burns through all retry attempts within a single flush cycle rather than distributing them across reconnects as comments and tests describe — particularly counterproductive for 429 rate limiting.

🔵 regressions (L41): EventStoreAdapter is exported from @mitzo/harness (index.ts:16). Renaming isSessionActive to shouldSync is a breaking change to the public interface. The only implementation is in server/index.ts (updated), and no stale references remain, so this is safe within the monorepo. If any external consumers exist, they would break. Adding getHeadSeq is additive and non-breaking since it's optional.

packages/client/src/sse-connection.ts

+                    try {
+                      // Loop: re-check after each batch in case doPost failures re-queued
+                      // new items during the flush (e.g. concurrent welcome + partial failure).
+                      while (this.pendingSends.length > 0) {

dimakis Jun 26, 2026

Owner Author

🟡 bugs: The while (this.pendingSends.length > 0) loop in flushPendingSends will exhaust all remaining retries within a single flush cycle, not across reconnects. When doPost fails at line 359 it calls enqueuePending which pushes to this.pendingSends (reset to [] at line 385). The while loop re-checks and immediately retries. With MAX_SEND_ATTEMPTS=3, a fresh message gets 3 rapid-fire attempts in one flush, then is dropped — contradicting the doPost comment at line 356 ("Re-queue so it flushes on the next successful reconnect"). For 429 rate-limiting this is counterproductive: the client burns through all retries instantly instead of backing off. Consider breaking out of the while loop after the inner for-loop to defer re-queued items to the next reconnect, or at minimum adjust the comment. [fixable]

packages/client/src/sse-connection.ts

                       body: JSON.stringify(body),
                     });
+                    const retryable = endpoint === 'send' || endpoint === 'permission';
+                    if (!res.ok && retryable && (res.status === 404 || res.status === 429 || res.status >= 500)) {

dimakis Jun 26, 2026

Owner Author

🔵 unsafe_assumptions: 429 (rate limited) is treated the same as 404 and 5xx for retry purposes, but 429 typically includes a Retry-After header indicating how long to wait. Immediately re-queuing (and with the while loop, immediately retrying) defeats the rate limit signal. Consider either reading Retry-After or at least not re-queuing 429s for immediate retry within the same flush cycle. [fixable]

packages/client/src/sse-connection.ts

+                    const retryable = endpoint === 'send' || endpoint === 'permission';
+                    if (!res.ok && retryable && (res.status === 404 || res.status === 429 || res.status >= 500)) {
+                      // Transient failure (404 = stale connectionId, 429 = rate limited,
+                      // 5xx = server error). Re-queue so it flushes on the next successful

dimakis Jun 26, 2026

Owner Author

🔵 style: Comment says "Re-queue so it flushes on the next successful reconnect" but with the while loop in flushPendingSends, re-queued items are retried immediately within the same flush cycle. Either the comment or the behavior should be updated to match the other. [fixable]

packages/client/src/__tests__/sse-connection.test.ts

+                  warnSpy.mockRestore();
+                });
+                it('re-queued sends do not cause infinite retry within one flush cycle', async () => {

dimakis Jun 26, 2026

Owner Author

🟡 missing_tests: The test 're-queued sends do not cause infinite retry within one flush cycle' sends via conn.send() while connected (direct doPost, not the flush path). The re-queued message just sits in pendingSends — the while loop in flushPendingSends is never exercised. A test that queues a message while disconnected and then reconnects (triggering flushPendingSends) would reveal the while-loop retry burn-through behavior. Similarly, the 'drops message after MAX_SEND_ATTEMPTS' test comments describe retries spread across reconnects 1–3, but all retries actually happen during reconnect 1's flush cycle — the test passes by accident because it only checks final state. [fixable]

packages/harness/src/connection-registry.ts

-                /** Optional: check if a session is still active. When provided, periodic sync
-                 *  skips ended sessions to avoid unnecessary EventStore queries. */
-                isSessionActive?(sessionId: string): boolean;
+                /** Check if a session should receive periodic sync delivery.

dimakis Jun 26, 2026

Owner Author

🔵 regressions: EventStoreAdapter is exported from @mitzo/harness (index.ts:16). Renaming isSessionActive to shouldSync is a breaking change to the public interface. The only implementation is in server/index.ts (updated), and no stale references remain, so this is safe within the monorepo. If any external consumers exist, they would break. Adding getHeadSeq is additive and non-breaking since it's optional.

dimakis mentioned this pull request

feat(transport): stable client connectionId + POST failure resilience #401

Merged

6 tasks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet