feat(transport): stable client connectionId + POST failure resilience by dimakis · Pull Request #401 · dimakis/mitzo

dimakis · 2026-06-26T13:33:27Z

Summary

Eliminates the "send message twice to wake agent" bug after iOS kills the SSE connection.

Root cause: Every SSE reconnect minted a new server-generated `connectionId`, breaking the `clientId = ${connectionId}:${sessionId}` ownership chain. `handleReconnect` skipped reattach when the old transport appeared attached but was actually dead.
Fix: Client generates a stable UUID per tab (`cid-{uuid}`), sends it via `?cid=` on SSE GET. Server reuses it across reconnects, so `getOwnerConnection(clientId)` always returns the same `connectionId` and `isOwner` is always true. No `handleReconnect` code changes needed.
Resilience: POST failures (404/429/5xx/network) are re-queued with `_send_failed` emission. Optimistic `SET_RUNNING` after send. Deferred connection cleanup with 60s TTL.
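
A minimal sketch of the client half of the fix described above. The storage key, the endpoint URL, and the helper names are assumptions for illustration and are not taken from the PR; in a browser the store would be `sessionStorage` and the generator would be `() => crypto.randomUUID()`.

```typescript
// Minimal Storage-like surface so the sketch also runs outside a browser.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical storage key; the PR does not show the real one.
const CID_STORAGE_KEY = "connectionId";

// Returns the tab's stable connection id, minting `cid-{uuid}` on first use.
function getStableCid(store: KeyValueStore, newUuid: () => string): string {
  const existing = store.getItem(CID_STORAGE_KEY);
  if (existing !== null) return existing;
  const cid = `cid-${newUuid()}`;
  store.setItem(CID_STORAGE_KEY, cid);
  return cid;
}

// Appends ?cid= to the SSE URL. The base URL is a placeholder.
function buildSseUrl(base: string, cid: string): string {
  const url = new URL(base);
  url.searchParams.set("cid", cid);
  return url.toString();
}

// In-memory stand-in for sessionStorage, used for the demo below.
class MemoryStore implements KeyValueStore {
  private data = new Map<string, string>();
  getItem(key: string): string | null {
    return this.data.get(key) ?? null;
  }
  setItem(key: string, value: string): void {
    this.data.set(key, value);
  }
}

const tabStore = new MemoryStore();
// Fixed UUID so the demo is deterministic.
const firstCid = getStableCid(tabStore, () => "123e4567-e89b-12d3-a456-426614174000");
// Simulated reconnect: the stored id is reused and the generator is not called.
const secondCid = getStableCid(tabStore, () => "unused");
const sseUrl = buildSseUrl("https://example.invalid/api/events", firstCid);
```

Because the id is read back from storage on every call, each reconnect in the same tab presents the same `?cid=` value.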

Changes

| File | What |
| --- | --- |
| `packages/client/src/sse-connection.ts` | Stable cid in URL, welcome validation, backward compat, `_send_failed` emission |
| `packages/harness/src/connection-registry.ts` | `reconnect()` swaps transport preserving state, `markInactive()` + `onExpired` TTL |
| `server/index.ts` | Accept `?cid=`, validate, `connRegistry.reconnect()`, deferred close, `onExpired` wiring |
| `frontend/src/client-store.ts` | `sessionStorage`-persisted connectionId per tab |
| `packages/client/src/store.ts` | Optimistic `SET_RUNNING`, `_send_failed` handler |
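
The registry behaviour listed for `connection-registry.ts` can be sketched roughly as follows. This is not the PR's implementation: the class name, the entry shape, and the explicit `sweep(now)` method (used here instead of a `setTimeout` so the example is deterministic) are all assumptions.

```typescript
type Transport = { isOpen(): boolean };

interface ConnectionEntry {
  transport: Transport;
  watchedSessions: Set<string>;
  inactiveSince: number | null; // ms timestamp; null while the connection is active
}

type OnExpired = (connectionId: string, watchedSessions: string[]) => void;

class SketchConnectionRegistry {
  private entries = new Map<string, ConnectionEntry>();
  private onExpired: OnExpired | null = null;

  constructor(private readonly ttlMs: number = 60000) {}

  setOnExpired(cb: OnExpired): void {
    this.onExpired = cb;
  }

  // Registers a connection, or swaps the transport on an existing one
  // while keeping its watched sessions.
  reconnect(connectionId: string, transport: Transport): ConnectionEntry {
    const existing = this.entries.get(connectionId);
    if (existing) {
      existing.transport = transport;
      existing.inactiveSince = null; // cancels the pending expiry
      return existing;
    }
    const entry: ConnectionEntry = {
      transport,
      watchedSessions: new Set<string>(),
      inactiveSince: null,
    };
    this.entries.set(connectionId, entry);
    return entry;
  }

  // Starts the grace period instead of removing the entry immediately.
  markInactive(connectionId: string, now: number): void {
    const entry = this.entries.get(connectionId);
    if (entry) entry.inactiveSince = now;
  }

  // Removes entries that have been inactive for at least ttlMs, then notifies.
  sweep(now: number): void {
    for (const [id, entry] of Array.from(this.entries)) {
      if (entry.inactiveSince !== null && now - entry.inactiveSince >= this.ttlMs) {
        this.entries.delete(id);
        this.onExpired?.(id, Array.from(entry.watchedSessions));
      }
    }
  }

  has(connectionId: string): boolean {
    return this.entries.has(connectionId);
  }
}

const expiredLog: string[] = [];
const registrySketch = new SketchConnectionRegistry(60000);
registrySketch.setOnExpired((id, sessions) => {
  expiredLog.push(`${id}:${sessions.join(",")}`);
});
const liveTransport: Transport = { isOpen: () => true };

registrySketch.reconnect("cid-a", liveTransport).watchedSessions.add("s1");
registrySketch.markInactive("cid-a", 0);
registrySketch.reconnect("cid-a", liveTransport); // back within the grace period
registrySketch.sweep(120000); // nothing expires: the entry is active again
const survivedReconnect = registrySketch.has("cid-a");

registrySketch.markInactive("cid-a", 120000);
registrySketch.sweep(180000); // 60s with no reconnect: entry expires
```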

Backward Compatibility

| Client | Server | Behavior |
| --- | --- | --- |
| New (sends `?cid=`) | New | Stable connectionId, full fix |
| New (sends `?cid=`) | Old (ignores cid) | Falls back to server-assigned ID |
| Old (no cid) | New | Server generates `conn-*` as before |

Test plan

17 new tests: stable cid generation/persistence/URL, reconnect transport swap, markInactive TTL, onExpired callback, POST failure surfacing (500/429/network), re-queue + retry
All 90 connection-registry + sse-connection tests pass
Lint + gitleaks clean
Deploy server, rebuild iOS app
Test: open session → send → force-kill app → reopen → send → should respond immediately (no double-send)
Test: airplane mode 5s → disable → send → should work on first try

Supersedes #396.

🤖 Generated with Claude Code

Client generates a UUID per tab, sends it via ?cid= on SSE GET. Server reuses it across reconnects, eliminating the ownership race in handleReconnect that caused the "send twice to wake agent" bug.

- SseConnection: stable cid in URL, welcome validation, backward compat
- ConnectionRegistry: reconnect() swaps transport preserving state, markInactive() with 60s TTL + onExpired callback for deferred cleanup
- Server: accept ?cid=, validate format, connRegistry.reconnect() on existing cid, deferred close with onExpired → detachChat()
- Frontend: sessionStorage-persisted connectionId per tab
- Store: optimistic SET_RUNNING after send, _send_failed handler
- doPost: re-queue failed sends (404/429/5xx) with _send_failed emission
- Tests: 17 new tests for stable cid, reconnect, TTL, POST failures

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

dimakis · 2026-06-26T13:46:02Z

Centaur Review

Found 6 issue(s) (3 warning).

`server/index.ts`

Well-designed transport resilience feature with thorough client-side tests. Main concern: the optimistic SET_RUNNING in the store can leave the UI stuck if a send POST fails while the SSE stream remains open, and the onExpired callback lacks the transport ownership guard present in the legacy close handler.

🟡 bugs: The onExpired callback detaches sessions without checking transport ownership. The legacy SSE close handler has session.transport === transport guard to avoid detaching sessions already reattached by another connection; onExpired has no equivalent. Currently safe because remove() and the callback run synchronously in the same setTimeout, but fragile — an ownership check like getOwnerConnection(found.clientId) === connectionId would match the legacy handler's safety guarantees. [fixable]
🔵 unsafe_assumptions: CID validation regex /^cid-[0-9a-f-]{36}$/ is looser than UUID format — it accepts patterns like 36 dashes or 36 hex digits with no structure. A stricter pattern like /^cid-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/ would reject malformed cids. Not a security issue (the cid is just a lookup key) but tighter validation prevents confusion. [fixable]
🔵 missing_tests: No server-side tests cover the new stable CID flow: the ?cid= query param validation/branching in the SSE handler, the stable-cid vs legacy close handler paths, or the onExpired callback that triggers session detach on TTL expiry. All new tests are client-side (SseConnection) or harness-level (ConnectionRegistry). [fixable]
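
To make the difference between the two patterns quoted above concrete, the following sketch runs both against three inputs. The sample ids are made up for illustration.

```typescript
// Loose pattern quoted in the review: any 36 characters from [0-9a-f-].
const LOOSE_CID = /^cid-[0-9a-f-]{36}$/;
// Stricter pattern suggested in the review: enforces the 8-4-4-4-12 grouping.
const STRICT_CID = /^cid-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/;

const wellFormedCid = "cid-123e4567-e89b-12d3-a456-426614174000";
const allDashesCid = "cid-" + "-".repeat(36);
const unstructuredCid = "cid-" + "a".repeat(36);

const regexComparison = [wellFormedCid, allDashesCid, unstructuredCid].map((cid) => ({
  cid,
  loose: LOOSE_CID.test(cid),
  strict: STRICT_CID.test(cid),
}));
// The loose pattern accepts all three inputs; the strict one accepts only the first.
```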

`packages/client/src/store.ts`

🟡 bugs: The optimistic SET_RUNNING dispatch (new in this PR) is not cleared when _send_failed arrives. If the POST to /api/chat/send fails but the SSE stream stays open (no reconnect), running stays true indefinitely — the user sees both a spinner and the error banner, and cannot send new messages. The inline comment references a 'send-timeout safety net' that does not appear to exist. [fixable]

`packages/client/src/sse-connection.ts`

🟡 unsafe_assumptions: doPost re-queues failed sends into pendingSends, and flushPendingSends calls doPost without awaiting. If multiple pending sends all fail concurrently during a flush (e.g., server returns 500), each fires a separate _send_failed event and re-queues. There is no max-retry counter or exponential backoff — messages will retry on every reconnect indefinitely. This is bounded by MAX_PENDING_SENDS (100) but could saturate the queue with retries of permanently failing messages. [fixable]

`server/ws-handler-v2.ts`

🔵 style (L89): getOwnerConnection JSDoc (lines 88-90) says 'ConnectionIds use the format conn-${timestamp}-${random}' but stable CIDs now use cid-${uuid}. The function works correctly for both formats (neither contains colons), but the comment is stale and could mislead future readers. [fixable]

dimakis

Centaur Review

Found 6 issue(s) (1 critical) (3 warning).

`server/index.ts`

Solid transport resilience feature, but the onExpired callback has a critical missing transport-ownership check that can detach sessions driven by other connections, and the stable-cid close path leaks observer transport references.

🔴 bugs (L118): onExpired callback is missing the session.transport === transport ownership check that the legacy SSE close handler has (line 417). If another tab/connection reattaches to the same session during the 60s grace period, onExpired will incorrectly detach it. The callback only receives (connectionId, watchedSessions) — it needs access to the original transport, or a connectionId-based ownership check, to avoid detaching sessions now driven by a different connection. [fixable]
🟡 regressions (L438): The stable-cid SSE close path calls connRegistry.markInactive(connectionId) but never calls registry.removeObserver(transport). The legacy path (line 450) does call it. Stale transport references will accumulate in the SessionRegistry observer sets across reconnects — each reconnect creates a new SseTransport, adds it as an observer, but the old one is never removed. This leaks memory and causes dead-transport broadcast attempts. [fixable]
🟡 unsafe_assumptions (L403): The CID validation regex /^cid-[0-9a-f-]{36}$/ assumes crypto.randomUUID() always produces lowercase hex. While the Web Crypto spec mandates lowercase, the regex silently falls back to a server-generated connectionId for any mismatch — which defeats the feature. Consider case-insensitive match or normalizing to lowercase. [fixable]
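
A connectionId-based ownership check of the kind suggested for `onExpired` might look like the sketch below. The session map and record shape are hypothetical stand-ins for the real registry, and `getOwnerConnection` is re-implemented here for illustration only; the exported function in `ws-handler-v2.ts` may differ in detail.

```typescript
// clientId has the form `${connectionId}:${sessionId}` (per the PR summary).
// Illustrative re-implementation, not the exported helper.
function getOwnerConnection(clientId: string): string {
  const colonIdx = clientId.indexOf(":");
  return colonIdx === -1 ? clientId : clientId.slice(0, colonIdx);
}

// Hypothetical record shape for the sketch.
interface SessionRecord {
  clientId: string;
  attached: boolean;
}

// Hypothetical in-memory stand-in for the session registry, keyed by sessionId.
const sessionsById = new Map<string, SessionRecord>();

// Detaches only the sessions still owned by the expired connection and
// returns the ids that were detached.
function onExpiredSketch(connectionId: string, watchedSessions: string[]): string[] {
  const detached: string[] = [];
  for (const sessionId of watchedSessions) {
    const found = sessionsById.get(sessionId);
    if (!found || !found.attached) continue;
    // Skip sessions that another connection took over during the grace period.
    if (getOwnerConnection(found.clientId) !== connectionId) continue;
    found.attached = false;
    detached.push(sessionId);
  }
  return detached;
}

sessionsById.set("s1", { clientId: "cid-old:s1", attached: true });
sessionsById.set("s2", { clientId: "cid-other:s2", attached: true }); // reattached elsewhere
const detachedSessions = onExpiredSketch("cid-old", ["s1", "s2"]);
```

In this run only `s1` is detached; `s2` stays attached because its owner is a different connection.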

`packages/client/src/sse-connection.ts`

🟡 bugs (L348): doPost re-queues failed sends into pendingSends with no retry counter. If a message triggers a persistent 404 or 500 (e.g., malformed body), it will be re-queued and retried on every reconnect forever. Consider adding a maxRetries field to pending items and dropping messages that exceed it, or at minimum emitting willRetry: false after N retries. [fixable]
🔵 style (L350): The _send_failed synthetic event type uses a leading underscore convention shared with _open and _close, but unlike those, it carries structured fields (clientMsgId, willRetry). Consider defining a TypeScript interface for it in the protocol types to catch misuse at compile time, rather than relying on untyped Record<string, unknown> matching. [fixable]
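
The two suggestions above (a retry counter and a typed `_send_failed` event) could be combined along these lines. The `clientMsgId` and `willRetry` fields come from the review text; the `PendingSend` shape, the helper name, and the reading of the limit as three retries after the initial attempt are assumptions.

```typescript
// Typed shape for the synthetic event, using the fields named in the review.
interface SendFailedEvent {
  type: "_send_failed";
  clientMsgId: string;
  willRetry: boolean;
}

// Hypothetical pending-queue item carrying its own retry count.
interface PendingSend {
  clientMsgId: string;
  body: unknown;
  retries: number;
}

// Assumed limit: three retries after the initial attempt.
const MAX_SEND_RETRIES = 3;

// Called when a send POST fails transiently. Re-queues the item while retries
// remain; otherwise drops it and reports willRetry: false.
function handleTransientFailure(item: PendingSend, pendingQueue: PendingSend[]): SendFailedEvent {
  const willRetry = item.retries < MAX_SEND_RETRIES;
  if (willRetry) {
    pendingQueue.push({ ...item, retries: item.retries + 1 });
  }
  return { type: "_send_failed", clientMsgId: item.clientMsgId, willRetry };
}

// Simulate a message that fails on every attempt.
const pendingQueue: PendingSend[] = [];
const emittedEvents: SendFailedEvent[] = [];
let currentItem: PendingSend | undefined = { clientMsgId: "m1", body: {}, retries: 0 };
while (currentItem) {
  emittedEvents.push(handleTransientFailure(currentItem, pendingQueue));
  currentItem = pendingQueue.shift();
}
```

With these assumptions a permanently failing message produces four events in total, and only the last one carries `willRetry: false`.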

`packages/client/src/tests/sse-connection.test.ts`

🔵 missing_tests: No test for the retry-limit concern: a persistently failing send message being re-queued and retried across multiple reconnects. Adding a test that verifies behavior after N consecutive failures would validate the retry semantics. [fixable]

dimakis · 2026-06-26T13:58:31Z

+    if (!found) continue;
+    if (registry.isSuspended(found.clientId)) continue;
+    const session = registry.get(found.clientId);
+    if (session && registry.isAttached(found.clientId)) {


🔴 bugs: onExpired callback is missing the session.transport === transport ownership check that the legacy SSE close handler has (line 417). If another tab/connection reattaches to the same session during the 60s grace period, onExpired will incorrectly detach it. The callback only receives (connectionId, watchedSessions) — it needs access to the original transport, or a connectionId-based ownership check, to avoid detaching sessions now driven by a different connection. [fixable]

dimakis · 2026-06-26T13:58:31Z


-    const conn = connRegistry.get(connectionId);
-    const watchedSessions = conn ? [...conn.watchedSessions] : [];
+    if (isStableCid) {


🟡 regressions: The stable-cid SSE close path calls connRegistry.markInactive(connectionId) but never calls registry.removeObserver(transport). The legacy path (line 450) does call it. Stale transport references will accumulate in the SessionRegistry observer sets across reconnects — each reconnect creates a new SseTransport, adds it as an observer, but the old one is never removed. This leaks memory and causes dead-transport broadcast attempts. [fixable]

dimakis · 2026-06-26T13:58:31Z

+  // ownership race. Client sends ?cid=<uuid>; server reuses it across
+  // SSE reconnects so the clientId (connectionId:sessionId) never changes.
+  const clientCid = req.query.cid as string | undefined;
+  const isStableCid = clientCid != null && /^cid-[0-9a-f-]{36}$/.test(clientCid);


🟡 unsafe_assumptions: The CID validation regex /^cid-[0-9a-f-]{36}$/ assumes crypto.randomUUID() always produces lowercase hex. While the Web Crypto spec mandates lowercase, the regex silently falls back to a server-generated connectionId for any mismatch — which defeats the feature. Consider case-insensitive match or normalizing to lowercase. [fixable]

dimakis · 2026-06-26T13:58:31Z

      });
+      if (!res.ok && endpoint === 'send') {
+        // Transient failure — re-queue for retry on next reconnect
+        if (res.status === 404 || res.status === 429 || res.status >= 500) {


🟡 bugs: doPost re-queues failed sends into pendingSends with no retry counter. If a message triggers a persistent 404 or 500 (e.g., malformed body), it will be re-queued and retried on every reconnect forever. Consider adding a maxRetries field to pending items and dropping messages that exceed it, or at minimum emitting willRetry: false after N retries. [fixable]

dimakis · 2026-06-26T13:58:31Z

+        // Transient failure — re-queue for retry on next reconnect
+        if (res.status === 404 || res.status === 429 || res.status >= 500) {
+          this.pendingSends.push({ endpoint, body });
+          this.listener?.({


🔵 style: The _send_failed synthetic event type uses a leading underscore convention shared with _open and _close, but unlike those, it carries structured fields (clientMsgId, willRetry). Consider defining a TypeScript interface for it in the protocol types to catch misuse at compile time, rather than relying on untyped Record<string, unknown> matching. [fixable]

…ak, retry limit

- onExpired: verify session still owned by expired connectionId before detaching (prevents cross-tab detach race)
- Stable-cid close: call removeObserver(transport) to prevent stale transport accumulation in SessionRegistry observer sets
- CID regex: case-insensitive match for crypto.randomUUID() compat
- doPost: add retry counter (max 3), emit willRetry:false after exhaustion
- Export SendFailedEvent interface for type-safe event handling
- Test: verify message dropped after max retries with willRetry:false

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

dimakis

Centaur Review

Found 6 issue(s) (2 warning).

`packages/client/src/store.ts`

Solid transport resilience feature with good test coverage. Main concern: optimistic running state set on send is never cleared when _send_failed fires with willRetry: false, leaving the UI stuck in a thinking spinner.

🟡 bugs (L377): Optimistic SET_RUNNING is set when send() returns true, but when _send_failed fires (lines 714-718), only sendError is set — running is never reset to false. If the final willRetry: false event fires, the UI stays in a permanent "thinking" spinner. The _send_failed handler should also dispatch SET_RUNNING: false when willRetry is false. [fixable]
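
The suggested handler change can be sketched as a pure state transition. The state shape is reduced to the two fields discussed in the review, and both message strings are placeholders rather than the store's actual text.

```typescript
// Reduced, hypothetical view of the store state relevant to this issue.
interface SendUiState {
  running: boolean;
  sendError: string | null;
}

// Placeholder strings; the real store text may differ.
const RETRYING_MESSAGE = "Message delivery delayed; retrying on reconnect...";
const FAILED_MESSAGE = "Message could not be delivered.";

// Applies a _send_failed event. On the final failure (willRetry: false) the
// optimistic running flag is cleared so the UI leaves the "thinking" state.
function applySendFailed(state: SendUiState, msg: Record<string, unknown>): SendUiState {
  if (msg.type !== "_send_failed") return state;
  const willRetry = msg.willRetry !== false;
  return {
    running: willRetry ? state.running : false,
    sendError: willRetry ? RETRYING_MESSAGE : FAILED_MESSAGE,
  };
}

const optimisticState: SendUiState = { running: true, sendError: null };
const afterTransientFailure = applySendFailed(optimisticState, { type: "_send_failed", willRetry: true });
const afterFinalFailure = applySendFailed(afterTransientFailure, { type: "_send_failed", willRetry: false });
```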

`server/index.ts`

🟡 bugs (L413): The regex /^cid-[0-9a-f-]{36}$/i allows malformed UUIDs (e.g. cid------------------------------------- — 36 dashes) because [0-9a-f-]{36} doesn't enforce UUID structure. A crafted cid query param could pass validation while not being a real UUID. Impact is low since the cid is only used as a map key, but consider a stricter pattern like /^cid-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i. [fixable]
🔵 style (L117): getOwnerConnection() is already exported from ws-handler-v2.ts and does this exact logic (clientId.slice(0, colonIdx)). The onExpired callback reimplements it with substring(0, indexOf(':')). Import and reuse the existing function to avoid divergence. [fixable]
🔵 missing_tests (L112): The setOnExpired callback in index.ts (lines 112-139) has no integration test. It contains ownership checks, suspended-session guards, and detach logic — all of which are only tested indirectly via the ConnectionRegistry unit tests. A test covering the wiring (TTL expires → sessions actually detached) would catch regressions in the glue code.

`packages/client/src/tests/sse-connection.test.ts`

🔵 missing_tests: No test covers what happens when doPost fails for a 4xx status OTHER than 404/429 (e.g. 400, 403, 422). These are silently swallowed — no _send_failed event, no retry. This is likely intentional (client errors shouldn't retry), but a test documenting this behavior would prevent accidental regressions. [fixable]

`packages/client/src/sse-connection.ts`

🔵 unsafe_assumptions (L363): 404 is treated as retryable alongside 429 and 5xx. A 404 on the send endpoint typically means the session or route doesn't exist — retrying won't help. Consider treating 404 as non-retryable (emit _send_failed with willRetry: false) to avoid wasting 3 retry cycles on a permanently missing resource. [fixable]
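
One way to express the split this comment asks for is a small classifier over the response status. The function and type names are hypothetical; the "ignored" bucket reflects the other review note that remaining 4xx statuses are swallowed without an event.

```typescript
type SendFailureKind = "permanent" | "transient" | "ignored";

// Classifies a non-OK status from the send endpoint.
// 404: permanent failure, emit _send_failed with willRetry: false.
// 429 and 5xx: transient failure, re-queue for retry.
// Anything else (e.g. 400, 403): no event and no retry.
function classifySendFailure(httpStatus: number): SendFailureKind {
  if (httpStatus === 404) return "permanent";
  if (httpStatus === 429 || httpStatus >= 500) return "transient";
  return "ignored";
}

const classified = [404, 429, 500, 503, 400, 403].map((code) => classifySendFailure(code));
```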

dimakis · 2026-06-26T14:07:14Z

        if (!sent) {
          set({ sendError: 'Not connected. Message will be sent when reconnected.' });
+        } else {
+          // Optimistic: show "thinking" immediately. The server will confirm


🟡 bugs: Optimistic SET_RUNNING is set when send() returns true, but when _send_failed fires (lines 714-718), only sendError is set — running is never reset to false. If the final willRetry: false event fires, the UI stays in a permanent "thinking" spinner. The _send_failed handler should also dispatch SET_RUNNING: false when willRetry is false. [fixable]

dimakis · 2026-06-26T14:07:14Z

+  // ownership race. Client sends ?cid=<uuid>; server reuses it across
+  // SSE reconnects so the clientId (connectionId:sessionId) never changes.
+  const clientCid = req.query.cid as string | undefined;
+  const isStableCid = clientCid != null && /^cid-[0-9a-f-]{36}$/i.test(clientCid);


🟡 bugs: The regex /^cid-[0-9a-f-]{36}$/i allows malformed UUIDs (e.g. cid------------------------------------- — 36 dashes) because [0-9a-f-]{36} doesn't enforce UUID structure. A crafted cid query param could pass validation while not being a real UUID. Impact is low since the cid is only used as a map key, but consider a stricter pattern like /^cid-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i. [fixable]

dimakis · 2026-06-26T14:07:14Z

+    const found = registry.findBySessionId(sessionId);
+    if (!found) continue;
+    // Skip if another connection has taken over this session
+    const ownerCid = found.clientId.substring(0, found.clientId.indexOf(':'));


🔵 style: getOwnerConnection() is already exported from ws-handler-v2.ts and does this exact logic (clientId.slice(0, colonIdx)). The onExpired callback reimplements it with substring(0, indexOf(':')). Import and reuse the existing function to avoid divergence. [fixable]

dimakis · 2026-06-26T14:07:14Z

+// When a stable-cid connection's inactive TTL expires (no reconnect within
+// 60s), detach any sessions it was driving — same logic as the SSE close
+// handler for legacy connections.
+connRegistry.setOnExpired((connectionId, watchedSessions) => {


🔵 missing_tests: The setOnExpired callback in index.ts (lines 112-139) has no integration test. It contains ownership checks, suspended-session guards, and detach logic — all of which are only tested indirectly via the ConnectionRegistry unit tests. A test covering the wiring (TTL expires → sessions actually detached) would catch regressions in the glue code.

dimakis · 2026-06-26T14:07:14Z

        body: JSON.stringify(body),
      });
+      if (!res.ok && endpoint === 'send') {
+        if (res.status === 404 || res.status === 429 || res.status >= 500) {


🔵 unsafe_assumptions: 404 is treated as retryable alongside 429 and 5xx. A 404 on the send endpoint typically means the session or route doesn't exist — retrying won't help. Consider treating 404 as non-retryable (emit _send_failed with willRetry: false) to avoid wasting 3 retry cycles on a permanently missing resource. [fixable]

…ndling, regex

- store: clear SET_RUNNING when _send_failed fires with willRetry:false (prevents stuck "thinking" spinner on permanent failure)
- sse-connection: treat 404 as permanent failure (no retry), separate from transient 429/5xx retries
- server: strict UUID regex, import getOwnerConnection() from ws-handler-v2 instead of reimplementing
- tests: 404 permanent failure, 4xx silently swallowed (400/403)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

dimakis · 2026-06-26T14:19:21Z

Centaur Review

Found 7 issue(s) (3 warning).

`packages/client/src/sse-connection.ts`

Solid architecture for stable connectionId and POST failure surfacing, but the retry mechanism only triggers on SSE reconnect — if SSE stays healthy while POSTs fail, the spinner gets stuck and retries never fire.

🟡 bugs (L366): Failed sends are re-queued into pendingSends, but flushPendingSends() is only called during SSE reconnect (welcome handler). If the SSE stream stays healthy while the POST endpoint returns 5xx (transient API issue), retries never fire — the message sits in the queue indefinitely. The user sees "retrying on reconnect..." but no retry actually happens until an SSE disconnect/reconnect cycle. Consider adding a delayed retry timer (e.g., exponential backoff) for the connected-but-POST-failed case. [fixable]
🔵 style (L138): getConnectionId() still returns string | null even though _connectionId is now always a string (initialized in the constructor, never set to null). The return type could be tightened to string, though this would require updating the ChatConnection interface — so may be intentionally left for backward compat. [fixable]
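
For the connected-but-POST-failed case raised above, a delayed retry with exponential backoff could be sketched as follows. The base delay, the cap, and the injected timer are assumptions; in real code the timer would typically wrap `setTimeout`.

```typescript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelayMs(attempt: number, baseMs = 1000, capMs = 30000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Timer is injected so the sketch can be exercised without real waiting.
type TimerFn = (fn: () => void, ms: number) => void;

// Schedules a flush of the pending queue after the backoff delay and
// returns the delay that was used.
function scheduleRetry(attempt: number, flush: () => void, setTimer: TimerFn): number {
  const delayMs = backoffDelayMs(attempt);
  setTimer(flush, delayMs);
  return delayMs;
}

// Fake timer that records the requested delay and runs the callback at once.
const scheduledDelays: number[] = [];
let flushCount = 0;
const recordingTimer: TimerFn = (fn, ms) => {
  scheduledDelays.push(ms);
  fn();
};

for (let attempt = 0; attempt < 6; attempt++) {
  scheduleRetry(attempt, () => { flushCount += 1; }, recordingTimer);
}
```

Keeping the delay calculation separate from the scheduling makes the backoff curve easy to unit test without fake clocks.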

`packages/client/src/store.ts`

🟡 bugs (L717): sendError is set to "Message delivery delayed — retrying on reconnect..." on transient failure, but never cleared when the retry eventually succeeds. sendError is only reset to null when the user sends the next message (line 345). If the retry succeeds silently, the error banner persists until the user takes another action. Consider clearing sendError in the message_start handler or when SET_RUNNING transitions from server events. [fixable]
🟡 bugs (L720): When _send_failed arrives with willRetry: true, SET_RUNNING is NOT reset — the UI stays in "thinking" state while the message sits in the pending retry queue. Combined with the issue that retries only fire on SSE reconnect, the spinner can be stuck indefinitely if SSE stays connected. The user has no way to interact (the input is blocked while running is true). Consider resetting SET_RUNNING to false on willRetry: true as well, and re-dispatching SET_RUNNING: true only when the retry actually fires. [fixable]

`server/index.ts`

🔵 unsafe_assumptions (L416): The stable CID regex /^cid-[0-9a-f]{8}-...$/i only accepts the cid- prefix with UUID format. If a future client implementation changes the ID format (or a non-browser client uses a different scheme), it silently falls back to a server-generated conn- ID. This is safe but could be confusing to debug. A log line when clientCid is provided but rejected by the regex would help diagnose integration issues. [fixable]
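
The logging suggested here could sit next to the validation, roughly as below. The function name, the injected `warn` and `newServerId` callbacks, and the lowercase normalization are assumptions for the sketch; the message logs only the length of the rejected value rather than echoing user input.

```typescript
// Strict, case-insensitive pattern in the shape quoted by the review.
const STABLE_CID = /^cid-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

interface ResolvedConnection {
  connectionId: string;
  isStableCid: boolean;
}

// Chooses the connection id for an SSE request and warns when a cid was
// supplied but rejected, so the silent fallback becomes diagnosable.
function resolveConnectionId(
  clientCid: string | undefined,
  warn: (message: string) => void,
  newServerId: () => string,
): ResolvedConnection {
  if (clientCid !== undefined) {
    if (STABLE_CID.test(clientCid)) {
      // Normalizing case is an assumption; it keeps map lookups consistent.
      return { connectionId: clientCid.toLowerCase(), isStableCid: true };
    }
    warn(`rejected cid query param (length ${clientCid.length}); using server-generated id`);
  }
  return { connectionId: newServerId(), isStableCid: false };
}

const cidWarnings: string[] = [];
const recordWarning = (message: string): void => { cidWarnings.push(message); };
const fixedServerId = (): string => "conn-0-test"; // placeholder generator

const acceptedCid = resolveConnectionId("cid-123E4567-e89b-12d3-a456-426614174000", recordWarning, fixedServerId);
const rejectedCid = resolveConnectionId("cid-short", recordWarning, fixedServerId);
const absentCid = resolveConnectionId(undefined, recordWarning, fixedServerId);
```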

`packages/client/src/tests/sse-connection.test.ts`

🔵 missing_tests: No test covers the scenario where a POST fails while SSE is connected (no reconnect), verifying that the retry sits in pendingSends and is eventually flushed on the next reconnect. The existing 're-queues failed send for retry on reconnect' test forces a reconnect immediately — it doesn't prove the message survives a period of connectivity. [fixable]
🔵 missing_tests: No test verifies the sendError lifecycle in the store when retry succeeds: set on _send_failed with willRetry: true, then cleared when the server eventually responds. This is a store-level test, but it validates the end-to-end UX contract. [fixable]

dimakis · 2026-06-26T14:27:50Z

Centaur Review

Found 6 issue(s) (3 warning).

`packages/client/src/store.ts`

Well-structured transport resilience feature with solid test coverage at the connection layer. Main gaps are missing store-level tests for the new _send_failed handler and optimistic running state, a missing barrel export for SendFailedEvent, and a potential message ordering issue in flushPendingSends.

🟡 missing_tests (L714): The store-level _send_failed handler (lines 714-725) and the optimistic SET_RUNNING dispatch (lines 379-381) have no unit tests. The _send_failed event is only tested at the SseConnection level. Store behavior — particularly the conditional SET_RUNNING: false when willRetry is false — should be validated by store tests. [fixable]
🔵 style (L377): The comment says "the send-timeout safety net will reset on failure" but the safety-net timer (line 355) only runs when wasRunning is true — it doesn't cover fresh sends. For a fresh send that fails transiently (willRetry: true), the spinner stays on until either the retry succeeds or all retries are exhausted. The comment should clarify this: the safety net is the _send_failed { willRetry: false } event, not the timer. [fixable]

`packages/client/src/index.ts`

🟡 regressions (L74): SendFailedEvent is exported from sse-connection.ts (line 42) but is NOT re-exported from the @mitzo/client barrel (index.ts). Consumers who need to type-narrow the listener callback can't import it without reaching into the internal module path, breaking the package's public API contract. [fixable]

`server/ws-handler-v2.ts`

🔵 style (L88): The JSDoc on getOwnerConnection says "ConnectionIds use the format conn-${timestamp}-${random} (see server/index.ts L196)". With stable cids the format is now also cid-${uuid}. The function works correctly (neither format contains colons), but the comment is misleading and the line reference is stale. [fixable]

`server/index.ts`

🔵 unsafe_assumptions (L416): The cid regex /^cid-[0-9a-f]{8}-...$/i validates UUID format strictly, which is good. However, this means a legitimate client that uses a non-standard UUID generator (e.g., shorter random string) will silently fall back to a server-generated conn- id, losing stable-cid benefits with no error or log line. Consider logging when a cid param is present but doesn't match the pattern, to aid debugging. [fixable]

`packages/client/src/sse-connection.ts`

🟡 bugs (L398): flushPendingSends() fires all queued doPost() calls without awaiting — they execute concurrently. If multiple user messages were queued during a disconnect, they may arrive at the server out of order. Since doPost is async and HTTP/2 multiplexes requests, delivery order is not guaranteed. For the send endpoint (which starts agent work), misordered prompts could produce confusing behavior. [fixable]
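
A sequential flush that preserves queue order might look like the sketch below. The names are hypothetical, and `post` is assumed to handle its own failures (for example by re-queuing) rather than rejecting; if it can reject, the loop would need a try/catch so later items are not lost.

```typescript
type PostFn = (body: string) => Promise<void>;

// Sends queued bodies one at a time, awaiting each before starting the next,
// so the server sees them in the order they were queued.
async function flushInOrder(pending: string[], post: PostFn): Promise<void> {
  // Drain a snapshot so anything re-queued during the flush waits for the next one.
  const batch = pending.splice(0, pending.length);
  for (const body of batch) {
    await post(body);
  }
}

// Fake POST where the first message is slower than the second. With a
// fire-and-forget flush, "second" would be delivered before "first".
const deliveredOrder: string[] = [];
const slowFirstPost: PostFn = (body) =>
  new Promise<void>((resolve) => {
    const delayMs = body === "first" ? 20 : 0;
    setTimeout(() => {
      deliveredOrder.push(body);
      resolve();
    }, delayMs);
  });

const pendingBodies = ["first", "second"];
const flushDone = flushInOrder(pendingBodies, slowFirstPost);
```

The trade-off is latency: a slow request delays everything queued behind it, which seems acceptable for a small queue of user prompts.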

dimakis

Centaur Review

Found 7 issue(s) (3 warning).

`packages/client/src/store.ts`

Well-structured transport resilience feature with solid test coverage. Main concern is the sendError state not being cleared when a retried send succeeds — users may see a stale error banner. The SSE-level implementation (stable cid, retry queue, connection registry TTL) is sound.

🟡 bugs (L717): When a retried send eventually succeeds (e.g. willRetry: true followed by successful doPost on reconnect), sendError is never cleared. The user sees "Message delivery delayed — retrying on reconnect..." indefinitely until they send another message (which clears it at line 345). Consider clearing sendError on the _open synthetic event emitted after successful reconnect. [fixable]
🟡 bugs (L380): The optimistic SET_RUNNING fires on send() returning true, but send() returns true even when the message is just queued (not yet delivered). If the POST then fails with willRetry: true, the spinner stays showing because SET_RUNNING: false is only dispatched for willRetry: false. This is probably intentional (server events will drive running state on success), but if all retries also fail, there's a window where running is true with no active agent — the user sees an eternal spinner until the final willRetry: false event arrives after 3 reconnect cycles.
🔵 style (L715): Redundant type cast: (msg as Record<string, unknown>).willRetry — msg is already typed as Record<string, unknown> from the function parameter on line 704. Just use msg.willRetry !== false. [fixable]
🔵 missing_tests (L714): The store-level _send_failed handler and the optimistic SET_RUNNING on successful send() are new behaviors without store-level unit tests. The SSE connection tests cover the transport layer well, but the Zustand store's reaction to these events (sendError state, running state transitions) is untested. [fixable]

`server/index.ts`

🟡 unsafe_assumptions (L416): The cid regex /^cid-[0-9a-f]{8}-...$/i validates only the exact format cid-{uuid}. If the client code or a future browser produces a non-standard UUID format (e.g., uppercase from a polyfill, or a non-v4 UUID), the cid silently falls back to a server-generated one, breaking the stable-cid reconnect flow with no error signal. The case-insensitive /i flag partially mitigates this, but consider logging when a non-null cid query param fails validation, so the mismatch is diagnosable. [fixable]
🔵 regressions (L456): In the stable-cid close handler, registry.removeObserver(transport) is called immediately but session detach is deferred via markInactive. During the 60s grace period, if the agent emits events, they won't reach this transport (removed as observer) but the connection entry still exists in connRegistry. This is likely fine because the connection's transport isOpen() returns false, and the periodic sync in ConnectionRegistry filters by isOpen() — but it's worth noting the transport is stale until reconnect swaps it.

`packages/client/src/tests/sse-connection.test.ts`

🔵 missing_tests: No test for concurrent pending sends during flush. When flushPendingSends fires multiple doPost calls without awaiting, and several fail, they each push back to pendingSends. A test verifying that multiple concurrent failures during flush correctly accumulate in the retry queue (and don't exceed MAX_SEND_RETRIES per message) would increase confidence. [fixable]

dimakis · 2026-06-26T14:31:43Z

+    if (msg.type === '_send_failed') {
+      const willRetry = (msg as Record<string, unknown>).willRetry !== false;
+      store.setState((s) => ({
+        sendError: willRetry


🟡 bugs: When a retried send eventually succeeds (e.g. willRetry: true followed by successful doPost on reconnect), sendError is never cleared. The user sees "Message delivery delayed — retrying on reconnect..." indefinitely until they send another message (which clears it at line 345). Consider clearing sendError on the _open synthetic event emitted after successful reconnect. [fixable]
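A minimal sketch of the suggested fix, written as a pure reducer so the transitions are easy to unit test. The `SendState` shape, the `reduceTransportEvent` name, and both banner strings are hypothetical stand-ins, not the PR's actual store code.

```typescript
// Hypothetical, simplified slice of the store state (not the PR's real shape).
interface SendState {
  sendError: string | null;
  running: boolean;
}

type TransportEvent =
  | { type: '_open' }
  | { type: '_send_failed'; willRetry?: boolean };

// Placeholder banner strings; the real copy lives in the store.
const RETRYING_BANNER = 'Message delivery delayed, retrying on reconnect...';
const FAILED_BANNER = 'Message could not be delivered.';

function reduceTransportEvent(state: SendState, msg: TransportEvent): SendState {
  if (msg.type === '_send_failed') {
    const willRetry = msg.willRetry !== false;
    return {
      sendError: willRetry ? RETRYING_BANNER : FAILED_BANNER,
      // Only a permanent failure resets the optimistic "thinking" state.
      running: willRetry ? state.running : false,
    };
  }
  // msg.type === '_open': the stream is back, so a stale banner is cleared.
  return { ...state, sendError: null };
}
```

Note that clearing on `_open` also dismisses a permanent-failure banner; whether that is desirable is a product decision this sketch does not settle.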

dimakis · 2026-06-26T14:31:43Z

+          // Optimistic: show "thinking" immediately. The server will confirm
+          // via events, or the send-timeout safety net will reset on failure.
+          set((s) => ({
+            messages: messagesReducer(s.messages, { type: 'SET_RUNNING', running: true }),


🟡 bugs: The optimistic SET_RUNNING fires on send() returning true, but send() returns true even when the message is just queued (not yet delivered). If the POST then fails with willRetry: true, the spinner stays showing because SET_RUNNING: false is only dispatched for willRetry: false. This is probably intentional (server events will drive running state on success), but if all retries also fail, there's a window where running is true with no active agent: the user sees a spinner that only clears when the final willRetry: false event arrives after 3 reconnect cycles.

dimakis · 2026-06-26T14:31:43Z

    }

+    if (msg.type === '_send_failed') {
+      const willRetry = (msg as Record<string, unknown>).willRetry !== false;


🔵 style: Redundant type cast: (msg as Record<string, unknown>).willRetry — msg is already typed as Record<string, unknown> from the function parameter on line 704. Just use msg.willRetry !== false. [fixable]

dimakis · 2026-06-26T14:31:43Z

      return;
    }

+    if (msg.type === '_send_failed') {


🔵 missing_tests: The store-level _send_failed handler and the optimistic SET_RUNNING on successful send() are new behaviors without store-level unit tests. The SSE connection tests cover the transport layer well, but the Zustand store's reaction to these events (sendError state, running state transitions) is untested. [fixable]

dimakis · 2026-06-26T14:31:43Z

+  const clientCid = req.query.cid as string | undefined;
+  const isStableCid =
+    clientCid != null &&
+    /^cid-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i.test(clientCid);


🟡 unsafe_assumptions: The cid regex /^cid-[0-9a-f]{8}-...$/i accepts only the exact format cid-{uuid}: the cid- prefix followed by hyphenated 8-4-4-4-12 hex groups. It does not check the UUID version, and the /i flag already admits uppercase, so those variants pass. If the client code or a future polyfill produces any other shape (e.g., an unhyphenated or brace-wrapped UUID), the cid silently falls back to a server-generated one, breaking the stable-cid reconnect flow with no error signal. Consider logging when a non-null cid query param fails validation, so the mismatch is diagnosable. [fixable]
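One way the logging suggestion could look, as a sketch rather than the PR's code. `resolveConnectionId`, `CidResolution`, and the injected `generateServerId` and `warn` callbacks are hypothetical names; only the regex is taken from the diff above. The raw value is typed `unknown` because a query parameter is not guaranteed to be a single string.

```typescript
const CID_PATTERN =
  /^cid-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

interface CidResolution {
  connectionId: string;
  isStableCid: boolean;
}

function resolveConnectionId(
  rawCid: unknown,
  generateServerId: () => string, // hypothetical: produces a conn-* id
  warn: (message: string) => void, // hypothetical: e.g. a logger's warn method
): CidResolution {
  if (typeof rawCid === 'string' && CID_PATTERN.test(rawCid)) {
    return { connectionId: rawCid, isStableCid: true };
  }
  if (rawCid != null) {
    // A cid was supplied but rejected. Log its type only, not its value,
    // so client-controlled input does not end up verbatim in the logs.
    warn(`Rejected client cid of type ${typeof rawCid}; using server-generated id`);
  }
  return { connectionId: generateServerId(), isStableCid: false };
}
```

A missing cid stays silent, since that is the expected path for old clients; only a present-but-invalid cid produces a warning.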

dimakis · 2026-06-26T14:31:43Z

+      // connectionId — markInactive starts a 60s TTL. If the TTL expires,
+      // the onExpired callback triggers session detach.
+      connRegistry.markInactive(connectionId);
+      registry.removeObserver(transport);


🔵 regressions: In the stable-cid close handler, registry.removeObserver(transport) is called immediately but session detach is deferred via markInactive. During the 60s grace period, if the agent emits events, they won't reach this transport (removed as observer) but the connection entry still exists in connRegistry. This is likely fine because the connection's transport isOpen() returns false, and the periodic sync in ConnectionRegistry filters by isOpen() — but it's worth noting the transport is stale until reconnect swaps it.

… logging

- Clear sendError on _open (reconnect success) to prevent stale error banners
- Remove redundant type cast on msg.willRetry in _send_failed handler
- Log warning when client sends invalid cid format on SSE connect

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

dimakis

Centaur Review

Found 6 issue(s) (2 warning).

`packages/client/src/sse-connection.ts`

Solid transport resilience feature with good test coverage. Two actionable issues: (1) 400 responses on send silently leave the UI stuck in 'thinking' state due to interaction with the new optimistic SET_RUNNING, and (2) sessionStorage access at module scope can throw in private browsing mode.

🟡 bugs (L362): When a send POST returns 400 (bad request), it falls through silently — no _send_failed emitted, no retry. Combined with the new optimistic SET_RUNNING(true) in store.ts:380, this leaves the UI stuck in 'thinking' state indefinitely. The comment references a 'send-timeout safety net' but no such timer exists for this code path. Either emit _send_failed with willRetry: false for all non-2xx send responses that don't already have handling, or treat 400 the same as 404 (permanent failure). [fixable]
🔵 style (L43): SendFailedEvent.clientMsgId is typed as unknown but is always a string (generated by the store as user-${Date.now()}-...). Using string would improve type safety in the store handler at store.ts:722 where it's consumed without narrowing. [fixable]
🔵 regressions (L113): The send() guard changed from this._connected && this._connectionId to just this._connected. Previously, _connectionId being null was an extra safety check that send wouldn't fire before the welcome handshake. Now _connectionId is always set (from constructor), so the guard change is semantically correct — but doPost will now include the client-generated cid in X-Connection-ID even if the server hasn't acknowledged it yet. This is safe because the _connected flag is still gated on welcome receipt, but worth noting.

`frontend/src/client-store.ts`

🟡 unsafe_assumptions (L31): sessionStorage.getItem() and setItem() can throw SecurityError in Safari private browsing, some Firefox configurations, and restrictive WebViews. Since getOrCreateConnectionId() is called at module scope (during SSE config construction), an exception here would break the entire client import. Wrap in try-catch with a fallback to an in-memory UUID. [fixable]

`packages/client/src/tests/sse-connection.test.ts`

🔵 missing_tests: No test verifies behavior when the same message is re-queued and then the connection succeeds — specifically, that retries is correctly incremented through multiple flush cycles and that clientMsgId is preserved across retries (not duplicated in the queue). [fixable]

`server/index.ts`

🔵 unsafe_assumptions (L416): The cid validation regex /^cid-[0-9a-f]{8}-...$/i matches any hyphenated 8-4-4-4-12 hex UUID shape (UUID v4 included; it does not check the version), and crypto.randomUUID() is spec'd to return lowercase, so the i flag is only there for robustness. Note that if a non-UUID-shaped cid is rejected, the server silently falls back to a server-generated conn- id with no feedback to the client. The client's _connectionId will be its UUID while the server uses a different id; the welcome echo will then trigger the 'old server' fallback path in the client, which works but means the 'stable cid' feature silently degrades. A log.warn exists, but the client has no way to know its cid was rejected.

dimakis · 2026-06-26T14:41:07Z

        },
        body: JSON.stringify(body),
      });
+      if (!res.ok && endpoint === 'send') {


🟡 bugs: When a send POST returns 400 (bad request), it falls through silently — no _send_failed emitted, no retry. Combined with the new optimistic SET_RUNNING(true) in store.ts:380, this leaves the UI stuck in 'thinking' state indefinitely. The comment references a 'send-timeout safety net' but no such timer exists for this code path. Either emit _send_failed with willRetry: false for all non-2xx send responses that don't already have handling, or treat 400 the same as 404 (permanent failure). [fixable]
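A sketch of the "emit for every non-2xx" option. Which statuses count as transient is an assumption made here for illustration (429 and 5xx retry, everything else is permanent); the PR's real classification, including how it treats 404, is not shown in this thread. `classifySendStatus`, `sendFailedEventFor`, and `SketchSendFailedEvent` are hypothetical names.

```typescript
type SendOutcome =
  | { kind: 'ok' }
  | { kind: 'failed'; willRetry: boolean };

// Hypothetical event shape, modelled loosely on the SendFailedEvent in the diff.
interface SketchSendFailedEvent {
  type: '_send_failed';
  clientMsgId: string;
  willRetry: boolean;
}

function classifySendStatus(status: number): SendOutcome {
  if (status >= 200 && status < 300) return { kind: 'ok' };
  // Assumption: throttling and server errors are transient, the rest
  // (400 included) are permanent and must still be surfaced.
  const transient = status === 429 || status >= 500;
  return { kind: 'failed', willRetry: transient };
}

function sendFailedEventFor(
  status: number,
  clientMsgId: string,
): SketchSendFailedEvent | null {
  const outcome = classifySendStatus(status);
  if (outcome.kind === 'ok') return null;
  return { type: '_send_failed', clientMsgId, willRetry: outcome.willRetry };
}
```

With this shape no non-2xx status can fall through silently, so the optimistic running state always has an event that can reset it.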

dimakis · 2026-06-26T14:41:07Z

+
+/** Synthetic event emitted when a POST to the send endpoint fails. */
+export interface SendFailedEvent {
+  type: '_send_failed';


🔵 style: SendFailedEvent.clientMsgId is typed as unknown but is always a string (generated by the store as user-${Date.now()}-...). Using string would improve type safety in the store handler at store.ts:722 where it's consumed without narrowing. [fixable]

dimakis · 2026-06-26T14:41:07Z

    if (!endpoint) return false;

-    if (this._connected && this._connectionId) {
+    if (this._connected) {


🔵 regressions: The send() guard changed from this._connected && this._connectionId to just this._connected. Previously, _connectionId being null was an extra safety check that send wouldn't fire before the welcome handshake. Now _connectionId is always set (from constructor), so the guard change is semantically correct — but doPost will now include the client-generated cid in X-Connection-ID even if the server hasn't acknowledged it yet. This is safe because the _connected flag is still gated on welcome receipt, but worth noting.

dimakis · 2026-06-26T14:41:07Z

+/** Stable connectionId per browser tab — survives SSE reconnects. */
+function getOrCreateConnectionId(): string {
+  const KEY = 'mitzo:connectionId';
+  let cid = sessionStorage.getItem(KEY);


🟡 unsafe_assumptions: sessionStorage.getItem() and setItem() can throw SecurityError in Safari private browsing, some Firefox configurations, and restrictive WebViews. Since getOrCreateConnectionId() is called at module scope (during SSE config construction), an exception here would break the entire client import. Wrap in try-catch with a fallback to an in-memory UUID. [fixable]
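A sketch of the try-catch fallback, assuming the storage object and the UUID generator are passed in so the code runs without a DOM. `StorageLike` and `createConnectionIdProvider` are hypothetical names; the storage key and the `cid-` prefix come from the diff and the PR summary.

```typescript
// Minimal subset of the Web Storage API that this sketch relies on.
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const KEY = 'mitzo:connectionId';

function createConnectionIdProvider(
  storage: StorageLike | undefined,
  newUuid: () => string,
): () => string {
  // In-memory fallback: stable for the lifetime of this provider only.
  let fallback: string | null = null;
  return () => {
    try {
      const existing = storage?.getItem(KEY);
      if (existing) return existing;
    } catch {
      // Storage is blocked; fall through to the in-memory id.
    }
    if (fallback === null) fallback = `cid-${newUuid()}`;
    try {
      storage?.setItem(KEY, fallback);
    } catch {
      // Persisting failed; the id still works until the page is reloaded.
    }
    return fallback;
  };
}
```

In a browser the arguments would be `sessionStorage` and `crypto.randomUUID`. The in-memory id does not survive a page reload, so in blocked-storage environments the stable-cid behaviour holds across SSE reconnects but not across reloads.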

dimakis · 2026-06-26T14:41:07Z

+  const clientCid = req.query.cid as string | undefined;
+  const isStableCid =
+    clientCid != null &&
+    /^cid-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i.test(clientCid);


🔵 unsafe_assumptions: The cid validation regex /^cid-[0-9a-f]{8}-...$/i matches any hyphenated 8-4-4-4-12 hex UUID shape (UUID v4 included; it does not check the version), and crypto.randomUUID() is spec'd to return lowercase, so the i flag is only there for robustness. Note that if a non-UUID-shaped cid is rejected, the server silently falls back to a server-generated conn- id with no feedback to the client. The client's _connectionId will be its UUID while the server uses a different id; the welcome echo will then trigger the 'old server' fallback path in the client, which works but means the 'stable cid' feature silently degrades. A log.warn exists, but the client has no way to know its cid was rejected.

dimakis · 2026-06-26T14:50:39Z

Centaur Review

Found 4 issue(s) (1 critical) (2 warning).

`server/index.ts`

The stable-cid and POST retry features are well-designed, but there is a critical race condition in the server's SSE close handler: when a new request replaces an old one with the same connectionId, the old request's async close handler removes the new response from the SSE registry, breaking event delivery and eventually triggering session detach.

🔴 bugs (L452): Race condition on same-cid SSE reconnect: when a new request arrives with the same stable cid, chatSseRegistry.add() calls existing.end() on the old response, which asynchronously fires the old request's close handler. That handler unconditionally calls chatSseRegistry.remove(connectionId), which removes the NEW response from the registry (since they share the same key). This breaks event delivery — the server can no longer send SSE events to the reconnected client. Additionally, connRegistry.markInactive() starts a 60s TTL; since transport.isOpen() delegates to sseRegistry.isOpen() which checks the now-missing registry entry, the TTL fires and detaches sessions. Fix: guard the close handler with a check that the current stream in the registry is still res (this response) before removing, or add a removeIfCurrent(connectionId, res) method to SessionSseRegistry. [fixable]
🟡 missing_tests (L407): No integration test covers the server-side SSE reconnect flow where a new request with the same stable cid arrives while the old request's close handler is pending. This is where the race condition in the previous finding lives. The existing tests cover ConnectionRegistry.reconnect() and client-side reconnection behavior, but not the interplay between chatSseRegistry.add() ending the old response and the old req.on('close') handler firing afterward. [fixable]
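The guard described in the critical finding can be sketched with a small in-memory registry. `SketchSseRegistry` and `EndableStream` are hypothetical stand-ins for SessionSseRegistry and the HTTP response; only the `removeIfCurrent(connectionId, res)` idea comes from the review.

```typescript
// Hypothetical stand-in for an HTTP response that can be ended.
interface EndableStream {
  end(): void;
}

class SketchSseRegistry<S extends EndableStream> {
  private streams = new Map<string, S>();

  // Register a stream; a previous stream under the same id is ended.
  add(connectionId: string, stream: S): void {
    const existing = this.streams.get(connectionId);
    this.streams.set(connectionId, stream);
    if (existing && existing !== stream) existing.end();
  }

  // Remove only if `stream` is still the registered one, so a stale close
  // handler from a replaced request cannot evict its successor.
  removeIfCurrent(connectionId: string, stream: S): boolean {
    if (this.streams.get(connectionId) !== stream) return false;
    this.streams.delete(connectionId);
    return true;
  }

  isOpen(connectionId: string): boolean {
    return this.streams.has(connectionId);
  }
}
```

The old request's close handler would call `removeIfCurrent(connectionId, res)` and skip `markInactive` when it returns false, since the connection is still live under the new response. The same sequence (add old, add new, fire the old close handler) is a natural shape for the missing integration test.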

`packages/client/src/sse-connection.ts`

🟡 unsafe_assumptions (L367): Failed send POSTs are queued in pendingSends for retry, but flushPendingSends() is only called after a reconnect (welcome event). If the SSE stream stays healthy while a POST fails with 5xx, the message sits in the queue indefinitely — the user sees "retrying on reconnect..." but no reconnect occurs. Consider adding a timer-based retry (e.g., exponential backoff within doPost) for cases where the SSE stream is alive but the POST endpoint is temporarily unavailable. [fixable]
🔵 style (L138): _connectionId is now always initialized to a string in the constructor (never null), but getConnectionId() still returns string | null per the ChatConnection interface. The interface jsdoc "Server-assigned connection ID (null before welcome)" is also outdated — the ID is now client-generated at construction time. Consider updating the interface and jsdoc to reflect the new semantics. [fixable]
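The timer-based retry from the unsafe_assumptions finding above could look roughly like this. It is a sketch under several assumptions: the send attempt is modelled as a synchronous boolean (a real POST is asynchronous), the retry cap is 3, and the flush delay is a fixed 3000 ms. `SketchRetryQueue`, `PendingSend`, and the injected `schedule` callback are hypothetical names.

```typescript
interface PendingSend {
  clientMsgId: string;
  retries: number;
}

type Scheduler = (fn: () => void, delayMs: number) => void;

const MAX_SEND_RETRIES = 3; // assumed value
const RETRY_FLUSH_MS = 3000; // assumed fixed delay

class SketchRetryQueue {
  private pending: PendingSend[] = [];
  private timerArmed = false;
  private trySend: (msg: PendingSend) => boolean;
  private schedule: Scheduler;
  private onGiveUp: (msg: PendingSend) => void;

  constructor(
    trySend: (msg: PendingSend) => boolean,
    schedule: Scheduler,
    onGiveUp: (msg: PendingSend) => void,
  ) {
    this.trySend = trySend;
    this.schedule = schedule;
    this.onGiveUp = onGiveUp;
  }

  // Record a failed send; re-queue it or give up once the cap is reached.
  enqueueFailure(msg: PendingSend): void {
    if (msg.retries >= MAX_SEND_RETRIES) {
      this.onGiveUp(msg);
      return;
    }
    this.pending.push({ ...msg, retries: msg.retries + 1 });
    this.armTimer();
  }

  // Retry everything queued; failures go back through enqueueFailure.
  flush(): void {
    const batch = this.pending;
    this.pending = [];
    for (const msg of batch) {
      if (!this.trySend(msg)) this.enqueueFailure(msg);
    }
  }

  pendingCount(): number {
    return this.pending.length;
  }

  // At most one timer is outstanding, so the queue drains even when the
  // SSE stream stays healthy and no reconnect ever triggers a flush.
  private armTimer(): void {
    if (this.timerArmed) return;
    this.timerArmed = true;
    this.schedule(() => {
      this.timerArmed = false;
      this.flush();
    }, RETRY_FLUSH_MS);
  }
}
```

Passing the scheduler in keeps the sketch testable without real timers; in the client it would be `setTimeout`, with the pending timer cleared on disconnect.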

… timer

- Fix critical race: stale close handler removing new SSE stream. Added removeIfCurrent() to SessionSseRegistry, which only removes if the response being closed is still the current entry for that connectionId
- Add timer-based retry flush (3s) for failed POSTs when SSE stays healthy, so messages don't sit in the queue waiting for a reconnect that never comes
- Update ChatConnection jsdoc: connectionId is now client-generated
- Clean up retry timer on disconnect

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

dimakis