Skip close handshake reciprocation for reserved codes #394

Merged: threepointone merged 2 commits into main from fix/close-handshake-skip-reserved-codes on Apr 28, 2026

@threepointone
Collaborator

Summary

Followup to #393. The close-handshake fix in 0.5.4 normalized the reserved synthetic close codes (1005 NoStatusReceived, 1006 AbnormalClosure, 1015 TLSHandshake) to 1000 and reciprocated anyway, on the theory that calling ws.close(...) was always safe. In practice it isn't: those codes are precisely the runtime's signal that there was no real Close frame from the peer — the underlying transport is already gone. The reciprocating ws.close(...) succeeds synchronously but schedules an outbound write on a dead transport, which the runtime later rejects asynchronously with Network connection lost / WebSocket peer disconnected. That rejection can't be observed from closeQuietly's synchronous try/catch (the call returns void, no Promise to attach a .catch to), so it surfaces as an unhandled promise rejection in tests and production logs.
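The failure mode described above can be sketched in isolation. This is a minimal illustration, not the runtime's behavior verbatim: `DeadTransportSocket` and `tryClose` are hypothetical names, and the late failure is collected in an array (rather than left unhandled, as it is in the real runtime) so the sketch runs without crashing the process.

```typescript
// Hypothetical stand-in for a socket whose transport is already gone.
// close() returns void and never throws; the failure only appears later.
class DeadTransportSocket {
  // Assumption for the sketch: late failures are recorded here. In the
  // situation the PR describes nothing catches them, so they surface as
  // unhandled rejections instead.
  readonly lateFailures: Error[] = [];

  close(_code?: number, _reason?: string): void {
    // The rejection is produced asynchronously, after close() has returned.
    Promise.reject(new Error("Network connection lost.")).catch((err: Error) => {
      this.lateFailures.push(err);
    });
  }
}

// A synchronous try/catch around close(), in the style of a "quiet close" helper.
function tryClose(ws: { close(code?: number, reason?: string): void }): "returned" | "threw" {
  try {
    ws.close(1000, "done");
    return "returned";
  } catch {
    return "threw";
  }
}
```

Calling `tryClose(new DeadTransportSocket())` reports that the call returned normally, and the failure only shows up on a later microtask. Because `close()` hands back no Promise, the caller has nothing to attach a handler to.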

The fix is to recognize the reserved-code shape and skip the reciprocation entirely: there is no peer left to acknowledge. Both the hibernating webSocketClose path and the non-hibernating #attachSocketEventHandlers close listener pick this up via the shared closeQuietly helper.
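A minimal sketch of the early-return shape follows. The function names `isReservedCloseCode` and `closeQuietly` come from this PR, but the signatures, the `ClosableSocket` type, and the body are assumptions made for illustration, not the actual partyserver source.

```typescript
// Assumed structural type: only the one method this sketch needs.
interface ClosableSocket {
  close(code?: number, reason?: string): void;
}

// 1005, 1006 and 1015 are reserved by RFC 6455: they describe how a
// connection ended locally and must not be sent in a Close frame.
const RESERVED_CLOSE_CODES: ReadonlySet<number> = new Set([1005, 1006, 1015]);

function isReservedCloseCode(code: number): boolean {
  return RESERVED_CLOSE_CODES.has(code);
}

// Assumed signature. Reciprocates a peer's Close frame, except when the
// code says no real Close frame ever arrived.
function closeQuietly(ws: ClosableSocket, code: number, reason: string): void {
  if (isReservedCloseCode(code)) {
    // No Close frame was received, so there is nothing to acknowledge.
    return;
  }
  try {
    ws.close(code, reason);
  } catch {
    // Synchronous failures (for example, the socket is already closed) are ignored.
  }
}
```

With a recording fake in place of a real socket, `closeQuietly(fake, 1005, "")` makes no `close` call, while `closeQuietly(fake, 1000, "bye")` forwards the code and reason unchanged.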

How it surfaced

Cloudflare Agents' sub-agent routing puts the WebSocket pair on opposite sides of a Durable Object RPC boundary (parent DO → facet DO). When a client closes the upgrade response immediately (a common pattern in tests), the runtime delivers webSocketClose(ws, 1005, "", true) to the facet, our reciprocation tries to write a Close frame back through an already-severed RPC link, and the agents-repo examples/assistant vitest suite reports Errors 11 on an otherwise-passing 21-test run with output like:

exception = workerd/api/web-socket.c++:821: disconnected: WebSocket peer disconnected
⎯⎯⎯⎯ Unhandled Rejection ⎯⎯⎯⎯⎯
Error: Network connection lost.
Serialized Error: { retryable: true }

Diagnosed by instrumenting closeQuietly, which confirmed that code=1005, wasClean=true, readyState=2 was the consistent shape at fault, and that downgrading the agents-repo to partyserver@0.5.3 made the rejection vanish, pinning #393 as the proximate cause.

Changes

  • packages/partyserver/src/index.ts — replace normalizeCloseCode(code) with isReservedCloseCode(code) and have closeQuietly early-return for reserved codes instead of normalizing-and-sending. Docstring updated to explain why (the synchronous try/catch can't see the asynchronous runtime rejection).
  • packages/partyserver/src/tests/index.test.ts — one new regression test under Close handshake (#389) > hibernating that fires ws.close() with no code (cleanest way to drive a code-1005 arrival on the server in the in-process test runner) and asserts (a) onClose still runs with the reserved code on the server side, and (b) the test file completes without an unhandled rejection.
  • .changeset/quiet-foxes-skip-handshake.md — patch changeset.

Why this is safe

  • The reserved-code path was never actually completing a handshake — 1005/1006/1015 are only ever synthesized locally to communicate "no close frame arrived." There is no peer state to flip from CLOSING → CLOSED on the other side.
  • Real close frames (codes 1000 / 1001 / 4xxx / etc.) continue to reciprocate via the same closeQuietly helper. All ten existing Close handshake (#389) tests still pass unchanged.
  • On compatibility dates >= 2026-04-07, the runtime's web_socket_auto_reply_to_close flag handles the close handshake before our handler ever runs, so this code path is a no-op anyway.

User-visible behavior change

A client that calls ws.close() with no code on a server running a compatibility date < 2026-04-07 (where auto-reply isn't yet active) will now observe a non-clean close instead of the previously-fabricated 1000 reciprocation. Clients that pass an explicit close code, and any client on compatibility dates >= 2026-04-07, are unaffected.
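Client code that cares about this change can inspect the standard `wasClean` and `code` fields of the close event. The helper below is a hypothetical sketch (`CloseInfo` and `describeClose` are names made up here). This PR does not state which code the client will actually see, so the helper only reports what it is given.

```typescript
// The subset of the standard CloseEvent fields this sketch reads.
interface CloseInfo {
  code: number;
  reason: string;
  wasClean: boolean;
}

// Summarize a close event for logging. Purely illustrative.
function describeClose(event: CloseInfo): string {
  const kind = event.wasClean ? "clean" : "unclean";
  return `${kind} close (code ${event.code})`;
}

// Possible usage with a browser or Workers WebSocket:
//   ws.addEventListener("close", (e) => console.log(describeClose(e)));
```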

Test plan

  • npm run check:test -w partyserver: 73/73 pass (62 existing + 10 from #393 (Complete the WebSocket close handshake in webSocketClose) + 1 new regression test)
  • npm run check:type — clean
  • npm run check:lint — 0 warnings, 0 errors
  • npm run check:format — clean
  • examples/assistant repro suite in cloudflare/agents — 21/21 pass with 0 errors (was 21/21 + 11 unhandled rejection errors before this fix)
  • Confirmed the new regression test fails loudly without the source change (timeout waiting for client close event), then passes once the early-return is in place

Made-with: Cursor

Followup to #393. The close handshake fix that landed in 0.5.4 normalized
the reserved synthetic codes (1005 NoStatusReceived, 1006 AbnormalClosure,
1015 TLSHandshake) to 1000 and reciprocated anyway, on the theory that
calling `ws.close(...)` was always safe. In practice it isn't: those
codes are precisely the runtime's signal that there was no real Close
frame from the peer — the underlying transport is already gone. The
reciprocating `ws.close(...)` succeeds synchronously but schedules an
outbound write on a dead transport, which the runtime later rejects
asynchronously with `Network connection lost` / `WebSocket peer
disconnected`. That rejection can't be observed from `closeQuietly`'s
synchronous try/catch (the call returns void, no Promise to attach a
`.catch` to), so it surfaces as an unhandled promise rejection in
tests and production logs.

Surfaced by Cloudflare Agents sub-agent routing, where the WebSocket
pair is tunneled across Durable Object RPC boundaries: the runtime
delivered `webSocketClose(ws, 1005, "", true)` reliably, our
reciprocation tried to write a Close frame back through an
already-severed RPC link, and vitest reported `Errors 11` on an
otherwise-passing 21-test suite.

The fix is to recognize the reserved-code shape and skip the
reciprocation entirely. There is no peer to acknowledge to. Both the
hibernating `webSocketClose` path and the non-hibernating
`#attachSocketEventHandlers` close listener pick this up via the
shared `closeQuietly` helper.

Adds one regression test under `Close handshake (#389) > hibernating`
that exercises a client `ws.close()` with no code (the cleanest way
to drive a code-1005 arrival on the server in the in-process test
runner) and asserts that `onClose` still runs with the reserved code
while the framework no longer attempts a reciprocation.

User-visible behavior change: a client that calls `ws.close()` with
no code on a server running a compatibility date `< 2026-04-07`
(where the runtime's `web_socket_auto_reply_to_close` flag isn't
yet active) will now observe a non-clean close instead of the
previously-fabricated 1000 reciprocation. Clients that pass an
explicit close code, and any client on compatibility dates
`>= 2026-04-07` (auto-reply does the work), are unaffected.

Verification:
- `npm run check:test -w partyserver`: 73/73 pass.
- `npm run check:type`, `check:lint`, `check:format`: clean.
- Repro suite in cloudflare/agents (examples/assistant): 21/21
  pass with 0 errors. Was 21/21 + 11 unhandled rejection errors
  before this fix.

Made-with: Cursor
@changeset-bot

changeset-bot Bot commented Apr 28, 2026

🦋 Changeset detected

Latest commit: 090316c

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
partyserver Patch


@pkg-pr-new

pkg-pr-new Bot commented Apr 28, 2026

hono-party

npm i https://pkg.pr.new/cloudflare/partykit/hono-party@394

partyfn

npm i https://pkg.pr.new/cloudflare/partykit/partyfn@394

partyserver

npm i https://pkg.pr.new/cloudflare/partykit/partyserver@394

partysocket

npm i https://pkg.pr.new/cloudflare/partykit/partysocket@394

partysub

npm i https://pkg.pr.new/cloudflare/partykit/partysub@394

partysync

npm i https://pkg.pr.new/cloudflare/partykit/partysync@394

partytracks

npm i https://pkg.pr.new/cloudflare/partykit/partytracks@394

partywhen

npm i https://pkg.pr.new/cloudflare/partykit/partywhen@394

y-partyserver

npm i https://pkg.pr.new/cloudflare/partykit/y-partyserver@394

commit: 090316c

Update partyserver dependency from ^0.5.3 to ^0.5.4 across the monorepo and refresh package-lock.json. Affected package.json files: packages/hono-party, packages/partysub, packages/partysync, packages/partywhen, packages/y-partyserver. No other functional changes.
@threepointone threepointone merged commit ec0c93d into main Apr 28, 2026
6 checks passed
@threepointone threepointone deleted the fix/close-handshake-skip-reserved-codes branch April 28, 2026 18:28
@github-actions github-actions Bot mentioned this pull request Apr 28, 2026