Skip to content

consomme: remove tcp connections in closing state after timeout#3600

Open
Brian-Perkins wants to merge 1 commit into
microsoft:mainfrom
Brian-Perkins:consomme_tcp_close_handling
Open

consomme: remove tcp connections in closing state after timeout#3600
Brian-Perkins wants to merge 1 commit into
microsoft:mainfrom
Brian-Perkins:consomme_tcp_close_handling

Conversation

@Brian-Perkins

Copy link
Copy Markdown
Contributor

When a socket is closed, there is a small period of time before it can be reused (per RFC 793) to make sure any old, outstanding packets are properly disposed of instead of mistakenly being given to a newly opened connection. Update the TODOs in the code to actually implement this. In addition, time out any socket that has been partially closed where the handshake is not completed by the guest. This keeps the guest from being able to leak half-closed sockets indefinitely.

@Brian-Perkins Brian-Perkins requested a review from a team as a code owner May 29, 2026 21:34
Copilot AI review requested due to automatic review settings May 29, 2026 21:34
@github-actions github-actions Bot added the unsafe Related to unsafe code label May 29, 2026
@github-actions

Copy link
Copy Markdown

⚠️ Unsafe Code Detected

This PR modifies files containing unsafe Rust code. Extra scrutiny is required during review.

For more on why we check whole files, instead of just diffs, check out the Rustonomicon

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Implements TCP close-state cleanup in consomme so connections sitting in FinWait1/FinWait2/Closing/LastAck/TimeWait are forcibly reaped after a configurable timeout (default 60s ≈ 2·MSL per RFC 793), addressing the previous TODOs and preventing guests from leaking half-closed connections.

Changes:

  • Added ConsommeParams::tcp_close_timeout (default 60s) and renamed the local timeout binding in Consomme::new_inner to udp_timeout for clarity.
  • Added TcpConnectionInner::close_deadline armed on entry to any half-closed state (including on each received segment in TimeWait) and a new connections_closed_timeout counter; poll_tcp reaps connections whose deadline has expired, counting TimeWait as a normal close and the others as timeout closes.
  • Added four new tests in tcp/tests.rs covering TimeWait, FinWait1/FinWait2, Closing, and LastAck cleanup paths.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File Description
vm/devices/net/net_consomme/consomme/src/lib.rs Adds tcp_close_timeout param with 2·MSL default; renames local timeout to udp_timeout.
vm/devices/net/net_consomme/consomme/src/tcp.rs Tracks per-connection close_deadline, arms it on half-close transitions, reaps expired connections in poll_tcp, adds timeout counter.
vm/devices/net/net_consomme/consomme/src/tcp/tests.rs Adds tests verifying reap and stat accounting for TimeWait, FinWait1/FinWait2, Closing, and LastAck.

TcpState::FinWait1
| TcpState::FinWait2
| TcpState::Closing
| TcpState::LastAck => self.inner.tcp.aggregate_stats.record_timeout_close(),

@jstarks jstarks May 29, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You sure about closing from these states?

I think for these states we really should be retransmitting, not terminating. And so maybe that should be saved for a separate change.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Especially FinWait2 seems bad to terminate from--it's fine to stay in FinWait2 indefinitely, I think.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically we are supposed to transmit again to make sure these state transition are seen, so it is basically a question of whether we want to make this generic, or assume a non-lossy environment. I was simplifying by assuming non-lossy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

unsafe Related to unsafe code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants