consomme: autotune per-connection TCP ring buffers #3597
Open
benhillis wants to merge 1 commit into
This PR modifies files containing … For more on why we check whole files, instead of just diffs, check out the Rustonomicon.
benhillis (Member, Author): Oops, this was supposed to be an RFC / draft.
Pull request overview
This PR adds Linux-style TCP autotuning to the Consomme user-mode TCP stack by replacing fixed per-connection TX/RX ring sizes with bounded growth (initial → max) and resizing rings in-place based on observed buffer pressure, while keeping the default behavior equivalent to the prior fixed 256 KiB buffers.
Changes:
- Introduces `TcpBufferBounds { initial, max }` on `ConsommeParams` and wires those bounds into TCP connection initialization.
- Adds `Ring::resize()` to grow ring buffers while preserving the in-order `[head, tail)` byte range, plus unit tests for resize behavior.
- Implements per-connection autotune in the socket backend poll loop and adds inspect counters for TX/RX buffer growth.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| vm/devices/net/net_consomme/consomme/src/tcp/ring.rs | Adds Ring::resize() and new unit tests covering resize across wrap/no-wrap scenarios. |
| vm/devices/net/net_consomme/consomme/src/tcp/assembler.rs | Makes Assembler::is_empty() available outside tests to gate safe RX ring growth. |
| vm/devices/net/net_consomme/consomme/src/tcp.rs | Plumbs buffer bounds into connection creation, derives window scale from RX max, and grows TX/RX rings opportunistically with new inspect counters. |
| vm/devices/net/net_consomme/consomme/src/lib.rs | Exposes TCP RX/TX buffer bounds on ConsommeParams with defaults preserving prior fixed sizing. |
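Deriving the window scale from the RX maximum, rather than from the initial ring size, can be sketched like this. The helpers `window_scale_for` and `encoded_window` are hypothetical names; the only external facts used are that the TCP window field is 16 bits and that RFC 7323 limits the scale shift to 14. Since the shift is fixed at SYN time, picking it for the largest ring the connection may ever reach lets the advertised window grow later.

```rust
/// RFC 7323 caps the window-scale shift at 14.
pub const MAX_WINDOW_SHIFT: u8 = 14;

/// Smallest shift whose largest representable window covers `rx_max`.
pub fn window_scale_for(rx_max: usize) -> u8 {
    let mut shift = 0u8;
    while shift < MAX_WINDOW_SHIFT && ((u16::MAX as usize) << shift) < rx_max {
        shift += 1;
    }
    shift
}

/// Encodes free rx space into the 16-bit window field for a given shift,
/// clamping to the largest window that shift can express.
pub fn encoded_window(free: usize, shift: u8) -> u16 {
    let max_window = (u16::MAX as usize) << shift;
    (free.min(max_window) >> shift) as u16
}
```

Note that 256 KiB already needs a shift of 3, because `u16::MAX << 2` is 4 bytes short of 262,144.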
benhillis added a commit to benhillis/openvmm that referenced this pull request on Jun 3, 2026
Address review feedback on microsoft#3597:
- Clarify that the rx ring can round above rx_buffer_max (power-of-two) and that TcpBufferBounds is capped by u16::MAX without window scaling.
- Add tests covering buffer-bounds normalization, that the derived rx window scale reaches max, and that the tx ring autotunes (grows and caps at max) under host-side backpressure.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
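A sketch of what buffer-bounds normalization could look like. The `TcpBufferBounds { initial, max }` shape comes from this PR; the `normalized` method, its clamping order, and `ring_capacity` are assumptions for illustration. It shows the two caveats this commit calls out: without window scaling, `max` is capped at `u16::MAX` (the largest window a 16-bit header field can advertise), and rounding a ring up to a power of two can land above `max`.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct TcpBufferBounds {
    pub initial: usize,
    pub max: usize,
}

impl TcpBufferBounds {
    /// Hypothetical normalization: keep 1 <= initial <= max, and cap both
    /// at u16::MAX when window scaling is unavailable.
    pub fn normalized(self, window_scaling: bool) -> Self {
        let cap = if window_scaling { usize::MAX } else { u16::MAX as usize };
        let max = self.max.min(cap).max(1);
        let initial = self.initial.clamp(1, max);
        TcpBufferBounds { initial, max }
    }
}

/// Rings are powers of two, so the allocated capacity may exceed `max`.
pub fn ring_capacity(bytes: usize) -> usize {
    bytes.next_power_of_two()
}
```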
benhillis added a commit to benhillis/openvmm that referenced this pull request on Jun 3, 2026
Address review feedback on microsoft#3597: note the new per-connection ring buffer autotune defaults (16 KiB initial, 4 MiB max) and how embedders override them via tcp_rx_buffer/tcp_tx_buffer on ConsommeParams.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
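The memory arithmetic behind the new defaults can be worked out directly. This assumes one rx ring and one tx ring per connection at the same size and ignores power-of-two rounding (both simplifications, since the two rings grow independently); the helper names are hypothetical.

```rust
pub const KIB: usize = 1024;
pub const MIB: usize = 1024 * KIB;

/// Ring memory for one connection, assuming one rx and one tx ring of
/// `per_ring` bytes each.
pub fn per_connection_bytes(per_ring: usize) -> usize {
    2 * per_ring
}

/// How many doublings take a ring from `initial` to at least `max`.
pub fn doublings(initial: usize, max: usize) -> u32 {
    assert!(initial > 0, "initial size must be non-zero");
    let mut size = initial;
    let mut n = 0;
    while size < max {
        size *= 2;
        n += 1;
    }
    n
}
```

Under those assumptions a fresh connection starts with 32 KiB of ring memory instead of the old 512 KiB, can reach 8 MiB under sustained load, and needs eight doublings to get from 16 KiB to 4 MiB (six from the 64 KiB initial size used in this PR's benchmark).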
benhillis added two more commits to benhillis/openvmm that referenced this pull request on Jun 6, 2026, with the same two review-feedback commit messages.
33f6ea6 to 9ec22d5
Grow per-connection TCP rings in place from `initial` up to `max`
(`TcpBufferBounds`) based on observed pressure, instead of using a fixed
size. The default now autotunes (16 KiB -> 4 MiB), so idle/short
connections start cheaper than the old fixed 256 KiB while bulk flows can
ramp to a few MiB.
What changed:
- `ConsommeParams` exposes per-connection `tcp_rx_buffer`/`tcp_tx_buffer`
as `TcpBufferBounds { initial, max }` so embedders can pick bounds for
their workload. `DEFAULT_TCP_BUFFER_BOUNDS` is now 16 KiB -> 4 MiB
(was a fixed 256 KiB).
- `Ring::resize`: grow a power-of-two ring in place, preserving the
`[head, tail)` view without staging through a temporary buffer.
Grow-only (same-capacity resizes are a no-op). rx grow is gated on the
assembler + rx buffer being empty.
- rx window scale is derived from `rx.max` so the advertised window can
grow without re-negotiating at SYN. Window/ring sizing accounts for the
power-of-two rounding and the `u16::MAX` cap when window scaling is
unavailable.
- `poll_socket_backend`: double the tx ring when it fills up during the
host read loop (re-reading the host socket into the freshly grown ring
within the same poll), and double the rx ring + window when it has
reached >=75% occupancy and then drained (forcing an ACK so the peer
picks up the larger window). Both are capped at `max`. New
`tx_buffer_grows`/`rx_buffer_grows` counters.
Tests:
- New `ring::resize` unit tests and end-to-end autotune tests covering
tx/rx growth, capping at `max`, and the derived rx window scale.
- Test host-write helpers poll consomme concurrently so large writes
can't deadlock when consomme is the only socket reader.
Docs:
- The Guide's consomme backend page documents the new default bounds,
autotune behavior, and tuning knobs.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
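The grow policy in `poll_socket_backend` can be modeled as a small state machine. This is a hypothetical sketch, not the code in this PR: `next_capacity` doubles a power-of-two capacity until it reaches `max` (the tx side would call it directly whenever the ring fills during the host read loop), and `RxAutotune` only grows the rx ring after it has been at least 75% full and has since drained with the assembler empty. Updating the advertised window and forcing the ACK are left out.

```rust
/// Doubles a power-of-two ring capacity until it reaches `max`; returns
/// `None` once `cap >= max`. Because capacities stay powers of two, the
/// final capacity may round above a non-power-of-two `max`.
pub fn next_capacity(cap: usize, max: usize) -> Option<usize> {
    if cap >= max {
        None
    } else {
        Some(cap * 2)
    }
}

/// Hypothetical rx autotune state (not consomme's actual type).
#[derive(Debug, Default)]
pub struct RxAutotune {
    /// Set once occupancy reaches 75% of capacity.
    saw_pressure: bool,
    /// Number of grows, analogous to an `rx_buffer_grows` counter.
    pub grows: u64,
}

impl RxAutotune {
    /// Call on each poll. Returns the capacity to grow to, if any.
    pub fn observe(
        &mut self,
        used: usize,
        cap: usize,
        max: usize,
        assembler_empty: bool,
    ) -> Option<usize> {
        if used * 4 >= cap * 3 {
            // >= 75% full: remember the pressure, but don't grow yet.
            self.saw_pressure = true;
            return None;
        }
        if self.saw_pressure && used == 0 && assembler_empty {
            // Drained after pressure, with both the ring and the
            // assembler empty: grow now (unless already at the cap).
            self.saw_pressure = false;
            let next = next_capacity(cap, max)?;
            self.grows += 1;
            return Some(next);
        }
        None
    }
}
```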
9ec22d5 to 200c4bc
jstarks approved these changes on Jun 9, 2026
Grows per-connection TCP rings in place from `initial` up to `max` (`TcpBufferBounds`) based on pressure, instead of using a fixed size. The default now autotunes (16 KiB → 4 MiB), so idle/short connections start cheaper than the old fixed 256 KiB while bulk flows can ramp to a few MiB.
What changed
- `ConsommeParams` exposes per-connection `tcp_rx_buffer`/`tcp_tx_buffer` as `TcpBufferBounds { initial, max }` so embedders can pick bounds for their workload.
- `Ring::resize`: grow a power-of-two ring in place, preserving the `[head, tail)` view. rx grow is gated on the assembler + rx buffer being empty.
- rx window scale is derived from `rx.max` so the advertised window can grow without re-negotiating at SYN.
- `poll_socket_backend`: double the tx ring when it fills up during the host read loop; double the rx ring + window when it has reached ≥75% occupancy and then drained (forcing an ACK so the peer picks up the larger window). Capped at `max`. New `tx_buffer_grows`/`rx_buffer_grows` counters.
- `DEFAULT_TCP_BUFFER_BOUNDS` is now `16 KiB → 4 MiB` (was a fixed 256 KiB).
Testing
`cargo nextest run -p consomme` passes (including the new `ring::resize` tests); `cargo clippy --release -p consomme --all-targets` is clean.
burette `network` benchmark (`--backend consomme --nic vmbus`, N=5, warm VM), old fixed 256 KiB vs autotune 64 KiB → 4 MiB:
A fixed-4 MiB control matches autotune (9.34 / 5.27), so the win comes from raising the 256 KiB ceiling; autotune gets it while still starting small.
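A back-of-the-envelope bound helps explain why raising the ceiling matters: a single TCP flow limited only by the receive window can have at most one window of data in flight per round trip, so its throughput is bounded by roughly window / RTT. The 200 µs RTT below is an illustrative assumption, not a value measured by this benchmark, and `window_limited_gbps` is a hypothetical helper.

```rust
/// Window-limited throughput bound in Gbit/s: at most `window_bytes`
/// can be in flight per round trip of `rtt_us` microseconds.
pub fn window_limited_gbps(window_bytes: u64, rtt_us: u64) -> f64 {
    assert!(rtt_us > 0, "RTT must be non-zero");
    let bits = window_bytes as f64 * 8.0;
    // Bits per microsecond equals Mbit/s; divide by 1000 for Gbit/s.
    bits / rtt_us as f64 / 1_000.0
}
```

With the assumed 200 µs RTT the bound is about 10.5 Gbit/s for a 256 KiB window and about 168 Gbit/s for 4 MiB; whatever the real RTT, the 4 MiB bound is 16× the 256 KiB one.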