vnc: stop guest screen/pointer reporting when no client is connected#3614
vnc: stop guest screen/pointer reporting when no client is connected#3614bitranox wants to merge 9 commits into
Conversation
When no VNC client is connected, the guest still generates and sends synthvid dirty rectangles and hardware pointer updates on every screen change. The video device parses each one and forwards it to the VNC worker, which discards it because there are no clients. For a guest with active screen content and nobody watching, that is wasted guest CPU, vmbus traffic, and host allocations. Use the synthvid FeatureChange message to tell the guest to stop. The VNC worker signals the video device when the connected-client count crosses 0<->1, over a new mesh channel; the device relays that to the guest by clearing is_dirt_needed and the two pointer flags. Situation (resolution) updates stay enabled so a mode change while idle is still tracked. The device also gates its own forwarding while disabled and warns (rate limited) if a guest keeps sending updates anyway. The device defaults to the guest's enabled state and only disables when the worker signals "no client". This keeps an already-connected client working across a guest reboot or video-driver reload, which re-open the synthvid channel and reset the device's per-connection state while the worker and client persist. If the device re-handshakes while idle and returns to the enabled default, the worker self-corrects: receiving dirt with zero clients re-sends "no client", which settles in one cycle. Wiring spans the openvmm and openhcl launch paths and the petri test builder. Adds a device unit test for the gate and documents the mechanism in the VNC architecture guide.
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR adds an “updates needed” reverse channel between the VNC worker and the synthetic video device so the guest can stop generating dirty rectangles and pointer updates while no VNC client is connected.
Changes:
- Adds
updates_needed_send/recvto VNC/video resource plumbing across openvmm and underhill. - Implements gating in the video device via synthvid
FeatureChange, plus dropping/monitoring updates when disabled. - Updates VNC multi-client server to signal first-connect/last-disconnect and to re-assert “disabled” if dirt arrives while idle; adds documentation and a new video test.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| workers/vnc_worker_defs/src/lib.rs | Extends VNC worker parameters with updates_needed_send. |
| workers/vnc_worker/src/lib.rs | Signals synthvid “updates needed” based on client presence / idle dirt. |
| vm/devices/uidevices_resources/src/lib.rs | Extends SynthVideoHandle with updates_needed_recv. |
| vm/devices/uidevices/src/video/mod.rs | Implements FeatureChange send + gates dirt/pointer updates; adds test. |
| vm/devices/uidevices/src/resolver.rs | Wires updates_needed_recv into Video::new. |
| petri/src/vm/openvmm/construct.rs | Initializes new handle field in Petri VM config. |
| openvmm/openvmm_entry/src/lib.rs | Creates and wires the reverse channel between VNC worker and device. |
| openhcl/underhill_core/src/worker.rs | Plumbs reverse channel into Underhill synth video handle. |
| openhcl/underhill_core/src/lib.rs | Refactors console setup to include both channel directions. |
| Guide/src/reference/devices/vnc/what-we-are-proud-of.md | Documents idle suppression optimization. |
| Guide/src/reference/devices/vnc/architecture.md | Documents the new reverse channel + behavior details. |
|
@jstarks one thing to flag before this merges: your standalone client will stop receiving dirty frames. The suppression keys off the VNC worker's own connected-client count. The worker has no knowledge of your standalone client, so as soon as it has no clients of its own, it tells the guest to stop reporting dirty rectangles, and your client gets starved. If you want to keep it working, I can add an opt-out flag, something like The cleaner long-term answer probably depends on how we want to handle that second client. The device already owns the enable/disable decision and can aggregate demand from more than one source, so the standalone client could register its own "needs dirt" signal instead of relying on a global flag. Which direction do you prefer? And do you actually want to go down this path, or would you rather hold off? yours sincerely Robert |
Addresses review feedback on the idle-suppression change. - Add an e2e test in vnc_worker for the coordinator's updates-needed signaling: startup signals false, the first client connect signals true, the last disconnect signals false, and dirt arriving while idle re-asserts false (the self-heal path). - Add a comment at the video device select! explaining why cancelling the in-flight guest packet read is safe (message-mode MessagePipe::recv commits the ring read offset only on a complete read, so a dropped pending read leaves the stream untouched) and why the loop must wait on the packet and the demand channel concurrently.
|
@jstarks : One thing worth stating explicitly, because it makes the lossy ring the right fit: dirty rectangles are only hints. The real pixels are in VRAM, so dropping rects is always safe and recovered by a full refresh, never a correctness problem. That fall-behind path is the same missed_dirty behavior the VNC server already uses, just moved to the cross-process boundary. The vmbus ring looks like the natural in-tree precedent for the shared section and the release/acquire cursor. The one thing I'd watch in the design: keep the "broadcast the latest offset" notification bounded too (coalesce to the newest offset, or a shared atomic cursor plus a wakeup event), so the unbounded-queue problem doesn't just move from the data to the notifications. It also composes cleanly with the idle suppression in this PR: the "updates needed" demand generalizes from "the VNC worker has a client" to "any consumer is active" (the device already owns that decision and can OR multiple consumers). Once your standalone GUI is a first-class consumer through the ring, the --keep-dirty-rects-enabled stopgap I floated earlier isn't needed. I take this on as a follow-up, I've noted it as the top item on our side. yours sincerely |
|
@smalis-msft - hopefully You can catch the time to review it, then I can implement the changes @jstarks proposed |
|
@jstarks - steering needed Coming back to this after sitting with it. I still think the shared-memory ring is a clean design, and the multi-consumer problem probably needs solving if You really need that standalone GUI, but I want to make sure we build the right amount of machinery. Here is where my thinking landed and where I'd value your steer. If the multi-consumer need is real: mesh channels are single-consumer (the receiver isn't cloneable and there's no broadcast primitive), so the current dirt_send/dirty_recv can only feed one consumer, and your standalone GUI can't coexist with the VNC server on it. On the unbounded-memory point I'm less sure it's load-bearing. The coordinator drains the channel continuously and the per-item work is tiny, so in practice it stays near-empty. The idle-suppression change in this PR makes it less likely still, because the guest only produces dirt while a consumer is actually watching and draining. I haven't been able to make it grow. So I'd treat the memory bound as a nice property rather than a driver. If multi-consumer is the real goal, there's a simpler option than a shared section: have the device hold one mesh sender per consumer and broadcast Arc<Vec> to each. That is the same Arc fan-out the VNC worker already does for its own clients, just lifted cross-process. No shared memory or cross-process atomics, and it keeps the VNC crates' forbid(unsafe_code). Consumers register a sender with the device, and the per-consumer demand aggregation for the idle suppression is the same work either way. The ring's real advantages (one buffer for many readers, a hard memory bound, cross-process zero-copy) mostly pay off at many consumers or under genuine pressure. At two consumers they're marginal, and the rects are small enough that the copy is free. Against that, the ring adds shared-memory atomics with explicit ordering, torn-read detection, a bounded wakeup path, and a fair amount of concurrency testing, which is real surface to get right. So my questions:
It's your call given you own both the codebase and the GUI, but right now I'd lean toward the simple broadcast: the device fans out to each consumer, with the GUI as another consumer alongside the VNC worker rather than behind it. see #3628 for actual measurements Robert |
The dirty-rect receiver and the updates-needed sender are wired together or not at all: both present when a synth video device exists, both absent otherwise. They were two separate Option fields, which allowed an invalid "one Some, one None" state. Move them into a single SynthVideoChannels struct carried as one Option through VncParameters, the worker, and the openvmm_entry plumbing, so the mismatch is unrepresentable. The worker test no longer constructs the mismatched state either.
wait_for_signal and wait_for_input polled with a fixed deadline and panicked on timeout, which just adds flakiness. Block on recv.recv() (via pal_async::local::block_on) and let nextest's slow-test timeout fail a genuinely wedged test. A closed channel still panics, since that is a real failure rather than a timeout.
SynthVideoHandle carried dirt_send and updates_needed_recv as two separate Options, which allowed the same mismatched "one Some, one None" state (the petri handle set both to None by hand). Bundle them into a SynthVideoChannels struct behind one Option, unpacking at the resolver into the device's existing fields. Name the worker's match bindings channels to match.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
openhcl/underhill_core/src/lib.rs:1
VncParameterswas changed to carrysynth_video: Option<SynthVideoChannels>(bundling the dirty receiver + updates-needed sender). This call site still populates the old shape (dirty_recvplus a separateupdates_needed_send), which won’t compile. Createsynth_video: Option<vnc_worker_defs::SynthVideoChannels>fromdirty_rect_recvandupdates_needed_sendand pass it via thesynth_videofield.
// Copyright (c) Microsoft Corporation.
…e device type distinctly Two follow-ups to the device-side bundle (Copilot review): underhill_core (the OpenHCL paravisor) constructs both SynthVideoHandle and VncParameters and was missed, so wire both through the bundled channels/synth_video fields, built only when both endpoints are present. And rename the device-side bundle to SynthVideoDeviceChannels so it is distinct from the worker-side SynthVideoChannels at call sites that wire both ends.
The bundle refactor moved the dirty-rect receiver and updates-needed sender into synth_video: Option<SynthVideoChannels> (and the device end into SynthVideoHandle.channels). Update the architecture reference, which still described the old un-bundled fields.
Match the worker's dirty_recv and the DirtyRect the channel carries; the field was dirt_send on the device but dirty_recv on the worker. Renamed through the device resource, resolver, openvmm_entry, and the openhcl paravisor.
The idle self-heal re-asserts false on every dirty event the guest sends while no client is connected (needed because the device defaults to enabled after a re-handshake, so a naive dedupe would break it). A non-compliant guest could spam the debug log, so wrap it in tracelimit::event_ratelimited! at DEBUG level. The send is a cheap bool the device dedupes; this rate-limits the log, not the self-heal.
|
merged into #3665 |
Summary
When no VNC client is connected, the guest still generates and sends synthvid
dirty rectangles (and hardware pointer updates) on every screen change. The
synthetic video device parses each message and forwards it to the VNC worker,
which discards it because there are no clients to send to. For a guest with
active screen content and nobody watching, that is wasted guest CPU, vmbus
round-trips, and host allocations that produce nothing.
This change uses the synthvid
FeatureChangemessage to tell the guest to stopreporting while no client is connected, and to resume on the first connect.
How it works
meshchannel paired with the existing dirty-rect channel, signals the video device
when that count crosses 0 to 1 and back.
FeatureChangethat clearsis_dirt_neededand the two hardware-pointer flags (the device ignores thosepointer reports and the VNC worker is the only dirt consumer).
is_video_situation_updates_neededstays set, so a resolution change whileidle is still tracked and the next client renders at the correct size.
logs a rate-limited warning if a guest keeps sending updates after being told
to stop (it is not honoring the
FeatureChange).Design notes
disables when the worker signals "no client". This keeps an already-connected
client working across a guest reboot or video-driver reload: those re-open the
synthvid channel and reset the device's per-connection state, but the worker
and the client connection persist, so the device must not assume "disabled" on
a fresh handshake.
worker self-corrects: receiving dirt with zero connected clients re-sends "no
client", which settles in one cycle (the device then drops further dirt at the
gate).
directly, and the
FeatureChangefalse-to-true transition makes the guest senda fresh full update, which lines up with the VNC server's forced full refresh
on a client's first frame.
Testing
New unit test
test_feature_change_gates_dirt_on_consumer_presencein the videodevice: after the handshake, a "no client" demand yields a
FeatureChangewithis_dirt_needed = 0; flipping to "client" yieldsis_dirt_needed = 1; dirtreceived while disabled is dropped.
Manual validation on a Linux/KVM host with a Gen2 (UEFI) guest over VNC, watching
the device and worker trace:
No "guest not honoring FeatureChange" warnings were observed in any scenario.
cargo fmt --all -- --check,cargo clippy --workspace --all-targets, andcargo doc --workspace --no-depsare clean; touched-crate tests pass.