Skip to content

vnc: stop guest screen/pointer reporting when no client is connected#3614

Closed
bitranox wants to merge 9 commits into
microsoft:mainfrom
bitranox:vnc-idle-dirt-suppression
Closed

vnc: stop guest screen/pointer reporting when no client is connected#3614
bitranox wants to merge 9 commits into
microsoft:mainfrom
bitranox:vnc-idle-dirt-suppression

Conversation

@bitranox

@bitranox bitranox commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

Summary

When no VNC client is connected, the guest still generates and sends synthvid
dirty rectangles (and hardware pointer updates) on every screen change. The
synthetic video device parses each message and forwards it to the VNC worker,
which discards it because there are no clients to send to. For a guest with
active screen content and nobody watching, that is wasted guest CPU, vmbus
round-trips, and host allocations that produce nothing.

This change uses the synthvid FeatureChange message to tell the guest to stop
reporting while no client is connected, and to resume on the first connect.

How it works

  • The VNC worker tracks the connected-client count and, over a new mesh
    channel paired with the existing dirty-rect channel, signals the video device
    when that count crosses 0 to 1 and back.
  • The device relays the signal to the guest with a FeatureChange that clears
    is_dirt_needed and the two hardware-pointer flags (the device ignores those
    pointer reports and the VNC worker is the only dirt consumer).
  • is_video_situation_updates_needed stays set, so a resolution change while
    idle is still tracked and the next client renders at the correct size.
  • As a backstop the device also drops any dirt that arrives while disabled, and
    logs a rate-limited warning if a guest keeps sending updates after being told
    to stop (it is not honoring the FeatureChange).

Design notes

  • The device defaults to the guest's post-handshake enabled state and only
    disables when the worker signals "no client". This keeps an already-connected
    client working across a guest reboot or video-driver reload: those re-open the
    synthvid channel and reset the device's per-connection state, but the worker
    and the client connection persist, so the device must not assume "disabled" on
    a fresh handshake.
  • If the device re-handshakes while idle and returns to the enabled default, the
    worker self-corrects: receiving dirt with zero connected clients re-sends "no
    client", which settles in one cycle (the device then drops further dirt at the
    gate).
  • Disabling reporting never breaks output: any consumer can still read VRAM
    directly, and the FeatureChange false-to-true transition makes the guest send
    a fresh full update, which lines up with the VNC server's forced full refresh
    on a client's first frame.

Testing

New unit test test_feature_change_gates_dirt_on_consumer_presence in the video
device: after the handshake, a "no client" demand yields a FeatureChange with
is_dirt_needed = 0; flipping to "client" yields is_dirt_needed = 1; dirt
received while disabled is dropped.

Manual validation on a Linux/KVM host with a Gen2 (UEFI) guest over VNC, watching
the device and worker trace:

Scenario Result
Boot, no client reporting disabled
Client connects reporting enabled
Client disconnects reporting disabled
Guest reset with a client connected stays enabled, client not frozen
Re-handshake while idle worker self-corrects to disabled

No "guest not honoring FeatureChange" warnings were observed in any scenario.

cargo fmt --all -- --check, cargo clippy --workspace --all-targets, and
cargo doc --workspace --no-deps are clean; touched-crate tests pass.

When no VNC client is connected, the guest still generates and sends
synthvid dirty rectangles and hardware pointer updates on every screen
change. The video device parses each one and forwards it to the VNC
worker, which discards it because there are no clients. For a guest with
active screen content and nobody watching, that is wasted guest CPU,
vmbus traffic, and host allocations.

Use the synthvid FeatureChange message to tell the guest to stop. The VNC
worker signals the video device when the connected-client count crosses
0<->1, over a new mesh channel; the device relays that to the guest by
clearing is_dirt_needed and the two pointer flags. Situation (resolution)
updates stay enabled so a mode change while idle is still tracked. The
device also gates its own forwarding while disabled and warns (rate
limited) if a guest keeps sending updates anyway.

The device defaults to the guest's enabled state and only disables when the
worker signals "no client". This keeps an already-connected client working
across a guest reboot or video-driver reload, which re-open the synthvid
channel and reset the device's per-connection state while the worker and
client persist. If the device re-handshakes while idle and returns to the
enabled default, the worker self-corrects: receiving dirt with zero clients
re-sends "no client", which settles in one cycle.

Wiring spans the openvmm and openhcl launch paths and the petri test
builder. Adds a device unit test for the gate and documents the mechanism
in the VNC architecture guide.
Copilot AI review requested due to automatic review settings June 1, 2026 18:55
@bitranox bitranox requested a review from a team as a code owner June 1, 2026 18:55
@github-actions github-actions Bot added the Guide label Jun 1, 2026

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR adds an “updates needed” reverse channel between the VNC worker and the synthetic video device so the guest can stop generating dirty rectangles and pointer updates while no VNC client is connected.

Changes:

  • Adds updates_needed_send/recv to VNC/video resource plumbing across openvmm and underhill.
  • Implements gating in the video device via synthvid FeatureChange, plus dropping/monitoring updates when disabled.
  • Updates VNC multi-client server to signal first-connect/last-disconnect and to re-assert “disabled” if dirt arrives while idle; adds documentation and a new video test.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
workers/vnc_worker_defs/src/lib.rs Extends VNC worker parameters with updates_needed_send.
workers/vnc_worker/src/lib.rs Signals synthvid “updates needed” based on client presence / idle dirt.
vm/devices/uidevices_resources/src/lib.rs Extends SynthVideoHandle with updates_needed_recv.
vm/devices/uidevices/src/video/mod.rs Implements FeatureChange send + gates dirt/pointer updates; adds test.
vm/devices/uidevices/src/resolver.rs Wires updates_needed_recv into Video::new.
petri/src/vm/openvmm/construct.rs Initializes new handle field in Petri VM config.
openvmm/openvmm_entry/src/lib.rs Creates and wires the reverse channel between VNC worker and device.
openhcl/underhill_core/src/worker.rs Plumbs reverse channel into Underhill synth video handle.
openhcl/underhill_core/src/lib.rs Refactors console setup to include both channel directions.
Guide/src/reference/devices/vnc/what-we-are-proud-of.md Documents idle suppression optimization.
Guide/src/reference/devices/vnc/architecture.md Documents the new reverse channel + behavior details.

Comment thread workers/vnc_worker/src/lib.rs
Comment thread vm/devices/uidevices/src/video/mod.rs
Comment thread workers/vnc_worker/src/lib.rs
@bitranox

bitranox commented Jun 1, 2026

Copy link
Copy Markdown
Contributor Author

@jstarks one thing to flag before this merges: your standalone client will stop receiving dirty frames.

The suppression keys off the VNC worker's own connected-client count. The worker has no knowledge of your standalone client, so as soon as it has no clients of its own, it tells the guest to stop reporting dirty rectangles, and your client gets starved.

If you want to keep it working, I can add an opt-out flag, something like --keep-dirty-rects-enabled, that forces reporting to stay on regardless of client count. That's the quick fix.

The cleaner long-term answer probably depends on how we want to handle that second client. The device already owns the enable/disable decision and can aggregate demand from more than one source, so the standalone client could register its own "needs dirt" signal instead of relying on a global flag.

Which direction do you prefer? And do you actually want to go down this path, or would you rather hold off?

yours sincerely

Robert

Addresses review feedback on the idle-suppression change.

- Add an e2e test in vnc_worker for the coordinator's updates-needed
  signaling: startup signals false, the first client connect signals
  true, the last disconnect signals false, and dirt arriving while idle
  re-asserts false (the self-heal path).
- Add a comment at the video device select! explaining why cancelling the
  in-flight guest packet read is safe (message-mode MessagePipe::recv
  commits the ring read offset only on a complete read, so a dropped
  pending read leaves the stream untouched) and why the loop must wait on
  the packet and the demand channel concurrently.
@bitranox

bitranox commented Jun 1, 2026

Copy link
Copy Markdown
Contributor Author

@jstarks :
I agree with both points You pointed out on my previouse PR. The channel being point-to-point and the unbounded mesh::channel are real, and a shared-memory broadcast ring with a published write offset and "fell behind past the buffer length -> full refresh" solves both at once.

One thing worth stating explicitly, because it makes the lossy ring the right fit: dirty rectangles are only hints. The real pixels are in VRAM, so dropping rects is always safe and recovered by a full refresh, never a correctness problem. That fall-behind path is the same missed_dirty behavior the VNC server already uses, just moved to the cross-process boundary. The vmbus ring looks like the natural in-tree precedent for the shared section and the release/acquire cursor.

The one thing I'd watch in the design: keep the "broadcast the latest offset" notification bounded too (coalesce to the newest offset, or a shared atomic cursor plus a wakeup event), so the unbounded-queue problem doesn't just move from the data to the notifications.

It also composes cleanly with the idle suppression in this PR: the "updates needed" demand generalizes from "the VNC worker has a client" to "any consumer is active" (the device already owns that decision and can OR multiple consumers). Once your standalone GUI is a first-class consumer through the ring, the --keep-dirty-rects-enabled stopgap I floated earlier isn't needed.

I take this on as a follow-up, I've noted it as the top item on our side.

yours sincerely
Robert

@github-actions

github-actions Bot commented Jun 1, 2026

Copy link
Copy Markdown

@bitranox

bitranox commented Jun 2, 2026

Copy link
Copy Markdown
Contributor Author

@smalis-msft - hopefully You can catch the time to review it, then I can implement the changes @jstarks proposed
yours sincerely
Robert

@bitranox

bitranox commented Jun 2, 2026

Copy link
Copy Markdown
Contributor Author

@jstarks - steering needed

Coming back to this after sitting with it. I still think the shared-memory ring is a clean design, and the multi-consumer problem probably needs solving if You really need that standalone GUI, but I want to make sure we build the right amount of machinery. Here is where my thinking landed and where I'd value your steer.

If the multi-consumer need is real: mesh channels are single-consumer (the receiver isn't cloneable and there's no broadcast primitive), so the current dirt_send/dirty_recv can only feed one consumer, and your standalone GUI can't coexist with the VNC server on it.

On the unbounded-memory point I'm less sure it's load-bearing. The coordinator drains the channel continuously and the per-item work is tiny, so in practice it stays near-empty. The idle-suppression change in this PR makes it less likely still, because the guest only produces dirt while a consumer is actually watching and draining. I haven't been able to make it grow. So I'd treat the memory bound as a nice property rather than a driver.

If multi-consumer is the real goal, there's a simpler option than a shared section: have the device hold one mesh sender per consumer and broadcast Arc<Vec> to each. That is the same Arc fan-out the VNC worker already does for its own clients, just lifted cross-process. No shared memory or cross-process atomics, and it keeps the VNC crates' forbid(unsafe_code). Consumers register a sender with the device, and the per-consumer demand aggregation for the idle suppression is the same work either way.

The ring's real advantages (one buffer for many readers, a hard memory bound, cross-process zero-copy) mostly pay off at many consumers or under genuine pressure. At two consumers they're marginal, and the rects are small enough that the copy is free. Against that, the ring adds shared-memory atomics with explicit ordering, torn-read detection, a bounded wakeup path, and a fair amount of concurrency testing, which is real surface to get right.

So my questions:

  • How many consumers do you realistically expect? If it is the VNC server plus the GUI, I would lean toward the simple per-consumer broadcast and keep the ring in our back pocket for when that grows.
  • Is the ring something you want for plans I can't see (more consumers, a different producer, consistency with the vmbus ring), or mainly to unblock the GUI?

It's your call given you own both the codebase and the GUI, but right now I'd lean toward the simple broadcast: the device fans out to each consumer, with the GUI as another consumer alongside the VNC worker rather than behind it.

see #3628 for actual measurements

Robert

Comment thread petri/src/vm/openvmm/construct.rs Outdated
Comment thread workers/vnc_worker_defs/src/lib.rs Outdated
Comment thread workers/vnc_worker/src/lib.rs
bitranox added 3 commits June 4, 2026 04:57
The dirty-rect receiver and the updates-needed sender are wired together or not at all: both present when a synth video device exists, both absent otherwise. They were two separate Option fields, which allowed an invalid "one Some, one None" state. Move them into a single SynthVideoChannels struct carried as one Option through VncParameters, the worker, and the openvmm_entry plumbing, so the mismatch is unrepresentable. The worker test no longer constructs the mismatched state either.
wait_for_signal and wait_for_input polled with a fixed deadline and panicked on timeout, which just adds flakiness. Block on recv.recv() (via pal_async::local::block_on) and let nextest's slow-test timeout fail a genuinely wedged test. A closed channel still panics, since that is a real failure rather than a timeout.
SynthVideoHandle carried dirt_send and updates_needed_recv as two separate Options, which allowed the same mismatched "one Some, one None" state (the petri handle set both to None by hand). Bundle them into a SynthVideoChannels struct behind one Option, unpacking at the resolver into the device's existing fields. Name the worker's match bindings channels to match.
Copilot AI review requested due to automatic review settings June 4, 2026 03:26

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

openhcl/underhill_core/src/lib.rs:1

  • VncParameters was changed to carry synth_video: Option<SynthVideoChannels> (bundling the dirty receiver + updates-needed sender). This call site still populates the old shape (dirty_recv plus a separate updates_needed_send), which won’t compile. Create synth_video: Option<vnc_worker_defs::SynthVideoChannels> from dirty_rect_recv and updates_needed_send and pass it via the synth_video field.
// Copyright (c) Microsoft Corporation.

Comment thread openhcl/underhill_core/src/worker.rs
Comment thread workers/vnc_worker_defs/src/lib.rs
…e device type distinctly

Two follow-ups to the device-side bundle (Copilot review): underhill_core (the OpenHCL paravisor) constructs both SynthVideoHandle and VncParameters and was missed, so wire both through the bundled channels/synth_video fields, built only when both endpoints are present. And rename the device-side bundle to SynthVideoDeviceChannels so it is distinct from the worker-side SynthVideoChannels at call sites that wire both ends.
@bitranox bitranox requested a review from smalis-msft June 4, 2026 03:48
The bundle refactor moved the dirty-rect receiver and updates-needed sender into synth_video: Option<SynthVideoChannels> (and the device end into SynthVideoHandle.channels). Update the architecture reference, which still described the old un-bundled fields.
Copilot AI review requested due to automatic review settings June 4, 2026 04:03

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.

Comment thread vm/devices/uidevices/src/video/mod.rs
Comment thread workers/vnc_worker/src/lib.rs
Comment thread vm/devices/uidevices_resources/src/lib.rs Outdated
bitranox added 2 commits June 4, 2026 06:29
Match the worker's dirty_recv and the DirtyRect the channel carries; the field was dirt_send on the device but dirty_recv on the worker. Renamed through the device resource, resolver, openvmm_entry, and the openhcl paravisor.
The idle self-heal re-asserts false on every dirty event the guest sends while no client is connected (needed because the device defaults to enabled after a re-handshake, so a naive dedupe would break it). A non-compliant guest could spam the debug log, so wrap it in tracelimit::event_ratelimited! at DEBUG level. The send is a cheap bool the device dedupes; this rate-limits the log, not the self-heal.
@bitranox

bitranox commented Jun 4, 2026

Copy link
Copy Markdown
Contributor Author

merged into #3665

@bitranox bitranox closed this Jun 4, 2026
@bitranox bitranox deleted the vnc-idle-dirt-suppression branch June 9, 2026 18:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants