Skip to content

Add nice_agent_clear_stream_local_addresses and fix use-after-free of selected_pair.local during re-gather#18

Draft
Copilot wants to merge 2 commits into
masterfrom
copilot/add-clear-stream-local-addresses-function
Draft

Add nice_agent_clear_stream_local_addresses and fix use-after-free of selected_pair.local during re-gather#18
Copilot wants to merge 2 commits into
masterfrom
copilot/add-clear-stream-local-addresses-function

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented May 14, 2026

Exposes a way to drop the configured local addresses on a stream so callers can replace them, complementing the existing nice_agent_add_stream_local_address.

  • agent/agent.h: declare nice_agent_clear_stream_local_addresses(NiceAgent *, guint stream_id).
  • agent/agent.c: implementation takes the agent lock, looks up the stream, frees each NiceAddress in stream->local_addresses, frees the list, and nulls it out — mirroring the cleanup already performed in stream_free(). Returns FALSE for an invalid stream_id.
  • nice/libnice.sym: export the new symbol.
nice_agent_add_stream_local_address (agent, stream_id, &addr_old);
/* ...later, replace the set... */
nice_agent_clear_stream_local_addresses (agent, stream_id);
nice_agent_add_stream_local_address (agent, stream_id, &addr_new);

No tests added: the existing suite has no coverage of the per-stream local-address API (only the agent-wide nice_agent_add_local_address is used in tests/), so integration coverage is expected to live in the consuming layer.

Fix use-after-free of component->selected_pair.local during re-gather

When the local interface set changes and nice_agent_gather_candidates() is called again (e.g. during an ICE restart/reconfigure triggered by a network change), libnice prunes local candidates that live on interfaces that no longer exist and frees them. The prune path never checked whether the freed candidate was the one referenced by component->selected_pair.local, leaving that pointer dangling. The next nice_agent_send() would read selected_pair.local->sockptr out of freed memory and call nice_socket_send() with a garbage NiceSocket *, crashing with SIGSEGV.

  • agent/component.c / agent/component.h: add component_invalidate_selected_pair_if_local(). If the component's selected local candidate matches the one being freed, it destroys/unrefs selected_pair.keepalive.tick_source (avoiding a separate timer-based leak/UAF) and memsets selected_pair to zero — the canonical teardown already used by component_update_selected_pair() and component_free().
  • agent/agent.c: call the helper from the prune loop in nice_agent_gather_candidates(), immediately before nice_candidate_free(candidate). With selected_pair.local == NULL, nice_agent_send() already no-ops, so sends are safely dropped until ICE re-selects a pair.

No in-tree regression test was added: the meson test harness only wires standalone tests with no libnice-internal dependency, and faithfully reproducing this UAF requires a full two-agent negotiation with a controllable interface set, so end-to-end coverage is expected to live in the consuming layer (rtp_ice_restart_and_reconfigure_changing_local_addresses). agent.c and component.c were verified to compile cleanly.

@bitflows
Copy link
Copy Markdown
Contributor

@copilot does libnice allow adding additional local_addresses while its running, and will a new entry be picked up immediately, and new clients gathered?

Copy link
Copy Markdown
Contributor Author

Copilot AI commented May 15, 2026

@copilot does libnice allow adding additional local_addresses while its running, and will a new entry be picked up immediately, and new clients gathered?

You can add them at any time, but they are not picked up automatically — they only take effect on the next nice_agent_gather_candidates() call.

Looking at agent/agent.c:

  • nice_agent_add_local_address() / nice_agent_add_stream_local_address() simply take the agent lock and append the NiceAddress to agent->local_addresses or stream->local_addresses. No signal is emitted, no socket is created, and no host candidate is generated at that point.
  • The list is only consulted inside nice_agent_gather_candidates() (around line 1551), which snapshots stream->local_addresses (falling back to agent->local_addresses, then to nice_interfaces_get_local_ips()) once at the top, and creates UDP/TCP host candidates from that snapshot.

So in practice:

  1. Adding an address mid-run is safe (it's under the agent lock).
  2. No new host candidates are produced and nothing is signalled to the peer until you re-gather.
  3. To pick up the new address you need to call nice_agent_gather_candidates(agent, stream_id) again (or restart the stream). Note the TODO at ~line 1581 — stale candidates from removed interfaces aren't pruned, so re-gathering only adds candidates on existing streams; an ICE restart is cleaner if you need a fresh set.
  4. An address added after gathering has started but before it finishes won't be included in that run either, because of that one-shot snapshot.

The typical pattern with the new nice_agent_clear_stream_local_addresses from this PR is:

nice_agent_clear_stream_local_addresses(agent, stream_id);
nice_agent_add_stream_local_address(agent, stream_id, &new_addr);
nice_agent_gather_candidates(agent, stream_id);

If you want "added → immediately gathered" without an explicit re-gather call, that behaviour doesn't exist in libnice today and would need a new code path.

@bitflows
Copy link
Copy Markdown
Contributor

@copilot In pulse_network_monitor.c, I've spotted a bug. We get events from glib_network_monitor, and act on them. But, we seems to get an IPv4 address first, then if there is a IPv6 dhcp, it will take a few rounds before it finally has the "final" IPv6 address.
Based on these events, we do different things to the system, like send an SDP update or reset curl. Ideally, we do not want to force such things until we have the "final" ipv4 and ipv6 addresses ready, so we don't fire multiple times. The logic is in pulse_network_monitor_handle_connectivity_loss_unlocked().
Can you add some extra logic there, so we register the address change, but always wait at least 1000ms (configurable value), to wait for further changes (resets on every new address change), before calling any of the resetters (curl, event subscribers and sdp update).

Copy link
Copy Markdown
Contributor Author

Copilot AI commented May 15, 2026

@copilot In pulse_network_monitor.c, I've spotted a bug. We get events from glib_network_monitor, and act on them. But, we seems to get an IPv4 address first, then if there is a IPv6 d...

That file isn't part of libnice — pulse_network_monitor.c / pulse_network_monitor_handle_connectivity_loss_unlocked() don't exist in this repository (I searched the tree and there are no matches). It sounds like it lives in a downstream consumer of libnice, so the change needs to be made there, not in this PR.

That said, here's the shape of what you're describing in case it's useful when you raise it in the right repo:

  • Add a guint debounce_timeout_ms field (default 1000) and a guint pending_source_id to the monitor struct, configurable via a setter or property.
  • In pulse_network_monitor_handle_connectivity_loss_unlocked(), instead of invoking the resetters (curl reset, event subscribers, SDP update) directly:
    1. Record/merge the new address state into a pending struct.
    2. If pending_source_id != 0, g_source_remove() it (resets the timer on every new event).
    3. pending_source_id = g_timeout_add(debounce_timeout_ms, on_debounce_elapsed, self);
  • In on_debounce_elapsed(): take the same lock, clear pending_source_id, then call the resetters once with the latest snapshot. Return G_SOURCE_REMOVE.
  • On monitor dispose: if pending_source_id != 0, g_source_remove() it to avoid firing after free.

If you can point me at the correct repo/PR I'm happy to do the implementation there.

Agent-Logs-Url: https://github.com/pexip/libnice/sessions/285af649-49e4-46aa-b4da-604a0fdfd6bf

Co-authored-by: bitflows <19825042+bitflows@users.noreply.github.com>
@bitflows bitflows force-pushed the copilot/add-clear-stream-local-addresses-function branch from 703dcd1 to e228324 Compare May 29, 2026 07:01
@bitflows
Copy link
Copy Markdown
Contributor

bitflows commented Jun 1, 2026

@copilot Can you see what it would take to implement a cancel stream function, that stops the waiting for any STUN/TURN servers? We need to be able to perform either ice restart (possibly with a new candidate gather and a reconfigure) or a full stream remove once completed. The idea here is to speed up ICE restart, so in case a libnice gathering or negotiation in ongoing, and we need to reconfigure the parameters due to an additional TURN server, or change the local_addresses, due to a link change.

Separately from thjis, could we possibly add a new TURN server WHILE the gathering/negotiation is in progress?

Copy link
Copy Markdown
Contributor Author

Copilot AI commented Jun 1, 2026

@copilot Can you see what it would take to implement a cancel stream function, that stops the waiting for any STUN/TURN servers? We need to be able to perform either ice restart (possi...

Good news: most of the machinery for a "cancel stream" already exists internally — it's mainly a matter of exposing it and wiring it together.

1. Cancel/stop waiting for STUN/TURN on a stream

Gathering is driven by discovery.c: a single agent->discovery_timer_source ticks priv_discovery_tick_unlocked(), retransmitting STUN/TURN requests for each CandidateDiscovery in agent->discovery_list until not_done == 0. The pieces to tear that down already exist:

  • discovery_prune_stream(agent, stream_id) — removes all discovery items for the stream and frees the timer once the list is empty. This is exactly "stop waiting for STUN/TURN". It's already used by nice_agent_remove_stream() (agent.c:1901) and the re-gather path (agent.c:1863).
  • refresh_prune_stream(agent, stream_id) — cancels outstanding TURN refreshes.
  • conn_check_prune_stream(agent, stream) — stops in-progress connectivity checks/negotiation (agent.c:1900, 3261, 3315).

So a public nice_agent_cancel_stream(agent, stream_id) would essentially be: take the agent lock, look up the stream, call those three prune functions, set stream->gathering = FALSE, and decide whether to emit gathering-done (probably not, to distinguish "cancelled" from "finished"). That gives you a fast bail-out you can call before either an ICE restart or a nice_agent_remove_stream(). The estimate here is small — a thin public wrapper plus a symbol export; the risky part is just choosing the right signalling/state so callers aren't surprised by a half-gathered stream.

For the ICE-restart speed-up you describe: today nice_agent_restart_stream() only regenerates ufrag/pwd and restarts conncheck — it does not stop an in-flight gather. So the useful pattern would be cancel_stream → reconfigure (relays / nice_agent_clear_stream_local_addresses + add_stream_local_address) → nice_agent_gather_candidates() again. Note re-gather currently only adds candidates (the TODO near agent.c:1581 about not pruning stale ones), so a full restart is still cleaner if you need a completely fresh set.

2. Adding a TURN server while gathering/negotiation is in progress

nice_agent_set_relay_info() (agent.c:1465) just appends a TurnServer to component->turn_servers under the agent lock — that part is safe to call mid-run. But the TURN discovery items are built from a one-shot snapshot of component->turn_servers inside nice_agent_gather_candidates() (the loop at agent.c:1690 calling priv_add_new_candidate_discovery_turn()). Nothing watches that list afterwards, so a server added mid-gather is not picked up until the next gather — same one-shot-snapshot limitation as the local-addresses case from this PR.

Two ways to get "added → gathered immediately":

  • Cheap, no core change: cancel_stream + set_relay_info + re-gather (the workflow above).
  • New code path: on set_relay_info, if stream->gathering is true, create the CandidateDiscovery item(s) for the new relay there and then and call discovery_schedule(agent) to (re)start the timer. That's more invasive — you'd need to guard against duplicating discoveries already created for that component and handle the case where gathering had already finished.

If you'd like, point me at where the cancel function should live (here in libnice, given the prune helpers are all internal) and I can implement nice_agent_cancel_stream() as a follow-up.

@bitflows
Copy link
Copy Markdown
Contributor

bitflows commented Jun 1, 2026

@copilot
Title
Fix use-after-free of component->selected_pair.local when a selected local candidate is pruned during re-gather

Summary
When the local interface set changes and nice_agent_gather_candidates() is called again (e.g. during an ICE restart/reconfigure triggered by a network change), libnice prunes local candidates that live on interfaces that no longer exist and nice_candidate_free()s them. The prune path never checks whether the candidate being freed is the one currently referenced by component->selected_pair.local. If it is, selected_pair.local is left dangling.

The next nice_agent_send() (agent/agent.c) reads component->selected_pair.local->sockptr out of that freed candidate and calls nice_socket_send() with a garbage NiceSocket *, crashing with SIGSEGV:

Code
nice_socket_send (sock=0x6f6c670a0c0a0455, to=..., len=75, buf=...)
(The bogus sock value is freed heap memory reused as string data — a classic UAF signature.)

Trigger / reproduction
An established call with a selected pair, where the selected local candidate sits on an interface that subsequently disappears (interface/IP change mid-call).
A re-gather (nice_agent_gather_candidates) runs the prune loop and frees that local candidate.
Media keeps flowing, so nice_agent_send() is called in the window before connectivity checks re-select a new pair → dereferences the dangling selected_pair.local.
Root cause (exact location)
In agent/agent.c, nice_agent_gather_candidates(), the block commented "Prune any local candidates created on interfaces that no longer exists":

C
conn_check_prune_local_candidate(agent, stream, candidate);
discovery_prune_local_candidate (agent, candidate);
refresh_prune_local_candidate (agent, candidate);
component->local_candidates = g_slist_remove (component->local_candidates, candidate);
nice_candidate_free (candidate); // selected_pair.local may still point here
Neither this loop, nor conn_check_prune_local_candidate() / discovery_prune_local_candidate() / refresh_prune_local_candidate(), nor component_restart(), invalidates component->selected_pair when the pruned candidate is the selected local. All of this runs under the agent lock, and nice_agent_send() also takes the agent lock — so this is not a concurrency race; it is a stale-pointer bug in agent state that outlives the lock.

Required change
Before freeing a local candidate in the prune path, detect whether it is the current selected local and, if so, reset the component's selected pair so selected_pair.local is never read after free:

If candidate == component->selected_pair.local, tear down the selected pair the same way the existing teardown code does: destroy/unref component->selected_pair.keepalive.tick_source if set, then memset(&component->selected_pair, 0, sizeof(CandidatePair)) (see component_update_selected_pair() and component_free() in agent/component.c for the canonical teardown).
With selected_pair.local == NULL, nice_agent_send() already no-ops (it guards on component->selected_pair.local != NULL), so sends are safely dropped until ICE re-selects a pair — which is the correct behaviour during a restart/re-gather anyway.
Prefer centralising this in a small helper (e.g. invalidate-selected-pair-if-references-candidate) and calling it from the prune loop, so any other path that frees a local candidate can reuse it. Make sure the keepalive timer source is always cleaned up to avoid a separate leak/UAF via the timer.
Tests to add
A regression test that: establishes a selected pair, removes/replaces the local address that backs the selected local candidate, calls nice_agent_gather_candidates() to drive the prune, then calls nice_agent_send() and asserts no crash and that the send is dropped (returns < len / -1) until a new pair is selected. Run under ASan to confirm the UAF is gone.
Confirm existing gather/restart tests still pass.
Notes for whoever picks it up
This is the real fix; the media-side change in pexip/media (parking nicesink senders during reconfigure/restart in papanice.c) cannot fix it, because the corruption is a dangling pointer in libnice agent state that persists after the senders are unblocked and before checks re-select a pair.
After landing, bump the external/libnice submodule pin in pexip/media (currently e228324) and, ideally, keep the media-side regression test (rtp_ice_restart_and_reconfigure_changing_local_addresses) as end-to-end coverage.

Copilot AI changed the title Add nice_agent_clear_stream_local_addresses Add nice_agent_clear_stream_local_addresses and fix use-after-free of selected_pair.local during re-gather Jun 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants