fix: proper fd lifecycle management to eliminate fdsan crash #877

Open
tardyp wants to merge 1 commit into COVESA:master from tardyp:android_fix

@tardyp tardyp commented May 26, 2026

We found an fdsan crash while integrating dlt-daemon into an Android automotive system.
We are not deep experts in the inner workings of dlt-daemon, but I think the AI-generated analysis is sound.
We could not reproduce the fdsan crash after applying the patch.

Root cause: The DLT_CONNECTION_GATEWAY connections share a receiver pointer aliased into DltGatewayConnection.client.receiver. When dlt_connection_destroy() closed the fd, the gateway's client.sock retained the stale fd number. If the kernel reused that fd number (e.g., for a FILE*), subsequent gateway send() calls would trigger Android fdsan abort.
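
To illustrate the fd-number reuse described above, here is a minimal sketch. It does not use dlt-daemon code: `demo_client`, `demo_stale_fd_reused` and `demo_close_and_invalidate` are hypothetical names, and the two struct fields only stand in for `client.sock` and `client.receiver.fd`. It relies on the POSIX rule that `open()` returns the lowest-numbered free descriptor, and assumes a single-threaded process with `/dev/null` available.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical stand-in for the gateway client: two fields alias one fd. */
struct demo_client {
    int sock;        /* plays the role of client.sock */
    int receiver_fd; /* plays the role of client.receiver.fd */
};

/* Returns 1 if a later open() received the stale fd number, 0 if not,
 * -1 if /dev/null could not be opened. */
int demo_stale_fd_reused(void)
{
    struct demo_client c;
    int other;
    int reused;

    c.sock = open("/dev/null", O_WRONLY);
    if (c.sock < 0)
        return -1;
    c.receiver_fd = c.sock;

    /* The destroy path closes through the alias only; c.sock is now stale. */
    close(c.receiver_fd);
    c.receiver_fd = -1;

    /* An unrelated open() gets the lowest free number, i.e. the stale one. */
    other = open("/dev/null", O_RDONLY);
    if (other < 0)
        return -1;

    reused = (other == c.sock);
    close(other);
    return reused;
}

/* Closing once and invalidating both copies leaves no stale number behind. */
void demo_close_and_invalidate(struct demo_client *c)
{
    if (c->sock >= 0)
        close(c->sock);
    c->sock = -1;
    c->receiver_fd = -1;
}
```

Calling `demo_stale_fd_reused()` from a small `main()` shows that the stale number now refers to a different open file, so any later use of it would act on a descriptor owned by other code.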

Changes:

  • dlt_connection_destroy: Do NOT close fd for GATEWAY connections (they don't own it). Only detach the receiver pointer.
  • dlt_gateway_close_connection: New function that properly closes client.sock AND invalidates client.receiver.fd at all disconnect points.
  • Deferred destruction: Connections are now marked PENDING_DESTROY instead of being freed immediately. A sweep phase after the event loop iteration safely destroys them, preventing use-after-free when callbacks trigger their own connection's removal.
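
The deferred-destruction idea can be sketched roughly as follows. This is not the code from the patch: the list layout and every `demo_` name are hypothetical, and a real event loop would dispatch on poll results rather than on a connection id. The point is only that callbacks mark a connection, and a separate sweep after the iteration unlinks and frees it.

```c
#include <stdlib.h>

/* Hypothetical types; the patch's own structures are not shown here. */
typedef enum { DEMO_ACTIVE, DEMO_PENDING_DESTROY } demo_status;

typedef struct demo_conn {
    int id;
    demo_status status;
    struct demo_conn *next;
} demo_conn;

/* Prepend a new active connection; returns NULL on allocation failure. */
demo_conn *demo_add(demo_conn **head, int id)
{
    demo_conn *c = calloc(1, sizeof(*c));
    if (c == NULL)
        return NULL;
    c->id = id;
    c->status = DEMO_ACTIVE;
    c->next = *head;
    *head = c;
    return c;
}

/* A callback that wants its own connection removed only marks it.
 * Here, even ids simulate connections that hit a disconnect. */
static void demo_on_event(demo_conn *c)
{
    if (c->id % 2 == 0)
        c->status = DEMO_PENDING_DESTROY;
}

/* Dispatch phase: nothing is freed, so the iteration pointer stays valid. */
void demo_dispatch(demo_conn *head)
{
    demo_conn *c;
    for (c = head; c != NULL; c = c->next)
        if (c->status == DEMO_ACTIVE)
            demo_on_event(c);
}

/* Sweep phase: unlink and free marked connections; returns how many. */
int demo_sweep(demo_conn **head)
{
    int freed = 0;
    demo_conn **link = head;
    while (*link != NULL) {
        demo_conn *c = *link;
        if (c->status == DEMO_PENDING_DESTROY) {
            *link = c->next;
            free(c);
            freed++;
        } else {
            link = &c->next;
        }
    }
    return freed;
}

/* Count the connections still linked in the list. */
int demo_count(const demo_conn *head)
{
    int n = 0;
    for (; head != NULL; head = head->next)
        n++;
    return n;
}
```

Because `demo_dispatch` never frees, a callback that removes its own connection cannot leave the loop holding a dangling pointer; the memory is released only once `demo_sweep` runs after the iteration.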

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Pierre Tardy <pierre.tardy@renault.com>
@minminlittleshrimp

Collaborator

Hi @tardyp,
good catch.
This can be combined with our latest upstream fix for the fd bugs in the daemon. In short, the team has a large bug-fix effort under way for the polling loop that covers all fds (Unix and inet sockets, user FIFO or socket handling), but it does not yet cover the multinode gateway case.
We will review your patch carefully and give feedback soon if rework is needed; otherwise, we would prefer to upstream our fix first and have yours rebased on top of it.
Thanks for the contribution, happy coding.
Minh
