Mob.DNS: Android in-process IPv4 resolution via Bionic getaddrinfo NIF #36

Merged
GenericJam merged 1 commit into master from dns-android-getaddrinfo on May 28, 2026
@GenericJam
Owner

What

Adds an Android nif_resolve_ipv4 so Mob.DNS.resolve/1 and preresolve/1 work
on Android the same way they do on iOS (PR #32). The Elixir surface is unchanged
— no platform branching in caller code.

Why

:inet.getaddr/2 returns :nxdomain on physical Android devices we deployed
to, even though the same app's in-process HTTPS stack resolves those hostnames
fine. Verified on a Moto G Power 5G 2024 (Android 14); the Android emulator
doesn't hit this (its DNS proxy at 10.0.2.3 is reachable by any resolver),
which is why it didn't surface earlier.

BEAM's default DNS path forks inet_gethost (a port program) and reads what
its getaddrinfo returns. Suspected cause: libnetd_client.so's routing into
netd doesn't survive the execve into a port program — the in-process call
goes through netd binders that pick up the per-network DNS, while the execve'd child
sees an empty resolver. I haven't pinned the exact mechanism; happy to take a
follow-up with the actual diagnosis. Either way the NIF sidesteps it by
running getaddrinfo in the app's own process.

What changed

  • android/jni/mob_zig.zig — Bionic getaddrinfo / freeaddrinfo /
    addrinfo / sockaddr_in / EAI_NONAME / EAI_NODATA / EAI_AGAIN /
    AF_INET / SOCK_STREAM bindings. Layout mirrors AOSP
    bionic/libc/include/netdb.h exactly — BSD-derived ai_canonname before
    ai_addr in struct addrinfo.
  • android/jni/mob_nif.zig — nif_resolve_ipv4 mirroring iOS's nif_resolve_ipv4 in
    ios/mob_nif.m (same return atoms :nxdomain / :timeout / :no_address /
    {:gai, code}, same ERL_NIF_DIRTY_JOB_IO_BOUND scheduling). Registered in
    nif_funcs[].
  • lib/mob/dns.ex — moduledoc updated: dropped the "Android isn't
    affected" claim; added a background-app caveat (Android App Standby blocks
    all outbound network from a backgrounded mob app — TCP-by-IP, not just
    DNS — surfaces as :closed/:timeout on any socket; fix is a foreground
    service or keep the app foregrounded).
  • common_fixes.md — new section documenting the symptom + fix so the
    next person who sees :nxdomain on a physical Android finds the workaround.
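
For reviewers who want to see what "identical error shapes" means on the caller side, here is a minimal sketch. `ResolveResultSketch` is a hypothetical helper, not part of Mob. The `{:error, {:gai, code}}` and `{:error, :nif_not_loaded}` tuple shapes are assumptions extrapolated from the atoms listed above, and the wording of each message is only a reading of the atom names.

```elixir
defmodule ResolveResultSketch do
  @moduledoc false
  # Hypothetical helper (not in Mob): turns a resolve-style result into a
  # log-friendly string. Callers never branch on the platform.

  # Success: an IPv4 tuple such as {151, 101, 21, 91}.
  def describe({:ok, {a, b, c, d}}), do: "resolved to #{a}.#{b}.#{c}.#{d}"

  # Error atoms named in the PR description.
  def describe({:error, :nxdomain}), do: "name does not resolve"
  def describe({:error, :timeout}), do: "resolver timed out"
  def describe({:error, :no_address}), do: "no IPv4 address returned"

  # Assumed shape: any other getaddrinfo code is passed through as an integer.
  def describe({:error, {:gai, code}}) when is_integer(code),
    do: "getaddrinfo failed with code #{code}"

  # Assumed shape for the off-device case mentioned in the reviewer notes.
  def describe({:error, :nif_not_loaded}), do: "NIF not loaded (host build?)"
end
```

A caller would pass the result of `Mob.DNS.resolve/1` straight into `ResolveResultSketch.describe/1`; the same clauses would serve iOS and Android.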

Verified

End-to-end on the Moto G via RPC (BEAM PID matches deployed APK; `Mob.DNS` is
the only DNS path used):

```elixir
Mob.DNS.resolve("repo.hex.pm") #=> {:ok, {151, 101, 21, 91}}
Mob.DNS.resolve("google.com") #=> {:ok, {173, 194, 43, 113}}
Mob.DNS.resolve("nonexistent.invalid") #=> {:error, :nxdomain}
:inet.getaddr(~c"repo.hex.pm", :inet) #=> {:ok, {151, 101, 21, 91}} (via seeded :file)
Mix.install([{:short_uuid, "~> 0.1"}]) #=> resolves, fetches, compiles on-device
ShortUUID.encode(uuid) #=> "MCoJqPMVQiCPfeahTVNigE"
```

Quality

  • mix test: 804 + 27 doctests pass, 0 failures.
  • mix credo --strict on lib/mob/dns.ex: clean.
  • mix format --check-formatted: clean.
  • zig fmt --check on mob_nif.zig + mob_zig.zig: clean.

Notes for the reviewer

  • Same atom vocabulary + dirty-IO scheduling as the iOS NIF in PR #32
    ("DNS: document preresolve as the robust iOS path (cellular-safe)"), so
    callers see identical error shapes across platforms.
  • I didn't add an Android-specific Elixir test — test/mob/dns_test.exs
    already covers the public surface and handles the :nif_not_loaded
    off-device case. On-device NIF behavior isn't exercised in the host test
    suite (same as iOS). If we want device-tier tests later that's a separate
    follow-up.
  • Compile warnings about before_closing_body_tag/1 in mix.exs are
    pre-existing and unrelated to this patch.

🤖 Generated with Claude Code

… NIF

Physical Android devices return :nxdomain from BEAM's default DNS path
(forking inet_gethost as a port program), even though the app's own
in-process HTTPS stack resolves the same hostnames fine. The emulator
doesn't hit this — which is why it wasn't caught until we deployed to a
Moto G Power 5G 2024 on Android 14. Suspected cause: libnetd_client's
routing into netd doesn't survive the execve into the port program; we
haven't pinned the exact mechanism.

Same fix shape as the iOS NIF added in #32: call getaddrinfo from a NIF
running in the app's own process (so it follows the same DNS path JVM
HTTP and the BEAM-level HTTPS use), then seed :inet_db's :file table so
subsequent :inet.getaddr/2 hits find the result. Mob.DNS.resolve and
preresolve now work transparently on Android — no Elixir-side
platform branching.
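
The seeding step can be sketched roughly as follows. This is an illustration under assumptions, not Mob's code: `SeedSketch` is a hypothetical module, the IP is a documentation address standing in for a NIF result, and the commit does not say which `:inet_db` calls Mob actually uses. The sketch relies on `:inet_db.add_host/2` and `:inet_db.set_lookup/1`, which are internal (undocumented) OTP functions.

```elixir
defmodule SeedSketch do
  @moduledoc false
  # Hypothetical helper: records host -> IPv4 in :inet_db's hosts table and
  # makes sure the :file lookup method is consulted before the others.
  # Note that this changes resolver configuration for the whole VM.
  def seed(host, {_, _, _, _} = ip) when is_binary(host) do
    :ok = :inet_db.add_host(ip, [String.to_charlist(host)])

    lookup = :inet_db.res_option(:lookup)

    if :file not in lookup do
      :inet_db.set_lookup([:file | lookup])
    end

    :ok
  end
end

# Stand-in for an address that an in-process getaddrinfo call returned.
:ok = SeedSketch.seed("mob-sketch.invalid", {192, 0, 2, 10})

# Later lookups go through the seeded table instead of inet_gethost.
result = :inet.getaddr(~c"mob-sketch.invalid", :inet)
IO.inspect(result)
```

Putting `:file` first means the port-program path is only reached for names that were never seeded.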

Verified end-to-end on the Moto G:
  Mob.DNS.resolve("repo.hex.pm")   #=> {:ok, {151, 101, 21, 91}}
  Mob.DNS.resolve("google.com")    #=> {:ok, {173, 194, 43, 113}}
  Mob.DNS.resolve("nonexistent.invalid") #=> {:error, :nxdomain}
  :inet.getaddr(~c"repo.hex.pm", :inet) #=> {:ok, {151, 101, 21, 91}}  (via seeded :file)
  Mix.install([{:short_uuid, "~> 0.1"}]) #=> works; fetches + compiles

Native-side changes:
  android/jni/mob_zig.zig — Bionic getaddrinfo / freeaddrinfo / addrinfo /
    sockaddr_in / EAI_NONAME / EAI_NODATA / EAI_AGAIN / AF_INET /
    SOCK_STREAM bindings. Layout mirrors AOSP bionic/libc/include/netdb.h
    (BSD-derived: ai_canonname *before* ai_addr in struct addrinfo).
  android/jni/mob_nif.zig — nif_resolve_ipv4 mirroring iOS's
    nif_resolve_ipv4 in ios/mob_nif.m: same return atoms (:nxdomain /
    :timeout / :no_address / {:gai, code}), same dirty-IO scheduling.
  lib/mob/dns.ex — moduledoc updated: dropped "Android isn't affected"
    claim, added the foregrounded-app caveat (App Standby blocks all
    outbound network on a backgrounded mob app, not just DNS).
  common_fixes.md — new section documenting symptom, root cause, fix.

Tests: 804 + 27 doctests pass, mix credo --strict on Mob.DNS clean,
zig fmt clean, mix format clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
GenericJam merged commit 43d1906 into master on May 28, 2026
4 checks passed
GenericJam added a commit that referenced this pull request May 28, 2026
Bumps version + CHANGELOG entry for the Mob.DNS Android NIF added in #36.
`Mob.DNS.resolve/1` and `preresolve/1` now work on physical Android the
same way they do on iOS — symmetric NIF, identical Elixir surface, no
caller-side platform branching.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
clsource pushed a commit to NinjasCL-labs/mob that referenced this pull request May 29, 2026
…_load/0 can't)

`:public_key.cacerts_load/0` probes a handful of distro paths for a
system CA bundle — none of which exist on Android. The system trust
store lives behind a Java API that BEAM's `:public_key` doesn't reach,
so the next `:public_key.cacerts_get/0` call raises `no_cacerts_found`.
In some OTP versions `pubkey_os_cacerts.conv_error_reason/1` doesn't
have a clause for that error, so the surface crash is the worse
`FunctionClauseError` on `conv_error_reason/1`.

Hex itself bakes its own DER bundle into `Hex.HTTP.SSL`, so it isn't
affected — but every other Elixir HTTP library (Req → Mint → :ssl,
Finch, anything using OTP-26+ default `:ssl` opts) breaks on the first
TLS connect. Same shape as the DNS issue in GenericJam#36: the OS exposes
something Erlang can't reach, and the workaround is to point Erlang at
an app-provided alternative.

Adds:
- `Mob.Certs.load_cacerts/1` — thin, predictable wrapper around
  `:public_key.cacerts_load/1` (returns `{:error, reason}` rather than
  the `FunctionClauseError` you sometimes see from OTP).
- `Mob.Certs.load_cacerts!/1` — raising variant for boot use.
- `Mob.Certs.loaded?/0` — diagnostic helper that wraps the raising
  `cacerts_get/0` and returns a boolean.
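
A rough sketch of what three such wrappers could look like, assuming only the documented `:public_key.cacerts_load/1` and `:public_key.cacerts_get/0` (OTP 25+). `CertsSketch` is a hypothetical module name and the bodies are a guess at the shape, not the code from this commit.

```elixir
defmodule CertsSketch do
  @moduledoc false
  # Hypothetical stand-in for the wrappers listed above.

  # Returns :ok or {:error, reason}; never raises.
  def load_cacerts(path) when is_binary(path) do
    case :public_key.cacerts_load(path) do
      :ok -> :ok
      {:error, reason} -> {:error, reason}
    end
  rescue
    # Defensive: the commit message reports a FunctionClauseError from
    # some OTP versions; fold any raised error into the tuple.
    exception -> {:error, exception}
  end

  # Raising variant, meant for boot-time use.
  def load_cacerts!(path) do
    case load_cacerts(path) do
      :ok -> :ok
      {:error, reason} -> raise "could not load CA certs from #{path}: #{inspect(reason)}"
    end
  end

  # cacerts_get/0 raises when nothing is loaded; turn that into a boolean.
  def loaded? do
    _ = :public_key.cacerts_get()
    true
  rescue
    _ -> false
  end
end
```

On a desktop host `CertsSketch.loaded?()` will usually return true because OTP finds the OS bundle; the false branch is the one the commit describes on Android.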

`extra_applications: [:logger, :public_key]` so Elixir 1.19+'s
unused-app culling doesn't strip `:public_key.beam` from the code
path. Documented at length in the moduledoc + `common_fixes.md`.
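
In an app's own `mix.exs`, the same setting would look something like the fragment below. The project name, version, and Elixir requirement are placeholders, not values from this repository.

```elixir
defmodule MyApp.MixProject do
  use Mix.Project

  # Placeholder project metadata for illustration only.
  def project do
    [app: :my_app, version: "0.1.0", elixir: "~> 1.15", deps: []]
  end

  # Listing :public_key keeps it on the code path in releases.
  def application do
    [extra_applications: [:logger, :public_key]]
  end
end
```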

Usage:

    def on_start do
      Mob.Certs.load_cacerts!(Application.app_dir(:my_app, "priv/cacerts.pem"))
      # …rest of startup…
    end

The bundle is the app's choice (security: who do you trust). `castore`
ships a current Mozilla trust store and is the conventional source —
copy its `cacerts.pem` into your `priv/` at build time.

iOS isn't affected — Darwin exposes the trust store at the paths
Erlang knows about. macOS keychain auto-loads from `:public_key`'s
`cacerts_get/0` too. Cross-platform apps can call `load_cacerts!/1`
unconditionally — a no-op on platforms that already have OS certs.

End-to-end verified on a Moto G Power 5G 2024 (Android 14):
- `Mob.Certs.load_cacerts!("…/priv/cacerts.pem")` succeeds
- `Mob.Certs.loaded?()` returns true
- `:public_key.cacerts_get()` returns 121 certificates (the size of castore's bundle)
- `Mix.install([{:jason, …}, {:kino, …}])` resolves and compiles
- `:httpc.request(:get, "https://geocoding-api.open-meteo.com/v1/search?…", [ssl: [verify: :verify_peer, cacerts: :public_key.cacerts_get()]], …)` → status 200

Tests: 811 + 27 doctests pass (7 new Certs tests), mix credo --strict
clean, mix format clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>