
Test harness: screenshot, scroll control, and element frames over dist #40

Merged
GenericJam merged 2 commits into master from worktree-agent-drive-screenshot-scroll
May 29, 2026

Conversation

@GenericJam
Owner

@GenericJam GenericJam commented May 29, 2026

Removes the last hard dependency on adb/xcrun/idb for driving a Mob app: a remotely-connected agent can now see, scroll, and locate elements entirely over Erlang distribution. This is what Sloppy Joe (agent-programmable, no local device tooling) and WireTap need.

What's added (Mob.Test)

  • screenshot/2 → PNG/JPEG bytes over dist (in-process capture, no xcrun/adb).
  • scroll_info/2, scroll_to/4, screenshot_tour/3: read a scroll view's offset and extent by :id, and drive it to an absolute offset, :top, :bottom, or {:page, n}. screenshot_tour pages through a long screen, capturing a screenshot of each page.
  • element_frames/1 (%{id => {x, y, w, h}}), frame/2, tap_id/2 — element positions without a screenshot: any node given an :id reports its live frame to a registry the agent reads as a compact structured map (no image bytes, no AX activation). The :id also becomes the accessibilityIdentifier (iOS) / Compose testTag (Android).
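To illustrate how an agent might use the `%{id => {x, y, w, h}}` map without a screenshot, here is a minimal sketch that turns a frame into a tap point. `FrameSketch` and `center/2` are hypothetical names, not part of Mob.Test, and the atom key in the sample map is an assumption about how ids are represented.

```elixir
defmodule FrameSketch do
  @moduledoc "Hypothetical helper for illustration; not part of Mob.Test."

  # Returns {:ok, {cx, cy}} for the frame registered under `id`,
  # or :error when no node with that id has reported a frame.
  def center(frames, id) when is_map(frames) do
    case Map.fetch(frames, id) do
      {:ok, {x, y, w, h}} -> {:ok, {x + w / 2, y + h / 2}}
      :error -> :error
    end
  end
end

# Sample data shaped like the element_frames/1 result described above.
frames = %{submit: {16, 400, 120, 44}}
{:ok, {cx, cy}} = FrameSketch.center(frames, :submit)
IO.inspect({cx, cy})
```

A point computed this way is in the same units as the frame itself (logical points on iOS, dp on Android), so it could be handed to a coordinate-based tap such as tap_xy.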

Backed by NIFs screenshot/3, scroll_info/1, scroll_to/3, element_frames/0 (debug-only; iOS under #if !MOB_RELEASE). Target resolution and tour paging are pure, unit-tested Elixir helpers.
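The target resolution and tour paging mentioned above could look roughly like the following. This is a sketch under stated assumptions, not the helpers shipped in this PR: `ScrollSketch`, `resolve/3`, and `tour_offsets/2` are hypothetical names, the scroll is treated as one-dimensional, and `{:page, n}` is assumed to be zero-based with a page equal to one viewport.

```elixir
defmodule ScrollSketch do
  @moduledoc "Hypothetical illustration; not the Mob.Test implementation."

  # Resolve a target to an absolute offset clamped to 0..max_offset.
  # `extent` is the content length, `viewport` the visible length.
  def resolve(target, extent, viewport) do
    max_offset = max(extent - viewport, 0)

    raw =
      case target do
        :top -> 0
        :bottom -> max_offset
        # Assumption: pages are zero-based and one viewport tall.
        {:page, n} when is_integer(n) and n >= 0 -> n * viewport
        offset when is_number(offset) -> offset
      end

    raw |> max(0) |> min(max_offset)
  end

  # Offsets a tour would visit: one per page, with the last clamped to the bottom.
  def tour_offsets(extent, viewport) when viewport > 0 do
    max_offset = max(extent - viewport, 0)
    pages = ceil(max_offset / viewport)
    for n <- 0..pages, do: resolve({:page, n}, extent, viewport)
  end
end

IO.inspect(ScrollSketch.tour_offsets(2500, 1000))
```

Keeping this logic free of NIF calls is what makes it unit-testable without a device: the NIF only has to report offset and extent and accept an absolute offset.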

Platform notes

  • iOS: UIGraphicsImageRenderer + drawViewHierarchy; UIScrollView.contentOffset; a GeometryReader-backed MobFrameTracker + C frame registry. SwiftUI doesn't reliably propagate accessibilityIdentifier to the backing UIScrollView, so scroll lookup falls back to the largest scroll view.
  • Android: PixelCopy; id-keyed Compose scroll + frame registries via onGloballyPositioned. scroll_info reports kind as :pixel (verticalScroll) or :index (LazyColumn). Frame capture is opt-in per :id, so untagged nodes cost nothing.
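The iOS fallback described above (match by identifier, otherwise take the largest scroll view) is implemented natively. The selection logic alone can be sketched in Elixir as below; `LookupSketch`, `pick_scroll_view/2`, and the map shape are hypothetical and exist only for illustration.

```elixir
defmodule LookupSketch do
  @moduledoc "Hypothetical illustration of the lookup order; the real code is native."

  # `views` is a list of maps like %{identifier: "feed" | nil, width: w, height: h}.
  def pick_scroll_view([], _id), do: nil

  def pick_scroll_view(views, id) when is_binary(id) do
    # Prefer an exact identifier match; otherwise fall back to the largest area.
    Enum.find(views, &(&1.identifier == id)) ||
      Enum.max_by(views, &(&1.width * &1.height))
  end
end

views = [
  %{identifier: nil, width: 390, height: 120},
  %{identifier: nil, width: 390, height: 700}
]

# No view carries the identifier, so the largest one is chosen.
IO.inspect(LookupSketch.pick_scroll_view(views, "feed"))
```

One consequence of this order is that on a screen with several scroll views and no propagated identifier, the lookup always lands on the largest one regardless of the :id requested.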

Companion PR (must land together)

The Android Kotlin side lives in the mob_new MobBridge.kt.eex template: GenericJam/mob_new#19

Verification

Built and verified end-to-end on iOS simulator, a physical iPhone, and an Android device (moto g power): screenshot bytes + deterministic scroll + 41 element frames over dist, no adb/xcrun. Static gates green: Elixir unit tests, erlfmt, clang-format, credo, zig ast-check.

Scope: this is core test-harness work (same bucket as ui_tree/tap_xy), not a plugin-shaped feature, so it's clear to land under the plugin-first hold.

🤖 Generated with Claude Code

GenericJam and others added 2 commits May 29, 2026 14:06
Add screenshot/3, scroll_info/1, scroll_to/3 NIFs (debug-only, iOS
#if !MOB_RELEASE) surfaced as Mob.Test.screenshot/2, scroll_info/2,
scroll_to/4, screenshot_tour/3. A remotely-connected agent gets pixels
and deterministic scroll entirely over Erlang distribution — no
adb/xcrun/idb — which is what Sloppy Joe and WireTap need.

- iOS: UIGraphicsImageRenderer + drawViewHierarchy for capture;
  UIScrollView.contentOffset for scroll. Scroll views are tagged with
  the node :id as accessibilityIdentifier; since SwiftUI doesn't reliably
  propagate that onto the backing UIScrollView, the NIF falls back to the
  largest scroll view.
- Android: PixelCopy against the activity window; an id-keyed Compose
  scroll registry (ScrollState/LazyListState) in MobBridge. kind is
  :pixel (UIScrollView / verticalScroll) or :index (LazyColumn).
- Target resolution (:top/:bottom/{:page,n}/{x,y}) and tour paging are
  pure, unit-tested Elixir helpers.

Verified end-to-end on iOS sim, Android device, and a physical iPhone.
The Android Kotlin side lives in the mob_new MobBridge.kt.eex template.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Add element_frames/0 NIF surfaced as Mob.Test.element_frames/1
(%{id => {x,y,w,h}}), frame/2, and tap_id/2. Any rendered node given an
:id reports its live on-screen frame (logical points iOS / dp Android)
to a registry the agent reads over dist — a compact structured map
instead of image bytes, with no accessibility activation. Lets an agent
locate and drive elements by id without screenshotting (which blows out
session memory).

- iOS: a MobFrameTracker ViewModifier on every node records frame(in:
  .global) via a GeometryReader background and sets accessibilityIdentifier
  when the node carries an :id; frames go to a C registry (mob_register_frame
  / g_element_frames in mob_nif.m), cleared on set_root. nif_element_frames
  is debug-only; the registry itself uses only public APIs.
- Android: RenderNodeInner attaches Modifier.onGloballyPositioned + testTag
  for id'd nodes → elementFramesById in MobBridge (px→dp). (Kotlin side in
  the mob_new MobBridge.kt.eex template.)

Opt-in per :id — untagged nodes get no tracking modifier, so zero cost.
Verified on iOS sim, Android device, and a physical iPhone (41 frames each).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@GenericJam GenericJam merged commit 452dc4c into master May 29, 2026
4 checks passed
@GenericJam GenericJam deleted the worktree-agent-drive-screenshot-scroll branch May 29, 2026 22:45
GenericJam added a commit that referenced this pull request May 29, 2026
Cuts a release for the screenshot / scroll / element-frame test harness
(#40) — in-process screenshot, deterministic id-addressed scroll, and
screenshot-free element frames over Erlang dist (no adb/xcrun), the
capability Sloppy Joe and WireTap need. Also ships the Mob.Bt →
mob_bluetooth plugin extraction already on master (breaking: no compat
shim; apps add {:mob_bluetooth, ...} and rename to MobBluetooth.*).

Companion release: mob_new 0.3.15 (the Kotlin bridge side).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>