---

agent-device

A CLI that lets AI agents control iOS and Android devices, inspired by Vercel’s agent-browser.

The project is in early development and considered experimental. Pull requests are welcome!

Features

- Platforms: iOS (simulator + limited device support) and Android (emulator + device).
- Core commands: open, back, home, app-switcher, press, long-press, focus, type, fill, scroll, scrollintoview, wait, alert, screenshot, close.
- Inspection commands: snapshot (accessibility tree).
- Device tooling: adb (Android), simctl/devicectl (iOS via Xcode).
- Minimal dependencies; TypeScript executed directly on Node 22+ (no build step).

Install

```bash
npm install -g agent-device
```

Or use it without installing:

```bash
npx agent-device open SampleApp
```

Quick Start

Use refs for agent-driven exploration and normal automation flows.

```bash
agent-device open Contacts --platform ios # creates session on iOS Simulator
agent-device snapshot
agent-device click @e5
agent-device fill @e6 "John"
agent-device fill @e7 "Doe"
agent-device click @e3
agent-device close
```

CLI Usage

```bash
agent-device [args] [--json]
```

Basic flow:

```bash
agent-device open SampleApp
agent-device snapshot
agent-device click @e7
agent-device fill @e8 "hello"
agent-device close SampleApp
```

Debug flow:

```bash
agent-device trace start
agent-device snapshot -s "Sample App"
agent-device find label "Wi-Fi" click
agent-device trace stop ./trace.log
```

Coordinates:

- All coordinate-based commands (press, long-press, focus, fill) use device coordinates with the origin at the top-left.
- X increases to the right; Y increases downward.
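To act on an element by coordinates, an agent usually needs the centre of its bounding box. The following is a minimal sketch, assuming the box is known as x, y, width, and height in device coordinates; `rect_center` is a hypothetical helper and not part of agent-device.

```bash
# rect_center X Y WIDTH HEIGHT
# Prints the centre of a rectangle in device coordinates
# (origin top-left, X grows right, Y grows down). Uses integer division.
rect_center() {
  echo "$(( $1 + $3 / 2 )) $(( $2 + $4 / 2 ))"
}

# A 50x40 element whose top-left corner sits at (100, 200).
rect_center 100 200 50 40   # prints "125 220"
```

The two numbers can then be handed to a coordinate-based command such as press.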

Command Index

- open, close, home, back, app-switcher
- snapshot, find, get
- click, focus, type, fill, press, long-press, scroll, scrollintoview, is
- alert, wait, screenshot
- trace start, trace stop
- settings wifi|airplane|location on|off
- appstate, apps, devices, session list


Backends (iOS snapshots)

| Backend | Speed | Accuracy | Requirements |
| --- | --- | --- | --- |
| `xctest` | Fast | High | No Accessibility permission required |
| `ax` | Fast | Medium | Accessibility permission for the terminal app, not recommended |

Notes:

- Default backend is `xctest` on iOS.
- Scope snapshots with `-s ""` or `-s @ref`.
- If XCTest returns 0 nodes (e.g., foreground app changed), agent-device falls back to AX when available.

Flags:

- `--platform ios|android`
- `--device`
- `--udid` (iOS)
- `--serial` (Android)
- `--activity` (Android; package/Activity or package/.Activity)
- `--session`
- `--verbose` for daemon and runner logs
- `--json` for structured output
- `--backend ax|xctest` (snapshot only; defaults to `xctest` on iOS)

Skills

Install the automation skills listed in SKILL.md.

```bash
npx skills add https://github.com/callstackincubator/agent-device --skill agent-device
```

Sessions:

- `open` starts a session. Without args, it boots/activates the target device/simulator without launching an app.
- All interaction commands require an open session.
- If a session is already open, `open` switches the active app and updates the session app bundle.
- `close` stops the session and releases device resources. Pass an app to close it explicitly, or omit it to just close the session.
- Use `--session` to manage multiple sessions.
- Session scripts are written to `~/.agent-device/sessions/-.ad` when recording is enabled with `--save-script`.
- Deterministic replay is `.ad`-based; use `replay --update` (`-u`) to repair selector drift and rewrite the replay file in place.
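When driving two devices side by side, it can help to route every command through a small wrapper that appends the session flag. This is only a sketch: it assumes `--session` takes a session name as its value and may be placed after the other arguments, neither of which is confirmed above. `in_session` and the `AD` variable are hypothetical names.

```bash
# Command to invoke; set AD=echo to print the arguments instead of running them.
AD=${AD:-agent-device}

# in_session NAME COMMAND [ARGS...]
# Runs one agent-device command against the named session.
in_session() {
  name=$1
  shift
  "$AD" "$@" --session "$name"
}

# Example calls (not executed here):
#   in_session ios open Settings --platform ios
#   in_session android open Settings --platform android
#   in_session ios snapshot
#   in_session ios close
#   in_session android close
```

Setting `AD=echo` gives a dry run that shows the exact command line each call would produce.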

Find (semantic):

- `find [value]` finds by any text (label/value/identifier) using a scoped snapshot.
- `find text|label|value|role|id [value]` for specific locators.
- Actions: `click` (default), `fill`, `type`, `focus`, `get text`, `get attrs`, `wait [timeout]`, `exists`.
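The locator and action words above combine into one command line. The wrappers below are a rough sketch of that composition, assuming the argument order locator kind, value, action (as in `find label "Wi-Fi" click` from the debug flow). The function names and the `AD` variable are hypothetical.

```bash
# Command to invoke; set AD=echo to print the arguments instead of running them.
AD=${AD:-agent-device}

# tap_label LABEL: click the element whose label matches.
tap_label() { "$AD" find label "$1" click; }

# text_of_id ID: run the "get text" action on the element with this identifier.
text_of_id() { "$AD" find id "$1" get text; }

# check_text TEXT: run the "exists" action for a text locator.
# How the result is reported (output or exit status) is not specified above.
check_text() { "$AD" find text "$1" exists; }
```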

Assertions:

- `is` predicates: `visible`, `hidden`, `exists`, `editable`, `selected`, `text`.
- `is text` uses exact equality.

Replay update:

- `replay` runs deterministic replay from `.ad` scripts.
- `replay -u` attempts selector updates on failures and atomically rewrites the same file.
- Refs are the default/core mechanism for interactive agent flows.
- Update targets: `click`, `fill`, `get`, `is`, `wait`.
- Selector matching is a replay-update internal: replay parses `.ad` lines into actions, tries them, snapshots on failure, resolves a better selector, then rewrites that failing line.
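The atomic in-place rewrite mentioned above is commonly done by writing a new copy next to the file and renaming it over the original. The sketch below illustrates that general technique only; it is not agent-device's implementation, and `rewrite_line` is a hypothetical helper.

```bash
# rewrite_line FILE LINE_NUMBER NEW_TEXT
# Replaces one line of FILE by writing a temp copy and renaming it over FILE.
rewrite_line() {
  file=$1 lineno=$2 new=$3
  tmp=$(mktemp "${file}.XXXXXX") || return 1
  # Pass the new text through the environment so awk does not
  # interpret the backslashes inside selectors.
  REPL=$new awk -v n="$lineno" \
    'NR == n { print ENVIRON["REPL"]; next } { print }' "$file" > "$tmp" \
    || { rm -f "$tmp"; return 1; }
  # A rename within the same directory swaps the file in one step.
  mv "$tmp" "$file"
}

# Demo on a throwaway file holding a single stale line.
demo=$(mktemp)
printf '%s\n' 'click "id=\"old_continue\" || label=\"Continue\""' > "$demo"
rewrite_line "$demo" 1 'click "id=\"auth_continue\" || label=\"Continue\""'
cat "$demo"
rm -f "$demo"
```

Because readers only ever see the old file or the complete new one, an interrupted update cannot leave a half-written script behind.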

Update examples:

```sh
# Before (stale selector)
click "id=\"old_continue\" || label=\"Continue\""

# After replay -u (rewritten in place)
click "id=\"auth_continue\" || label=\"Continue\""
```

```sh
# Before (ref-based action from discovery)
snapshot -i -c -s "Continue"
click @e13 "Continue"

# After replay -u (upgraded to selector-based action)
snapshot -i -c -s "Continue"
click "id=\"auth_continue\" || label=\"Continue\""
```
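The rewritten lines join two selectors with `||`. Assuming this reads as an ordered list of fallbacks (the selector grammar is not spelled out here), a hypothetical `split_selector` helper could break such a string into its alternatives for inspection:

```bash
# split_selector SELECTOR
# Prints each " || "-separated alternative on its own line, in order.
split_selector() {
  rest=$1
  sep=' || '
  while :; do
    case $rest in
      *"$sep"*)
        first=${rest%%"$sep"*}   # text before the first separator
        rest=${rest#*"$sep"}     # text after the first separator
        printf '%s\n' "$first"
        ;;
      *)
        printf '%s\n' "$rest"    # last (or only) alternative
        return 0
        ;;
    esac
  done
}

split_selector 'id="auth_continue" || label="Continue"'
```

For the selector above this prints `id="auth_continue"` and then `label="Continue"`, one per line.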

Android fill reliability:

- `fill` clears the current value, then enters text.
- `type` enters text into the focused field without clearing.
- `fill` now verifies the entered value on Android.
- If the value does not match, agent-device clears the field and retries once with slower typing.
- This reduces IME-related character swaps on long strings (e.g. emails and IDs).
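The verify-then-retry behaviour described above can be pictured with a small simulation. This sketch runs without a device: `do_fill` and `read_value` are hypothetical stand-ins for the real input and read-back steps, and the "fast" attempt deliberately swaps two characters to mimic an IME glitch.

```bash
# Simulated field contents, so the sketch runs without a real device.
FIELD_VALUE=""

# do_fill TEXT MODE: stand-in for "clear the field, then type TEXT".
# In "fast" mode the first two characters are swapped to mimic an IME glitch.
do_fill() {
  if [ "$2" = "fast" ]; then
    FIELD_VALUE=$(printf '%s' "$1" | sed 's/^\(.\)\(.\)/\2\1/')
  else
    FIELD_VALUE=$1
  fi
}

# read_value: stand-in for reading the field back from the device.
read_value() { printf '%s' "$FIELD_VALUE"; }

# fill_with_verify TEXT: fill, verify, and retry once with slower typing.
fill_with_verify() {
  do_fill "$1" fast
  [ "$(read_value)" = "$1" ] && return 0
  do_fill "$1" slow            # the fill clears the field before typing
  [ "$(read_value)" = "$1" ]   # exit status reports whether the retry matched
}

if fill_with_verify "john@example.com"; then
  echo "filled: $FIELD_VALUE"
fi
```

The single retry keeps the common case fast while bounding the extra time spent when the first attempt is garbled.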

Settings helpers (simulators):

- `settings wifi on|off`
- `settings airplane on|off`
- `settings location on|off` (iOS uses per-app permission for the current session app)

Note: on iOS, the wifi and airplane settings toggle status bar indicators only, not the actual network state. `airplane off` clears status bar overrides.

App state:

- `appstate` shows the foreground app/activity (Android). On iOS it uses the current session app when available; otherwise it falls back to a snapshot-based guess (AX first, XCTest if AX can’t identify the app).
- `apps --metadata` returns the app list with minimal metadata.

Debug

- `agent-device trace start`
- `agent-device trace stop ./trace.log`
- The trace log includes snapshot logs and XCTest runner logs for the session.
- Built-in retries cover transient runner connection failures, AX snapshot hiccups, and Android UI dumps.
- For snapshot issues (missing elements), compare with the `--raw` flag for unaltered output and scope with `-s ""`.

App resolution

- Bundle/package identifiers are accepted directly (e.g., `com.apple.Preferences`).
- Human-readable names are resolved when possible (e.g., `Settings`).
- Built-in aliases include `Settings` for both platforms.

iOS notes

- Input commands (`press`, `type`, `scroll`, etc.) are supported only on simulators in v1 and use the XCTest runner.
- `alert` and `scrollintoview` use the XCTest runner and are simulator-only in v1.
- Real device support (including snapshots) is on the roadmap for iOS.

Testing

```bash
pnpm test
```

Build

```bash
pnpm build
```

Environment selectors:

- `ANDROID_DEVICE=Pixel_9_Pro_XL` or `ANDROID_SERIAL=emulator-5554`
- `IOS_DEVICE="iPhone 17 Pro"` or `IOS_UDID=`

Test screenshots are written to:

- `test/screenshots/android-settings.png`
- `test/screenshots/ios-settings.png`

Contributing

See `CONTRIBUTING.md`.

Made at Callstack

agent-device is an open source project and will always remain free to use. Callstack is a group of React and React Native geeks. Contact us at hello@callstack.com if you need any help with these technologies or just want to say hi.
