Mobile Automation with agent-device
For exploration, use snapshot refs. For deterministic replay, use selectors.
For structured exploratory QA bug hunts and reporting, use ../dogfood/SKILL.md.
Start Here (Read This First)
Use this skill as a router, not a full manual.
- Pick one mode:
  - Normal interaction flow
  - Debug/crash flow
  - Replay maintenance flow
- Run one canonical flow below.
- Open references only if blocked.
Decision Map
- No target context yet: devices -> pick target -> open.
- Normal UI task: open -> snapshot -i -> press/fill -> diff snapshot -i -> close
- Debug/crash: open <app> -> logs clear --restart -> reproduce -> network dump -> logs path -> targeted grep
- Replay drift: replay -u <path> -> verify updated selectors
- Remote multi-tenant run: allocate lease -> point client at remote daemon base URL -> run commands with tenant isolation flags -> heartbeat/release lease
- Device-scope isolation run: set iOS simulator set / Android allowlist -> run selectors within scope only
Canonical Flows
1) Normal Interaction Flow
bash
agent-device open Settings --platform ios
agent-device snapshot -i
agent-device press @e3
agent-device diff snapshot -i
agent-device fill @e5 "test"
agent-device close
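The flow above can be wrapped so that the session is always closed, even when a step fails. This is a minimal sketch, not part of agent-device: the `AD` variable and the `explore_app` function are hypothetical helpers, and refs such as @e3 are left out because they only exist after a snapshot has been read.

```bash
# AD is an indirection added for this sketch (not an agent-device feature);
# run with AD=echo to print the commands instead of executing them.
AD="${AD:-agent-device}"

# Open an app, take an interactive snapshot, and always close the session.
explore_app() {
  app="$1"
  platform="$2"
  "$AD" open "$app" --platform "$platform" || return 1
  "$AD" snapshot -i
  status=$?
  "$AD" close          # close even when the snapshot step failed
  return "$status"
}

# Usage: explore_app Settings ios
```

Returning the snapshot status after closing keeps a failed exploration visible to the caller without leaving a session open.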
2) Debug/Crash Flow
bash
agent-device open MyApp --platform ios
agent-device logs clear --restart
# Reproduce the issue in the app here, then collect evidence.
agent-device network dump 25
agent-device logs path
Logging is off by default. Enable only for debugging windows.
logs clear --restart requires an active app session (open <app> first).
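The "targeted grep" step from the decision map could look like the sketch below. It assumes that `logs path` prints only the log file path on stdout, which the source does not confirm; `AD` and `grep_session_log` are hypothetical names introduced for illustration.

```bash
# AD is an indirection added for this sketch (not an agent-device feature).
AD="${AD:-agent-device}"

# Print matching lines (with line numbers) from the current session log.
# Assumption: `logs path` prints only the log file path on stdout.
grep_session_log() {
  pattern="$1"
  log_file="$("$AD" logs path)" || return 1
  if [ ! -f "$log_file" ]; then
    echo "log file not found: $log_file" >&2
    return 1
  fi
  # -E enables alternation such as 'fatal|crash'; -i ignores case.
  grep -n -i -E -e "$pattern" -- "$log_file"
}

# Usage: grep_session_log 'fatal|exception'
```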
3) Replay Maintenance Flow
bash
agent-device replay -u ./session.ad
4) Remote Tenant Lease Flow (HTTP JSON-RPC)
bash
# Client points directly at the remote daemon HTTP base URL.
export AGENT_DEVICE_DAEMON_BASE_URL=http://mac-host.example:4310
export AGENT_DEVICE_DAEMON_AUTH_TOKEN=<token>

# Allocate lease
curl -sS "${AGENT_DEVICE_DAEMON_BASE_URL}/rpc" \
  -H "content-type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{"jsonrpc":"2.0","id":"alloc-1","method":"agent_device.lease.allocate","params":{"runId":"run-123","tenantId":"acme","ttlMs":60000}}'

# Use lease in tenant-isolated command execution
agent-device \
  --tenant acme \
  --session-isolation tenant \
  --run-id run-123 \
  --lease-id <lease-id> \
  session list --json

# Heartbeat and release
curl -sS "${AGENT_DEVICE_DAEMON_BASE_URL}/rpc" \
  -H "content-type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{"jsonrpc":"2.0","id":"hb-1","method":"agent_device.lease.heartbeat","params":{"leaseId":"<lease-id>","ttlMs":60000}}'
curl -sS "${AGENT_DEVICE_DAEMON_BASE_URL}/rpc" \
  -H "content-type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{"jsonrpc":"2.0","id":"rel-1","method":"agent_device.lease.release","params":{"leaseId":"<lease-id>"}}'
Notes:
- AGENT_DEVICE_DAEMON_BASE_URL makes the CLI skip local daemon discovery/startup and call the remote HTTP daemon directly.
- AGENT_DEVICE_DAEMON_AUTH_TOKEN is sent in both the JSON-RPC request token and HTTP auth headers.
- In remote daemon mode, --debug does not tail a local daemon.log; inspect logs on the remote host instead.
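To avoid repeating the curl boilerplate, the allocate request above could be factored into two small helpers. This is a sketch under assumptions: `lease_allocate_payload` and `rpc_call` are hypothetical names, the payload mirrors only the fields shown in the flow above, and the response format is not documented here, so extracting the lease id is left out.

```bash
# Build the JSON-RPC body for agent_device.lease.allocate.
# Values are interpolated as-is, so they must not contain quotes or backslashes.
lease_allocate_payload() {
  request_id="$1"
  run_id="$2"
  tenant_id="$3"
  ttl_ms="$4"
  printf '{"jsonrpc":"2.0","id":"%s","method":"agent_device.lease.allocate","params":{"runId":"%s","tenantId":"%s","ttlMs":%s}}' \
    "$request_id" "$run_id" "$tenant_id" "$ttl_ms"
}

# POST a JSON-RPC body to the remote daemon, reusing the exported env vars.
rpc_call() {
  curl -sS "${AGENT_DEVICE_DAEMON_BASE_URL}/rpc" \
    -H "content-type: application/json" \
    -H "Authorization: Bearer ${AGENT_DEVICE_DAEMON_AUTH_TOKEN}" \
    -d "$1"
}

# Usage: rpc_call "$(lease_allocate_payload alloc-1 run-123 acme 60000)"
```

Reading the token from AGENT_DEVICE_DAEMON_AUTH_TOKEN keeps it out of shell history and matches the variable the CLI already uses.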
Command Skeleton (Minimal)
Session and navigation
bash
agent-device devices
agent-device devices --platform ios --ios-simulator-device-set /tmp/tenant-a/simulators
agent-device devices --platform android --android-device-allowlist emulator-5554,device-1234
agent-device ensure-simulator --device "iPhone 16" --ios-simulator-device-set /tmp/tenant-a/simulators
agent-device ensure-simulator --device "iPhone 16" --runtime com.apple.CoreSimulator.SimRuntime.iOS-18-4 --ios-simulator-device-set /tmp/tenant-a/simulators --boot
agent-device open [app|url] [url]
agent-device open [app] --relaunch
agent-device close [app]
agent-device install <app> <path-to-binary>
agent-device reinstall <app> <path-to-binary>
agent-device session list
Use boot only as fallback when open cannot find/connect to a ready target.
For Android emulators by AVD name, use boot --platform android --device <avd-name>.
For Android emulators without GUI, add --headless.
Use --target mobile|tv with --platform (required) to pick phone/tablet vs TV targets (AndroidTV/tvOS).
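The "boot only as fallback" rule can be sketched as a try-open-first helper for an Android emulator. `AD` and `open_with_boot_fallback` are hypothetical names, and the sketch assumes that a failed `open` exits non-zero, which the source does not state.

```bash
# AD is an indirection added for this sketch (not an agent-device feature).
AD="${AD:-agent-device}"

# Try open first; boot the named Android emulator only if open fails,
# then retry open once.
# Assumption: a failed open returns a non-zero exit status.
open_with_boot_fallback() {
  app="$1"
  avd_name="$2"
  "$AD" open "$app" --platform android && return 0
  "$AD" boot --platform android --device "$avd_name" || return 1
  "$AD" open "$app" --platform android
}

# Usage: open_with_boot_fallback MyApp <avd-name>
```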
Isolation scoping quick reference:
- --ios-simulator-device-set <path> scopes iOS simulator discovery + command execution to one simulator set.
- --android-device-allowlist <serials> scopes Android discovery/selection to comma/space separated serials.
- Scope is applied before selectors (--device, --udid, --serial); out-of-scope selectors fail with DEVICE_NOT_FOUND.
- With iOS simulator-set scope enabled, iOS physical devices are not enumerated.
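A scoped open that surfaces out-of-scope selectors explicitly might look like this sketch. It assumes that the DEVICE_NOT_FOUND code appears somewhere in the command output and that `open` accepts the scope flag together with `--udid`; neither detail is confirmed above. `AD` and `open_in_scope` are hypothetical names.

```bash
# AD is an indirection added for this sketch (not an agent-device feature).
AD="${AD:-agent-device}"

# Open an app on a specific simulator inside one simulator set.
# Assumption: DEVICE_NOT_FOUND appears in stdout or stderr when the
# selector is outside the configured scope.
open_in_scope() {
  app="$1"
  device_set="$2"
  udid="$3"
  out="$("$AD" open "$app" --platform ios \
    --ios-simulator-device-set "$device_set" --udid "$udid" 2>&1)"
  status=$?
  case "$out" in
    *DEVICE_NOT_FOUND*)
      echo "udid $udid is outside scope $device_set" >&2
      return 2
      ;;
  esac
  printf '%s\n' "$out"
  return "$status"
}

# Usage: open_in_scope Settings /tmp/tenant-a/simulators <udid>
```

A distinct return code for scope errors lets a caller tell a misconfigured tenant apart from an ordinary launch failure.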
Simulator provisioning quick reference:
- Use ensure-simulator to create or reuse a named iOS simulator inside a device set before starting a session.
- --device <name> is required (e.g. "iPhone 16 Pro"). --runtime <id> pins the runtime; omit to use the newest compatible one.
- --boot boots it immediately. Returns udid, device, runtime, ios_simulator_device_set, created, booted.
- Idempotent: safe to call repeatedly; reuses an existing matching simulator by default.
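Because ensure-simulator is idempotent, it can sit at the top of every run. The sketch below passes `--runtime` only when a runtime id is supplied; `AD` and `provision_simulator` are hypothetical names added for illustration.

```bash
# AD is an indirection added for this sketch (not an agent-device feature).
AD="${AD:-agent-device}"

# Create or reuse a simulator in a device set and boot it.
# An empty third argument omits --runtime, leaving the choice to agent-device.
provision_simulator() {
  device_name="$1"
  device_set="$2"
  runtime="${3:-}"
  set -- ensure-simulator --device "$device_name" \
    --ios-simulator-device-set "$device_set"
  if [ -n "$runtime" ]; then
    set -- "$@" --runtime "$runtime"
  fi
  "$AD" "$@" --boot
}

# Usage: provision_simulator "iPhone 16" /tmp/tenant-a/simulators
```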
TV quick reference:
- AndroidTV: open/apps use TV launcher discovery automatically.
- TV target selection works on emulators/simulators and connected physical devices (AndroidTV + AppleTV).
- tvOS: runner-driven interactions and snapshots are supported (snapshot, wait, press, fill, get, scroll, back, home, app-switcher, record and related selector flows).
- tvOS back/home/app-switcher map to Siri Remote actions (menu, home, double-home) in the runner.
- tvOS follows iOS simulator-only command semantics for helpers like pinch, settings, and push.
Snapshot and targeting
bash
agent-device snapshot -i
agent-device diff snapshot -i
agent-device find "Sign In" click
agent-device press @e1
agent-device fill @e2 "text"
agent-device is visible 'id="anchor"'
press is the canonical tap command; click is an alias.
Utilities
bash
agent-device appstate
agent-device clipboard read
agent-device clipboard write "token"
agent-device keyboard status
agent-device keyboard dismiss
agent-device perf --json
agent-device network dump [limit] [summary|headers|body|all]
agent-device push <bundle|package> <payload.json|inline-json>
agent-device trigger-app-event screenshot_taken '{"source":"qa"}'
agent-device get text @e1
agent-device screenshot out.png
agent-device settings permission grant notifications
agent-device settings permission reset camera
agent-device trace start
agent-device trace stop ./trace.log
Batch (when sequence is already known)
bash
agent-device batch --steps-file /tmp/batch-steps.json --json
- Use agent-device perf --json (or metrics --json) after open.
- For detailed metric semantics, caveats, and interpretation guidance, see references/perf-metrics.md.
Guardrails (High Value Only)
- Re-snapshot after UI mutations (navigation/modal/list changes).
- Prefer snapshot -i; scope/depth only when needed.
- Use refs for discovery, selectors for replay/assertions.
- find "<query>" click --json returns { ref, locator, query, x, y }, all derived from the matched snapshot node. Do not rely on these fields from raw press/click responses for observability; use find instead.
- Use fill for clear-then-type semantics; use type for focused append typing.
- Use install for in-place app upgrades (keep app data when platform permits), and reinstall for deterministic fresh-state runs.
- App binary format support for install/reinstall: Android .apk/.aab, iOS .app/.ipa.
- Android .aab requires bundletool in PATH, or AGENT_DEVICE_BUNDLETOOL_JAR=<path-to-bundletool-all.jar> with java in PATH.
- Android .aab optional: set AGENT_DEVICE_ANDROID_BUNDLETOOL_MODE=<mode> to control bundletool build-apks --mode (default: universal).
- iOS .ipa: extract/install from Payload/*.app; when multiple app bundles are present, <app> is used as a bundle id/name hint.
- iOS appstate is session-scoped; Android appstate is live foreground state. iOS responses include device_udid and ios_simulator_device_set for isolation verification.
- iOS open responses include device_udid and ios_simulator_device_set to confirm which simulator handled the session.
- Clipboard helpers: clipboard read / clipboard write <text> are supported on Android and iOS simulators; iOS physical devices are not supported yet.
- Android keyboard helpers: keyboard status|get|dismiss report keyboard visibility/type and dismiss via keyevent when visible.
- network dump is best-effort and parses HTTP(s) entries from the session app log file.
- Biometric settings: iOS simulator supports settings faceid|touchid <match|nonmatch|enroll|unenroll>; Android supports settings fingerprint <match|nonmatch> where runtime tooling is available.
- For AndroidTV/tvOS selection, always pair --target with --platform (ios, android, or apple alias); target-only selection is invalid.
- push simulates notification delivery:
  - iOS simulator uses APNs-style payload JSON.
  - Android uses broadcast action + typed extras (string/boolean/number).
- trigger-app-event requires app-defined deep-link hooks and URL template configuration (AGENT_DEVICE_APP_EVENT_URL_TEMPLATE or platform-specific variants).
- trigger-app-event requires an active session or explicit selectors (--platform, --device, --udid, --serial); on iOS physical devices, custom-scheme triggers require active app context.
- Canonical trigger behavior and caveats are documented in website/docs/docs/commands.md under App event triggers.
- Permission settings are app-scoped and require an active session app: settings permission <grant|deny|reset> <camera|microphone|photos|contacts|notifications> [full|limited]
- iOS simulator permission alerts: use alert wait then alert accept/dismiss; accept/dismiss retry internally for up to 2 s, so manual sleeps are not needed. See references/permissions.md.
- full|limited mode applies only to iOS photos; other targets reject mode.
- On Android, non-ASCII fill/type may require an ADB keyboard IME on some system images; only install IME APKs from trusted sources and verify checksum/signature.
- If using --save-script, prefer explicit path syntax (--save-script=flow.ad or ./flow.ad).
- For tenant-isolated remote runs, always pass --tenant, --session-isolation tenant, --run-id, and --lease-id together.
- Use short lease TTLs and heartbeat only while work is active; release leases immediately after run completion/failure.
- Env equivalents for scoped runs: AGENT_DEVICE_IOS_SIMULATOR_DEVICE_SET (compat IOS_SIMULATOR_DEVICE_SET) and AGENT_DEVICE_ANDROID_DEVICE_ALLOWLIST (compat ANDROID_DEVICE_ALLOWLIST).
- For explicit remote client mode, prefer AGENT_DEVICE_DAEMON_BASE_URL / --daemon-base-url instead of relying on local daemon metadata or loopback-only ports.
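One way to honor the "always pass the four tenant flags together" guardrail is a thin wrapper that refuses to run when any value is missing. This is only a sketch: `AD`, `tenant_ad`, and the TENANT_ID, RUN_ID, and LEASE_ID variables are hypothetical names used for illustration, not settings that agent-device reads.

```bash
# AD is an indirection added for this sketch (not an agent-device feature).
AD="${AD:-agent-device}"

# Run a command with all four tenant-isolation flags, or fail fast.
# TENANT_ID, RUN_ID and LEASE_ID are hypothetical variables for this sketch.
tenant_ad() {
  if [ -z "${TENANT_ID:-}" ] || [ -z "${RUN_ID:-}" ] || [ -z "${LEASE_ID:-}" ]; then
    echo "TENANT_ID, RUN_ID and LEASE_ID must all be set" >&2
    return 1
  fi
  "$AD" --tenant "$TENANT_ID" --session-isolation tenant \
    --run-id "$RUN_ID" --lease-id "$LEASE_ID" "$@"
}

# Usage: TENANT_ID=acme RUN_ID=run-123 LEASE_ID=<lease-id> tenant_ad session list --json
```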
Security and Trust Notes
- Prefer a preinstalled agent-device binary over on-demand package execution.
- If install is required, pin an exact version (for example: npx --yes agent-device@<exact-version> --help).
- Signing/provisioning environment variables are optional, sensitive, and only for iOS physical-device setup.
- Logs/artifacts are written under ~/.agent-device; replay scripts write to explicit paths you provide.
- For remote daemon mode, prefer AGENT_DEVICE_DAEMON_SERVER_MODE=http|dual on the host plus client-side AGENT_DEVICE_DAEMON_BASE_URL, with AGENT_DEVICE_HTTP_AUTH_HOOK and tenant-scoped lease admission where needed.
- Keep logging off unless debugging and use least-privilege/isolated environments for autonomous runs.
Common Mistakes
- Mixing debug flow into normal runs (keep logs off unless debugging).
- Continuing to use stale refs after screen transitions.
- Using URL opens with Android --activity (unsupported combination).
- Treating boot as default first step instead of fallback.
References