Headless browser automation CLI for AI agents
npm install agent-browserHeadless browser automation CLI for AI agents. Fast Rust CLI with Node.js fallback.
``bash`
npm install -g agent-browser
agent-browser install # Download Chromium
`bash`
git clone https://github.com/vercel-labs/agent-browser
cd agent-browser
pnpm install
pnpm build
pnpm build:native # Requires Rust (https://rustup.rs)
pnpm link --global # Makes agent-browser available globally
agent-browser install
On Linux, install system dependencies:
`bash`
agent-browser install --with-depsor manually: npx playwright install-deps chromium
`bash`
agent-browser open example.com
agent-browser snapshot # Get accessibility tree with refs
agent-browser click @e2 # Click by ref from snapshot
agent-browser fill @e3 "test@example.com" # Fill by ref
agent-browser get text @e1 # Get text by ref
agent-browser screenshot page.png
agent-browser close
`bash`
agent-browser click "#submit"
agent-browser fill "#email" "test@example.com"
agent-browser find role button click --name "Submit"
`bash`
agent-browser open
agent-browser click
agent-browser dblclick
agent-browser focus
agent-browser type
agent-browser fill
agent-browser press
agent-browser keydown
agent-browser keyup
agent-browser hover
agent-browser select
agent-browser check
agent-browser uncheck
agent-browser scroll
agent-browser scrollintoview
agent-browser drag
agent-browser upload
agent-browser screenshot [path] # Take screenshot (--full for full page, saves to a temporary directory if no path)
agent-browser pdf
agent-browser snapshot # Accessibility tree with refs (best for AI)
agent-browser eval
agent-browser connect
agent-browser close # Close browser (aliases: quit, exit)
`bash`
agent-browser get text
agent-browser get html
agent-browser get value
agent-browser get attr
agent-browser get title # Get page title
agent-browser get url # Get current URL
agent-browser get count
agent-browser get box
`bash`
agent-browser is visible
agent-browser is enabled
agent-browser is checked
`bash`
agent-browser find role
agent-browser find text
agent-browser find label
Actions: click, fill, check, hover, text
Examples:
`bash`
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "test@test.com"
agent-browser find first ".item" click
agent-browser find nth 2 "a" text
`bash`
agent-browser wait
agent-browser wait
agent-browser wait --text "Welcome" # Wait for text to appear
agent-browser wait --url "**/dash" # Wait for URL pattern
agent-browser wait --load networkidle # Wait for load state
agent-browser wait --fn "window.ready === true" # Wait for JS condition
Load states: load, domcontentloaded, networkidle
`bash`
agent-browser mouse move
agent-browser mouse down [button] # Press button (left/right/middle)
agent-browser mouse up [button] # Release button
agent-browser mouse wheel
` # HTTP basic authbash`
agent-browser set viewport
agent-browser set device
agent-browser set geo
agent-browser set offline [on|off] # Toggle offline mode
agent-browser set headers
agent-browser set credentials
agent-browser set media [dark|light] # Emulate color scheme
`bash
agent-browser cookies # Get all cookies
agent-browser cookies set
agent-browser cookies clear # Clear cookies
agent-browser storage local # Get all localStorage
agent-browser storage local
agent-browser storage local set
agent-browser storage local clear # Clear all
agent-browser storage session # Same for sessionStorage
`
`bash`
agent-browser network route
agent-browser network route
agent-browser network route
agent-browser network unroute [url] # Remove routes
agent-browser network requests # View tracked requests
agent-browser network requests --filter api # Filter requests
`bash`
agent-browser tab # List tabs
agent-browser tab new [url] # New tab (optionally with URL)
agent-browser tab
agent-browser tab close [n] # Close tab
agent-browser window new # New window
`bash`
agent-browser frame
agent-browser frame main # Back to main frame
`bash`
agent-browser dialog accept [text] # Accept (with optional prompt text)
agent-browser dialog dismiss # Dismiss
`bash`
agent-browser trace start [path] # Start recording trace
agent-browser trace stop [path] # Stop and save trace
agent-browser console # View console messages (log, error, warn, info)
agent-browser console --clear # Clear console
agent-browser errors # View page errors (uncaught JavaScript exceptions)
agent-browser errors --clear # Clear errors
agent-browser highlight
agent-browser state save
agent-browser state load
`bash`
agent-browser back # Go back
agent-browser forward # Go forward
agent-browser reload # Reload page
`bash`
agent-browser install # Download Chromium browser
agent-browser install --with-deps # Also install system deps (Linux)
Run multiple isolated browser instances:
`bashDifferent sessions
agent-browser --session agent1 open site-a.com
agent-browser --session agent2 open site-b.com
Each session has its own:
- Browser instance
- Cookies and storage
- Navigation history
- Authentication state
Persistent Profiles
By default, browser state (cookies, localStorage, login sessions) is ephemeral and lost when the browser closes. Use
--profile to persist state across browser restarts:`bash
Use a persistent profile directory
agent-browser --profile ~/.myapp-profile open myapp.comLogin once, then reuse the authenticated session
agent-browser --profile ~/.myapp-profile open myapp.com/dashboardOr via environment variable
AGENT_BROWSER_PROFILE=~/.myapp-profile agent-browser open myapp.com
`The profile directory stores:
- Cookies and localStorage
- IndexedDB data
- Service workers
- Browser cache
- Login sessions
Tip: Use different profile paths for different projects to keep their browser state isolated.
Snapshot Options
The
snapshot command supports filtering to reduce output size:`bash
agent-browser snapshot # Full accessibility tree
agent-browser snapshot -i # Interactive elements only (buttons, inputs, links)
agent-browser snapshot -i -C # Include cursor-interactive elements (divs with onclick, etc.)
agent-browser snapshot -c # Compact (remove empty structural elements)
agent-browser snapshot -d 3 # Limit depth to 3 levels
agent-browser snapshot -s "#main" # Scope to CSS selector
agent-browser snapshot -i -c -d 5 # Combine options
`| Option | Description |
|--------|-------------|
|
-i, --interactive | Only show interactive elements (buttons, links, inputs) |
| -C, --cursor | Include cursor-interactive elements (cursor:pointer, onclick, tabindex) |
| -c, --compact | Remove empty structural elements |
| -d, --depth | Limit tree depth |
| -s, --selector | Scope to CSS selector |The
-C flag is useful for modern web apps that use custom clickable elements (divs, spans) instead of standard buttons/links.Options
| Option | Description |
|--------|-------------|
|
--session | Use isolated session (or AGENT_BROWSER_SESSION env) |
| --profile | Persistent browser profile directory (or AGENT_BROWSER_PROFILE env) |
| --headers | Set HTTP headers scoped to the URL's origin |
| --executable-path | Custom browser executable (or AGENT_BROWSER_EXECUTABLE_PATH env) |
| --args | Browser launch args, comma or newline separated (or AGENT_BROWSER_ARGS env) |
| --user-agent | Custom User-Agent string (or AGENT_BROWSER_USER_AGENT env) |
| --proxy | Proxy server URL with optional auth (or AGENT_BROWSER_PROXY env) |
| --proxy-bypass | Hosts to bypass proxy (or AGENT_BROWSER_PROXY_BYPASS env) |
| -p, --provider | Cloud browser provider (or AGENT_BROWSER_PROVIDER env) |
| --json | JSON output (for agents) |
| --full, -f | Full page screenshot |
| --name, -n | Locator name filter |
| --exact | Exact text match |
| --headed | Show browser window (not headless) |
| --cdp | Connect via Chrome DevTools Protocol |
| --ignore-https-errors | Ignore HTTPS certificate errors (useful for self-signed certs) |
| --allow-file-access | Allow file:// URLs to access local files (Chromium only) |
| --debug | Debug output |Selectors
$3
Refs provide deterministic element selection from snapshots:
`bash
1. Get snapshot with refs
agent-browser snapshot
Output:
- heading "Example Domain" [ref=e1] [level=1]
- button "Submit" [ref=e2]
- textbox "Email" [ref=e3]
- link "Learn more" [ref=e4]
2. Use refs to interact
agent-browser click @e2 # Click the button
agent-browser fill @e3 "test@example.com" # Fill the textbox
agent-browser get text @e1 # Get heading text
agent-browser hover @e4 # Hover the link
`Why use refs?
- Deterministic: Ref points to exact element from snapshot
- Fast: No DOM re-query needed
- AI-friendly: Snapshot + ref workflow is optimal for LLMs
$3
`bash
agent-browser click "#id"
agent-browser click ".class"
agent-browser click "div > button"
`$3
`bash
agent-browser click "text=Submit"
agent-browser click "xpath=//button"
`$3
`bash
agent-browser find role button click --name "Submit"
agent-browser find label "Email" fill "test@test.com"
`Agent Mode
Use
--json for machine-readable output:`bash
agent-browser snapshot --json
Returns: {"success":true,"data":{"snapshot":"...","refs":{"e1":{"role":"heading","name":"Title"},...}}}
agent-browser get text @e1 --json
agent-browser is visible @e2 --json
`$3
`bash
1. Navigate and get snapshot
agent-browser open example.com
agent-browser snapshot -i --json # AI parses tree and refs2. AI identifies target refs from snapshot
3. Execute actions using refs
agent-browser click @e2
agent-browser fill @e3 "input text"4. Get new snapshot if page changed
agent-browser snapshot -i --json
`Headed Mode
Show the browser window for debugging:
`bash
agent-browser open example.com --headed
`This opens a visible browser window instead of running headless.
Authenticated Sessions
Use
--headers to set HTTP headers for a specific origin, enabling authentication without login flows:`bash
Headers are scoped to api.example.com only
agent-browser open api.example.com --headers '{"Authorization": "Bearer "}'Requests to api.example.com include the auth header
agent-browser snapshot -i --json
agent-browser click @e2Navigate to another domain - headers are NOT sent (safe!)
agent-browser open other-site.com
`This is useful for:
- Skipping login flows - Authenticate via headers instead of UI
- Switching users - Start new sessions with different auth tokens
- API testing - Access protected endpoints directly
- Security - Headers are scoped to the origin, not leaked to other domains
To set headers for multiple origins, use
--headers with each open command:`bash
agent-browser open api.example.com --headers '{"Authorization": "Bearer token1"}'
agent-browser open api.acme.com --headers '{"Authorization": "Bearer token2"}'
`For global headers (all domains), use
set headers:`bash
agent-browser set headers '{"X-Custom-Header": "value"}'
`Custom Browser Executable
Use a custom browser executable instead of the bundled Chromium. This is useful for:
- Serverless deployment: Use lightweight Chromium builds like
@sparticuz/chromium (~50MB vs ~684MB)
- System browsers: Use an existing Chrome/Chromium installation
- Custom builds: Use modified browser builds$3
`bash
Via flag
agent-browser --executable-path /path/to/chromium open example.comVia environment variable
AGENT_BROWSER_EXECUTABLE_PATH=/path/to/chromium agent-browser open example.com
`$3
`typescript
import chromium from '@sparticuz/chromium';
import { BrowserManager } from 'agent-browser';export async function handler() {
const browser = new BrowserManager();
await browser.launch({
executablePath: await chromium.executablePath(),
headless: true,
});
// ... use browser
}
`Local Files
Open and interact with local files (PDFs, HTML, etc.) using
file:// URLs:`bash
Enable file access (required for JavaScript to access local files)
agent-browser --allow-file-access open file:///path/to/document.pdf
agent-browser --allow-file-access open file:///path/to/page.htmlTake screenshot of a local PDF
agent-browser --allow-file-access open file:///Users/me/report.pdf
agent-browser screenshot report.png
`The
--allow-file-access flag adds Chromium flags (--allow-file-access-from-files, --allow-file-access) that allow file:// URLs to:
- Load and render local files
- Access other local files via JavaScript (XHR, fetch)
- Load local resources (images, scripts, stylesheets)Note: This flag only works with Chromium. For security, it's disabled by default.
CDP Mode
Connect to an existing browser via Chrome DevTools Protocol:
`bash
Start Chrome with: google-chrome --remote-debugging-port=9222
Connect once, then run commands without --cdp
agent-browser connect 9222
agent-browser snapshot
agent-browser tab
agent-browser closeOr pass --cdp on each command
agent-browser --cdp 9222 snapshotConnect to remote browser via WebSocket URL
agent-browser --cdp "wss://your-browser-service.com/cdp?token=..." snapshot
`The
--cdp flag accepts either:
- A port number (e.g., 9222) for local connections via http://localhost:{port}
- A full WebSocket URL (e.g., wss://... or ws://...) for remote browser servicesThis enables control of:
- Electron apps
- Chrome/Chromium instances with remote debugging
- WebView2 applications
- Any browser exposing a CDP endpoint
Streaming (Browser Preview)
Stream the browser viewport via WebSocket for live preview or "pair browsing" where a human can watch and interact alongside an AI agent.
$3
Set the
AGENT_BROWSER_STREAM_PORT environment variable:`bash
AGENT_BROWSER_STREAM_PORT=9223 agent-browser open example.com
`This starts a WebSocket server on the specified port that streams the browser viewport and accepts input events.
$3
Connect to
ws://localhost:9223 to receive frames and send input:Receive frames:
`json
{
"type": "frame",
"data": "",
"metadata": {
"deviceWidth": 1280,
"deviceHeight": 720,
"pageScaleFactor": 1,
"offsetTop": 0,
"scrollOffsetX": 0,
"scrollOffsetY": 0
}
}
`Send mouse events:
`json
{
"type": "input_mouse",
"eventType": "mousePressed",
"x": 100,
"y": 200,
"button": "left",
"clickCount": 1
}
`Send keyboard events:
`json
{
"type": "input_keyboard",
"eventType": "keyDown",
"key": "Enter",
"code": "Enter"
}
`Send touch events:
`json
{
"type": "input_touch",
"eventType": "touchStart",
"touchPoints": [{ "x": 100, "y": 200 }]
}
`$3
For advanced use, control streaming directly via the protocol:
`typescript
import { BrowserManager } from 'agent-browser';const browser = new BrowserManager();
await browser.launch({ headless: true });
await browser.navigate('https://example.com');
// Start screencast
await browser.startScreencast((frame) => {
// frame.data is base64-encoded image
// frame.metadata contains viewport info
console.log('Frame received:', frame.metadata.deviceWidth, 'x', frame.metadata.deviceHeight);
}, {
format: 'jpeg',
quality: 80,
maxWidth: 1280,
maxHeight: 720,
});
// Inject mouse events
await browser.injectMouseEvent({
type: 'mousePressed',
x: 100,
y: 200,
button: 'left',
});
// Inject keyboard events
await browser.injectKeyboardEvent({
type: 'keyDown',
key: 'Enter',
code: 'Enter',
});
// Stop when done
await browser.stopScreencast();
`Architecture
agent-browser uses a client-daemon architecture:
1. Rust CLI (fast native binary) - Parses commands, communicates with daemon
2. Node.js Daemon - Manages Playwright browser instance
3. Fallback - If native binary unavailable, uses Node.js directly
The daemon starts automatically on first command and persists between commands for fast subsequent operations.
Browser Engine: Uses Chromium by default. The daemon also supports Firefox and WebKit via the Playwright protocol.
Platforms
| Platform | Binary | Fallback |
|----------|--------|----------|
| macOS ARM64 | Native Rust | Node.js |
| macOS x64 | Native Rust | Node.js |
| Linux ARM64 | Native Rust | Node.js |
| Linux x64 | Native Rust | Node.js |
| Windows x64 | Native Rust | Node.js |
Usage with AI Agents
$3
The simplest approach - just tell your agent to use it:
`
Use agent-browser to test the login flow. Run agent-browser --help to see available commands.
`The
--help output is comprehensive and most agents can figure it out from there.$3
Add the skill to your AI coding assistant for richer context:
`bash
npx skills add vercel-labs/agent-browser
`This works with Claude Code, Codex, Cursor, Gemini CLI, GitHub Copilot, Goose, OpenCode, and Windsurf.
$3
For more consistent results, add to your project or global instructions file:
`markdown
Browser Automation
Use
agent-browser for web automation. Run agent-browser --help for all commands.Core workflow:
1.
agent-browser open - Navigate to page
2. agent-browser snapshot -i - Get interactive elements with refs (@e1, @e2)
3. agent-browser click @e1 / fill @e2 "text" - Interact using refs
4. Re-snapshot after page changes
`Integrations
$3
Control real Mobile Safari in the iOS Simulator for authentic mobile web testing. Requires macOS with Xcode.
Setup:
`bash
Install Appium and XCUITest driver
npm install -g appium
appium driver install xcuitest
`Usage:
`bash
List available iOS simulators
agent-browser device listLaunch Safari on a specific device
agent-browser -p ios --device "iPhone 16 Pro" open https://example.comSame commands as desktop
agent-browser -p ios snapshot -i
agent-browser -p ios tap @e1
agent-browser -p ios fill @e2 "text"
agent-browser -p ios screenshot mobile.pngMobile-specific commands
agent-browser -p ios swipe up
agent-browser -p ios swipe down 500Close session
agent-browser -p ios close
`Or use environment variables:
`bash
export AGENT_BROWSER_PROVIDER=ios
export AGENT_BROWSER_IOS_DEVICE="iPhone 16 Pro"
agent-browser open https://example.com
`| Variable | Description |
|----------|-------------|
|
AGENT_BROWSER_PROVIDER | Set to ios to enable iOS mode |
| AGENT_BROWSER_IOS_DEVICE | Device name (e.g., "iPhone 16 Pro", "iPad Pro") |
| AGENT_BROWSER_IOS_UDID | Device UDID (alternative to device name) |Supported devices: All iOS Simulators available in Xcode (iPhones, iPads), plus real iOS devices.
Note: The iOS provider boots the simulator, starts Appium, and controls Safari. First launch takes ~30-60 seconds; subsequent commands are fast.
#### Real Device Support
Appium also supports real iOS devices connected via USB. This requires additional one-time setup:
1. Get your device UDID:
`bash
xcrun xctrace list devices
or
system_profiler SPUSBDataType | grep -A 5 "iPhone\|iPad"
`2. Sign WebDriverAgent (one-time):
`bash
Open the WebDriverAgent Xcode project
cd ~/.appium/node_modules/appium-xcuitest-driver/node_modules/appium-webdriveragent
open WebDriverAgent.xcodeproj
`In Xcode:
- Select the
WebDriverAgentRunner target
- Go to Signing & Capabilities
- Select your Team (requires Apple Developer account, free tier works)
- Let Xcode manage signing automatically3. Use with agent-browser:
`bash
Connect device via USB, then:
agent-browser -p ios --device "" open https://example.comOr use the device name if unique
agent-browser -p ios --device "John's iPhone" open https://example.com
`Real device notes:
- First run installs WebDriverAgent to the device (may require Trust prompt)
- Device must be unlocked and connected via USB
- Slightly slower initial connection than simulator
- Tests against real Safari performance and behavior
$3
Browserbase provides remote browser infrastructure to make deployment of agentic browsing agents easy. Use it when running the agent-browser CLI in an environment where a local browser isn't feasible.
To enable Browserbase, use the
-p flag:`bash
export BROWSERBASE_API_KEY="your-api-key"
export BROWSERBASE_PROJECT_ID="your-project-id"
agent-browser -p browserbase open https://example.com
`Or use environment variables for CI/scripts:
`bash
export AGENT_BROWSER_PROVIDER=browserbase
export BROWSERBASE_API_KEY="your-api-key"
export BROWSERBASE_PROJECT_ID="your-project-id"
agent-browser open https://example.com
`When enabled, agent-browser connects to a Browserbase session instead of launching a local browser. All commands work identically.
Get your API key and project ID from the Browserbase Dashboard.
$3
Browser Use provides cloud browser infrastructure for AI agents. Use it when running agent-browser in environments where a local browser isn't available (serverless, CI/CD, etc.).
To enable Browser Use, use the
-p flag:`bash
export BROWSER_USE_API_KEY="your-api-key"
agent-browser -p browseruse open https://example.com
`Or use environment variables for CI/scripts:
`bash
export AGENT_BROWSER_PROVIDER=browseruse
export BROWSER_USE_API_KEY="your-api-key"
agent-browser open https://example.com
`When enabled, agent-browser connects to a Browser Use cloud session instead of launching a local browser. All commands work identically.
Get your API key from the Browser Use Cloud Dashboard. Free credits are available to get started, with pay-as-you-go pricing after.
$3
Kernel provides cloud browser infrastructure for AI agents with features like stealth mode and persistent profiles.
To enable Kernel, use the
-p flag:`bash
export KERNEL_API_KEY="your-api-key"
agent-browser -p kernel open https://example.com
`Or use environment variables for CI/scripts:
`bash
export AGENT_BROWSER_PROVIDER=kernel
export KERNEL_API_KEY="your-api-key"
agent-browser open https://example.com
`Optional configuration via environment variables:
| Variable | Description | Default |
|----------|-------------|---------|
|
KERNEL_HEADLESS | Run browser in headless mode (true/false) | false |
| KERNEL_STEALTH | Enable stealth mode to avoid bot detection (true/false) | true |
| KERNEL_TIMEOUT_SECONDS | Session timeout in seconds | 300 |
| KERNEL_PROFILE_NAME | Browser profile name for persistent cookies/logins (created if it doesn't exist) | (none) |When enabled, agent-browser connects to a Kernel cloud session instead of launching a local browser. All commands work identically.
Profile Persistence: When
KERNEL_PROFILE_NAME` is set, the profile will be created if it doesn't already exist. Cookies, logins, and session data are automatically saved back to the profile when the browser session ends, making them available for future sessions.Get your API key from the Kernel Dashboard.
Apache-2.0