Headless browser automation CLI for AI agents with CDP network capture
npm install agent-browser-cdpnetworkHeadless browser automation CLI for AI agents. Fast Rust CLI with Node.js fallback.
``bash`
npm install -g agent-browser
agent-browser install # Download Chromium
`bash`
git clone https://github.com/vercel-labs/agent-browser
cd agent-browser
pnpm install
pnpm build
pnpm build:native # Requires Rust (https://rustup.rs)
pnpm link --global # Makes agent-browser available globally
agent-browser install
On Linux, install system dependencies:
`bash`
agent-browser install --with-depsor manually: npx playwright install-deps chromium
`bash`
agent-browser open example.com
agent-browser snapshot # Get accessibility tree with refs
agent-browser click @e2 # Click by ref from snapshot
agent-browser fill @e3 "test@example.com" # Fill by ref
agent-browser get text @e1 # Get text by ref
agent-browser screenshot page.png
agent-browser close
`bash`
agent-browser click "#submit"
agent-browser fill "#email" "test@example.com"
agent-browser find role button click --name "Submit"
`bash`
agent-browser open
agent-browser click
agent-browser dblclick
agent-browser focus
agent-browser type
agent-browser fill
agent-browser press
agent-browser keydown
agent-browser keyup
agent-browser hover
agent-browser select
agent-browser check
agent-browser uncheck
agent-browser scroll
agent-browser scrollintoview
agent-browser drag
agent-browser upload
agent-browser screenshot [path] # Take screenshot (--full for full page)
agent-browser pdf
agent-browser snapshot # Accessibility tree with refs (best for AI)
agent-browser eval
agent-browser close # Close browser (aliases: quit, exit)
`bash`
agent-browser get text
agent-browser get html
agent-browser get value
agent-browser get attr
agent-browser get title # Get page title
agent-browser get url # Get current URL
agent-browser get count
agent-browser get box
`bash`
agent-browser is visible
agent-browser is enabled
agent-browser is checked
`bash`
agent-browser find role
agent-browser find text
agent-browser find label
Actions: click, fill, check, hover, text
Examples:
`bash`
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "test@test.com"
agent-browser find first ".item" click
agent-browser find nth 2 "a" text
`bash`
agent-browser wait
agent-browser wait
agent-browser wait --text "Welcome" # Wait for text to appear
agent-browser wait --url "**/dash" # Wait for URL pattern
agent-browser wait --load networkidle # Wait for load state
agent-browser wait --fn "window.ready === true" # Wait for JS condition
Load states: load, domcontentloaded, networkidle
`bash`
agent-browser mouse move
agent-browser mouse down [button] # Press button (left/right/middle)
agent-browser mouse up [button] # Release button
agent-browser mouse wheel
` # HTTP basic authbash`
agent-browser set viewport
agent-browser set device
agent-browser set geo
agent-browser set offline [on|off] # Toggle offline mode
agent-browser set headers
agent-browser set credentials
agent-browser set media [dark|light] # Emulate color scheme
`bash
agent-browser cookies # Get all cookies
agent-browser cookies set
agent-browser cookies clear # Clear cookies
agent-browser storage local # Get all localStorage
agent-browser storage local
agent-browser storage local set
agent-browser storage local clear # Clear all
agent-browser storage session # Same for sessionStorage
`
`bashCDP Network Capture (full traffic with response bodies)
agent-browser network capture start # Enable capture
agent-browser network capture stop # Disable capture
agent-browser network list # List captured traffic
agent-browser network list --url "api" # Filter by URL pattern
agent-browser network get
agent-browser network body
agent-browser network search "pattern" # Search traffic
agent-browser network fetch
agent-browser network clear # Clear captured traffic
$3
`bash
agent-browser tab # List tabs
agent-browser tab new [url] # New tab (optionally with URL)
agent-browser tab # Switch to tab n
agent-browser tab close [n] # Close tab
agent-browser window new # New window
`$3
`bash
agent-browser frame # Switch to iframe
agent-browser frame main # Back to main frame
`$3
`bash
agent-browser dialog accept [text] # Accept (with optional prompt text)
agent-browser dialog dismiss # Dismiss
`$3
`bash
agent-browser trace start [path] # Start recording trace
agent-browser trace stop [path] # Stop and save trace
agent-browser console # View console messages
agent-browser console --clear # Clear console
agent-browser errors # View page errors
agent-browser errors --clear # Clear errors
agent-browser highlight # Highlight element
agent-browser state save # Save auth state
agent-browser state load # Load auth state
`$3
`bash
agent-browser back # Go back
agent-browser forward # Go forward
agent-browser reload # Reload page
`$3
`bash
agent-browser install # Download Chromium browser
agent-browser install --with-deps # Also install system deps (Linux)
`Sessions
Run multiple isolated browser instances:
`bash
Different sessions
agent-browser --session agent1 open site-a.com
agent-browser --session agent2 open site-b.comOr via environment variable
AGENT_BROWSER_SESSION=agent1 agent-browser click "#btn"List active sessions
agent-browser session list
Output:
Active sessions:
-> default
agent1
Show current session
agent-browser session
`Each session has its own:
- Browser instance
- Cookies and storage
- Navigation history
- Authentication state
Snapshot Options
The
snapshot command supports filtering to reduce output size:`bash
agent-browser snapshot # Full accessibility tree
agent-browser snapshot -i # Interactive elements only (buttons, inputs, links)
agent-browser snapshot -c # Compact (remove empty structural elements)
agent-browser snapshot -d 3 # Limit depth to 3 levels
agent-browser snapshot -s "#main" # Scope to CSS selector
agent-browser snapshot -i -c -d 5 # Combine options
`| Option | Description |
|--------|-------------|
|
-i, --interactive | Only show interactive elements (buttons, links, inputs) |
| -c, --compact | Remove empty structural elements |
| -d, --depth | Limit tree depth |
| -s, --selector | Scope to CSS selector |Options
| Option | Description |
|--------|-------------|
|
--session | Use isolated session (or AGENT_BROWSER_SESSION env) |
| --headers | Set HTTP headers scoped to the URL's origin |
| --executable-path | Custom browser executable (or AGENT_BROWSER_EXECUTABLE_PATH env) |
| --json | JSON output (for agents) |
| --full, -f | Full page screenshot / full untruncated network data |
| --truncatejsonvalues | Truncate JSON string values to n chars (default: 500) |
| --name, -n | Locator name filter |
| --exact | Exact text match |
| --headed | Show browser window (not headless) |
| --cdp | Connect via Chrome DevTools Protocol |
| --debug | Debug output |Selectors
$3
Refs provide deterministic element selection from snapshots:
`bash
1. Get snapshot with refs
agent-browser snapshot
Output:
- heading "Example Domain" [ref=e1] [level=1]
- button "Submit" [ref=e2]
- textbox "Email" [ref=e3]
- link "Learn more" [ref=e4]
2. Use refs to interact
agent-browser click @e2 # Click the button
agent-browser fill @e3 "test@example.com" # Fill the textbox
agent-browser get text @e1 # Get heading text
agent-browser hover @e4 # Hover the link
`Why use refs?
- Deterministic: Ref points to exact element from snapshot
- Fast: No DOM re-query needed
- AI-friendly: Snapshot + ref workflow is optimal for LLMs
$3
`bash
agent-browser click "#id"
agent-browser click ".class"
agent-browser click "div > button"
`$3
`bash
agent-browser click "text=Submit"
agent-browser click "xpath=//button"
`$3
`bash
agent-browser find role button click --name "Submit"
agent-browser find label "Email" fill "test@test.com"
`Agent Mode
Use
--json for machine-readable output:`bash
agent-browser snapshot --json
Returns: {"success":true,"data":{"snapshot":"...","refs":{"e1":{"role":"heading","name":"Title"},...}}}
agent-browser get text @e1 --json
agent-browser is visible @e2 --json
`$3
`bash
1. Navigate and get snapshot
agent-browser open example.com
agent-browser snapshot -i --json # AI parses tree and refs2. AI identifies target refs from snapshot
3. Execute actions using refs
agent-browser click @e2
agent-browser fill @e3 "input text"4. Get new snapshot if page changed
agent-browser snapshot -i --json
`Headed Mode
Show the browser window for debugging:
`bash
agent-browser open example.com --headed
`This opens a visible browser window instead of running headless.
Authenticated Sessions
Use
--headers to set HTTP headers for a specific origin, enabling authentication without login flows:`bash
Headers are scoped to api.example.com only
agent-browser open api.example.com --headers '{"Authorization": "Bearer "}'Requests to api.example.com include the auth header
agent-browser snapshot -i --json
agent-browser click @e2Navigate to another domain - headers are NOT sent (safe!)
agent-browser open other-site.com
`This is useful for:
- Skipping login flows - Authenticate via headers instead of UI
- Switching users - Start new sessions with different auth tokens
- API testing - Access protected endpoints directly
- Security - Headers are scoped to the origin, not leaked to other domains
To set headers for multiple origins, use
--headers with each open command:`bash
agent-browser open api.example.com --headers '{"Authorization": "Bearer token1"}'
agent-browser open api.acme.com --headers '{"Authorization": "Bearer token2"}'
`For global headers (all domains), use
set headers:`bash
agent-browser set headers '{"X-Custom-Header": "value"}'
`Custom Browser Executable
Use a custom browser executable instead of the bundled Chromium. This is useful for:
- Serverless deployment: Use lightweight Chromium builds like
@sparticuz/chromium (~50MB vs ~684MB)
- System browsers: Use an existing Chrome/Chromium installation
- Custom builds: Use modified browser builds$3
`bash
Via flag
agent-browser --executable-path /path/to/chromium open example.comVia environment variable
AGENT_BROWSER_EXECUTABLE_PATH=/path/to/chromium agent-browser open example.com
`$3
`typescript
import chromium from '@sparticuz/chromium';
import { BrowserManager } from 'agent-browser';export async function handler() {
const browser = new BrowserManager();
await browser.launch({
executablePath: await chromium.executablePath(),
headless: true,
});
// ... use browser
}
`CDP Mode
Connect to an existing browser via Chrome DevTools Protocol:
`bash
Connect to Electron app
agent-browser --cdp 9222 snapshotConnect to Chrome with remote debugging
(Start Chrome with: google-chrome --remote-debugging-port=9222)
agent-browser --cdp 9222 open about:blank
`This enables control of:
- Electron apps
- Chrome/Chromium instances with remote debugging
- WebView2 applications
- Any browser exposing a CDP endpoint
Streaming (Browser Preview)
Stream the browser viewport via WebSocket for live preview or "pair browsing" where a human can watch and interact alongside an AI agent.
$3
Set the
AGENT_BROWSER_STREAM_PORT environment variable:`bash
AGENT_BROWSER_STREAM_PORT=9223 agent-browser open example.com
`This starts a WebSocket server on the specified port that streams the browser viewport and accepts input events.
$3
Connect to
ws://localhost:9223 to receive frames and send input:Receive frames:
`json
{
"type": "frame",
"data": "",
"metadata": {
"deviceWidth": 1280,
"deviceHeight": 720,
"pageScaleFactor": 1,
"offsetTop": 0,
"scrollOffsetX": 0,
"scrollOffsetY": 0
}
}
`Send mouse events:
`json
{
"type": "input_mouse",
"eventType": "mousePressed",
"x": 100,
"y": 200,
"button": "left",
"clickCount": 1
}
`Send keyboard events:
`json
{
"type": "input_keyboard",
"eventType": "keyDown",
"key": "Enter",
"code": "Enter"
}
`Send touch events:
`json
{
"type": "input_touch",
"eventType": "touchStart",
"touchPoints": [{ "x": 100, "y": 200 }]
}
`$3
For advanced use, control streaming directly via the protocol:
`typescript
import { BrowserManager } from 'agent-browser';const browser = new BrowserManager();
await browser.launch({ headless: true });
await browser.navigate('https://example.com');
// Start screencast
await browser.startScreencast((frame) => {
// frame.data is base64-encoded image
// frame.metadata contains viewport info
console.log('Frame received:', frame.metadata.deviceWidth, 'x', frame.metadata.deviceHeight);
}, {
format: 'jpeg',
quality: 80,
maxWidth: 1280,
maxHeight: 720,
});
// Inject mouse events
await browser.injectMouseEvent({
type: 'mousePressed',
x: 100,
y: 200,
button: 'left',
});
// Inject keyboard events
await browser.injectKeyboardEvent({
type: 'keyDown',
key: 'Enter',
code: 'Enter',
});
// Stop when done
await browser.stopScreencast();
`Architecture
agent-browser uses a client-daemon architecture:
1. Rust CLI (fast native binary) - Parses commands, communicates with daemon
2. Node.js Daemon - Manages Playwright browser instance
3. Fallback - If native binary unavailable, uses Node.js directly
The daemon starts automatically on first command and persists between commands for fast subsequent operations.
Browser Engine: Uses Chromium by default. The daemon also supports Firefox and WebKit via the Playwright protocol.
Platforms
| Platform | Binary | Fallback |
|----------|--------|----------|
| macOS ARM64 | Native Rust | Node.js |
| macOS x64 | Native Rust | Node.js |
| Linux ARM64 | Native Rust | Node.js |
| Linux x64 | Native Rust | Node.js |
| Windows x64 | Native Rust | Node.js |
Usage with AI Agents
$3
The simplest approach - just tell your agent to use it:
`
Use agent-browser to test the login flow. Run agent-browser --help to see available commands.
`The
--help output is comprehensive and most agents can figure it out from there.$3
For more consistent results, add to your project or global instructions file:
`markdown
Browser Automation
Use
agent-browser for web automation. Run agent-browser --help for all commands.Core workflow:
1.
agent-browser open - Navigate to page
2. agent-browser snapshot -i - Get interactive elements with refs (@e1, @e2)
3. agent-browser click @e1 / fill @e2 "text" - Interact using refs
4. Re-snapshot after page changes
`$3
For Claude Code, a skill provides richer context:
`bash
cp -r node_modules/agent-browser/skills/agent-browser .claude/skills/
`Or download:
`bash
mkdir -p .claude/skills/agent-browser
curl -o .claude/skills/agent-browser/SKILL.md \
https://raw.githubusercontent.com/vercel-labs/agent-browser/main/skills/agent-browser/SKILL.md
``Apache-2.0