agent-browser

Headless browser automation CLI for AI agents. Fast Rust CLI with Node.js fallback.

Installation

$3

``bash npm install -g agent-browser agent-browser install # Download Chromium`

`$3`

`bash git clone https://github.com/vercel-labs/agent-browser cd agent-browser pnpm install pnpm build pnpm build:native # Requires Rust (https://rustup.rs) pnpm link --global # Makes agent-browser available globally agent-browser install`

`$3`

On Linux, install system dependencies:

`bash agent-browser install --with-deps

`or manually: npx playwright install-deps chromium`


Quick Start

`bash agent-browser open example.com agent-browser snapshot # Get accessibility tree with refs agent-browser click @e2 # Click by ref from snapshot agent-browser fill @e3 "test@example.com" # Fill by ref agent-browser get text @e1 # Get text by ref agent-browser screenshot page.png agent-browser close`

`$3`

`bash agent-browser click "#submit" agent-browser fill "#email" "test@example.com" agent-browser find role button click --name "Submit"`

`Commands`

`$3`

`bash agent-browser open # Navigate to URL (aliases: goto, navigate) agent-browser click # Click element agent-browser dblclick # Double-click element agent-browser focus # Focus element agent-browser type # Type into element agent-browser fill # Clear and fill agent-browser press # Press key (Enter, Tab, Control+a) (alias: key) agent-browser keydown # Hold key down agent-browser keyup # Release key agent-browser hover # Hover element agent-browser select # Select dropdown option agent-browser check # Check checkbox agent-browser uncheck # Uncheck checkbox agent-browser scroll

 [px]       # Scroll (up/down/left/right)
agent-browser scrollintoview     # Scroll element into view (alias: scrollinto)
agent-browser drag          # Drag and drop
agent-browser upload      # Upload files
agent-browser screenshot [path]       # Take screenshot (--full for full page)
agent-browser pdf               # Save as PDF
agent-browser snapshot                # Accessibility tree with refs (best for AI)
agent-browser eval                # Run JavaScript
agent-browser close                   # Close browser (aliases: quit, exit)

$3

`bash agent-browser get text # Get text content agent-browser get html # Get innerHTML agent-browser get value # Get input value agent-browser get attr # Get attribute agent-browser get title # Get page title agent-browser get url # Get current URL agent-browser get count # Count matching elements agent-browser get box # Get bounding box`

`$3`

`bash agent-browser is visible # Check if visible agent-browser is enabled # Check if enabled agent-browser is checked # Check if checked`

`$3`

`bash agent-browser find role [value] # By ARIA role agent-browser find text # By text content agent-browser find label [value] # By label agent-browser find placeholder [value] # By placeholder agent-browser find alt # By alt text agent-browser find title # By title attr agent-browser find testid [value] # By data-testid agent-browser find first [value] # First match agent-browser find last [value] # Last match agent-browser find nth [value] # Nth match`

Actions: click, fill, check, hover, text

Examples:`bash agent-browser find role button click --name "Submit" agent-browser find text "Sign In" click agent-browser find label "Email" fill "test@test.com" agent-browser find first ".item" click agent-browser find nth 2 "a" text`

`$3`

`bash agent-browser wait # Wait for element to be visible agent-browser wait # Wait for time (milliseconds) agent-browser wait --text "Welcome" # Wait for text to appear agent-browser wait --url "**/dash" # Wait for URL pattern agent-browser wait --load networkidle # Wait for load state agent-browser wait --fn "window.ready === true" # Wait for JS condition`

Load states: load, domcontentloaded, networkidle

`$3`

`bash agent-browser mouse move # Move mouse agent-browser mouse down [button] # Press button (left/right/middle) agent-browser mouse up [button] # Release button agent-browser mouse wheel [dx] # Scroll wheel`

`$3`

`bash agent-browser set viewport # Set viewport size agent-browser set device # Emulate device ("iPhone 14") agent-browser set geo # Set geolocation agent-browser set offline [on|off] # Toggle offline mode agent-browser set headers # Extra HTTP headers agent-browser set credentials

# HTTP basic auth agent-browser set media [dark|light] # Emulate color scheme`

`$3`

`bash agent-browser cookies # Get all cookies agent-browser cookies set # Set cookie agent-browser cookies clear # Clear cookies

agent-browser storage local # Get all localStorage agent-browser storage local # Get specific key agent-browser storage local set # Set value agent-browser storage local clear # Clear all

agent-browser storage session # Same for sessionStorage`

`$3`

`bash

`CDP Network Capture (full traffic with response bodies)`


agent-browser network capture start           # Enable capture
agent-browser network capture stop            # Disable capture
agent-browser network list                    # List captured traffic
agent-browser network list --url "api"        # Filter by URL pattern
agent-browser network get          # Get request/response details
agent-browser network body         # Get response body
agent-browser network search "pattern"        # Search traffic
agent-browser network fetch              # Make request with browser session
agent-browser network clear                   # Clear captured traffic
Truncation options (reduce output for LLM agents)

agent-browser network list --truncatejsonvalues 200  # Custom truncation threshold
agent-browser network body  --full               # Full untruncated data
Playwright Interception (for mocking/blocking)

agent-browser network route               # Intercept requests
agent-browser network route  --abort      # Block requests
agent-browser network route  --body   # Mock response
agent-browser network unroute [url]            # Remove routes
agent-browser network requests                 # View tracked requests
agent-browser network requests --filter api    # Filter requests

$3

`bash agent-browser tab # List tabs agent-browser tab new [url] # New tab (optionally with URL) agent-browser tab # Switch to tab n agent-browser tab close [n] # Close tab agent-browser window new # New window`

`$3`

`bash agent-browser frame # Switch to iframe agent-browser frame main # Back to main frame`

`$3`

`bash agent-browser dialog accept [text] # Accept (with optional prompt text) agent-browser dialog dismiss # Dismiss`

`$3`

`bash agent-browser trace start [path] # Start recording trace agent-browser trace stop [path] # Stop and save trace agent-browser console # View console messages agent-browser console --clear # Clear console agent-browser errors # View page errors agent-browser errors --clear # Clear errors agent-browser highlight # Highlight element agent-browser state save # Save auth state agent-browser state load # Load auth state`

`$3`

`bash agent-browser back # Go back agent-browser forward # Go forward agent-browser reload # Reload page`

`$3`

`bash agent-browser install # Download Chromium browser agent-browser install --with-deps # Also install system deps (Linux)`

`Sessions`

Run multiple isolated browser instances:

`bash

`Different sessions`


agent-browser --session agent1 open site-a.com
agent-browser --session agent2 open site-b.com
Or via environment variable

AGENT_BROWSER_SESSION=agent1 agent-browser click "#btn"
List active sessions

agent-browser session list
Output:

Active sessions:

-> default

   agent1
Show current session

agent-browser session


Each session has its own:
- Browser instance
- Cookies and storage
- Navigation history
- Authentication state
Snapshot Options

The snapshot command supports filtering to reduce output size:

`bash agent-browser snapshot # Full accessibility tree agent-browser snapshot -i # Interactive elements only (buttons, inputs, links) agent-browser snapshot -c # Compact (remove empty structural elements) agent-browser snapshot -d 3 # Limit depth to 3 levels agent-browser snapshot -s "#main" # Scope to CSS selector agent-browser snapshot -i -c -d 5 # Combine options`

| Option | Description | |--------|-------------| |-i, --interactive| Only show interactive elements (buttons, links, inputs) | |-c, --compact| Remove empty structural elements | |-d, --depth | Limit tree depth | |-s, --selector | Scope to CSS selector |

`Options`

| Option | Description | |--------|-------------| |--session | Use isolated session (or AGENT_BROWSER_SESSIONenv) | |--headers | Set HTTP headers scoped to the URL's origin | |--executable-path | Custom browser executable (or AGENT_BROWSER_EXECUTABLE_PATHenv) | |--json| JSON output (for agents) | |--full, -f| Full page screenshot / full untruncated network data | |--truncatejsonvalues | Truncate JSON string values to n chars (default: 500) | |--name, -n| Locator name filter | |--exact| Exact text match | |--headed| Show browser window (not headless) | |--cdp | Connect via Chrome DevTools Protocol | |--debug | Debug output |

`Selectors`

`$3`

Refs provide deterministic element selection from snapshots:

`bash

`1. Get snapshot with refs`


agent-browser snapshot
Output:

- heading "Example Domain" [ref=e1] [level=1]

- button "Submit" [ref=e2]

- textbox "Email" [ref=e3]

- link "Learn more" [ref=e4]
2. Use refs to interact

agent-browser click @e2                   # Click the button
agent-browser fill @e3 "test@example.com" # Fill the textbox
agent-browser get text @e1                # Get heading text
agent-browser hover @e4                   # Hover the link


Why use refs?
- Deterministic: Ref points to exact element from snapshot
- Fast: No DOM re-query needed
- AI-friendly: Snapshot + ref workflow is optimal for LLMs
$3

`bash agent-browser click "#id" agent-browser click ".class" agent-browser click "div > button"`

`$3`

`bash agent-browser click "text=Submit" agent-browser click "xpath=//button"`

`$3`

`bash agent-browser find role button click --name "Submit" agent-browser find label "Email" fill "test@test.com"`

`Agent Mode`

Use --json for machine-readable output:

`bash agent-browser snapshot --json

`Returns: {"success":true,"data":{"snapshot":"...","refs":{"e1":{"role":"heading","name":"Title"},...}}}`

agent-browser get text @e1 --json agent-browser is visible @e2 --json`

`$3`

`bash

`1. Navigate and get snapshot`


agent-browser open example.com
agent-browser snapshot -i --json   # AI parses tree and refs
2. AI identifies target refs from snapshot

3. Execute actions using refs

agent-browser click @e2
agent-browser fill @e3 "input text"
4. Get new snapshot if page changed

agent-browser snapshot -i --json


Headed Mode
Show the browser window for debugging:

`bash agent-browser open example.com --headed`

This opens a visible browser window instead of running headless.

`Authenticated Sessions`

Use --headers to set HTTP headers for a specific origin, enabling authentication without login flows:

`bash

`Headers are scoped to api.example.com only`


agent-browser open api.example.com --headers '{"Authorization": "Bearer "}'
Requests to api.example.com include the auth header

agent-browser snapshot -i --json
agent-browser click @e2
Navigate to another domain - headers are NOT sent (safe!)

agent-browser open other-site.com


This is useful for:
- Skipping login flows - Authenticate via headers instead of UI
- Switching users - Start new sessions with different auth tokens
- API testing - Access protected endpoints directly
- Security - Headers are scoped to the origin, not leaked to other domains

To set headers for multiple origins, use --headers with each open command:

`bash agent-browser open api.example.com --headers '{"Authorization": "Bearer token1"}' agent-browser open api.acme.com --headers '{"Authorization": "Bearer token2"}'`

For global headers (all domains), use set headers:

`bash agent-browser set headers '{"X-Custom-Header": "value"}'`

`Custom Browser Executable`

Use a custom browser executable instead of the bundled Chromium. This is useful for: - Serverless deployment: Use lightweight Chromium builds like@sparticuz/chromium(~50MB vs ~684MB) - System browsers: Use an existing Chrome/Chromium installation - Custom builds: Use modified browser builds

`$3`

`bash

`Via flag`


agent-browser --executable-path /path/to/chromium open example.com
Via environment variable

AGENT_BROWSER_EXECUTABLE_PATH=/path/to/chromium agent-browser open example.com

$3

`typescript import chromium from '@sparticuz/chromium'; import { BrowserManager } from 'agent-browser';

export async function handler() { const browser = new BrowserManager(); await browser.launch({ executablePath: await chromium.executablePath(), headless: true, }); // ... use browser }`

`CDP Mode`

Connect to an existing browser via Chrome DevTools Protocol:

`bash

`Connect to Electron app`


agent-browser --cdp 9222 snapshot
Connect to Chrome with remote debugging

(Start Chrome with: google-chrome --remote-debugging-port=9222)

agent-browser --cdp 9222 open about:blank


This enables control of:
- Electron apps
- Chrome/Chromium instances with remote debugging
- WebView2 applications
- Any browser exposing a CDP endpoint
Streaming (Browser Preview)
Stream the browser viewport via WebSocket for live preview or "pair browsing" where a human can watch and interact alongside an AI agent.
$3

Set the AGENT_BROWSER_STREAM_PORT environment variable:

`bash AGENT_BROWSER_STREAM_PORT=9223 agent-browser open example.com`

This starts a WebSocket server on the specified port that streams the browser viewport and accepts input events.

`$3`

Connect to ws://localhost:9223 to receive frames and send input:

Receive frames:`json { "type": "frame", "data": "", "metadata": { "deviceWidth": 1280, "deviceHeight": 720, "pageScaleFactor": 1, "offsetTop": 0, "scrollOffsetX": 0, "scrollOffsetY": 0 } }`

Send mouse events:`json { "type": "input_mouse", "eventType": "mousePressed", "x": 100, "y": 200, "button": "left", "clickCount": 1 }`

Send keyboard events:`json { "type": "input_keyboard", "eventType": "keyDown", "key": "Enter", "code": "Enter" }`

Send touch events:`json { "type": "input_touch", "eventType": "touchStart", "touchPoints": [{ "x": 100, "y": 200 }] }`

`$3`

For advanced use, control streaming directly via the protocol:

`typescript import { BrowserManager } from 'agent-browser';

const browser = new BrowserManager(); await browser.launch({ headless: true }); await browser.navigate('https://example.com');

// Start screencast await browser.startScreencast((frame) => { // frame.data is base64-encoded image // frame.metadata contains viewport info console.log('Frame received:', frame.metadata.deviceWidth, 'x', frame.metadata.deviceHeight); }, { format: 'jpeg', quality: 80, maxWidth: 1280, maxHeight: 720, });

// Inject mouse events await browser.injectMouseEvent({ type: 'mousePressed', x: 100, y: 200, button: 'left', });

// Inject keyboard events await browser.injectKeyboardEvent({ type: 'keyDown', key: 'Enter', code: 'Enter', });

// Stop when done await browser.stopScreencast();`

`Architecture`

agent-browser uses a client-daemon architecture:

1. Rust CLI (fast native binary) - Parses commands, communicates with daemon 2. Node.js Daemon - Manages Playwright browser instance 3. Fallback - If native binary unavailable, uses Node.js directly

The daemon starts automatically on first command and persists between commands for fast subsequent operations.

Browser Engine: Uses Chromium by default. The daemon also supports Firefox and WebKit via the Playwright protocol.

`Platforms`

| Platform | Binary | Fallback | |----------|--------|----------| | macOS ARM64 | Native Rust | Node.js | | macOS x64 | Native Rust | Node.js | | Linux ARM64 | Native Rust | Node.js | | Linux x64 | Native Rust | Node.js | | Windows x64 | Native Rust | Node.js |

`Usage with AI Agents`

`$3`

The simplest approach - just tell your agent to use it:

`Use agent-browser to test the login flow. Run agent-browser --help to see available commands.`

The --help output is comprehensive and most agents can figure it out from there.

`$3`

For more consistent results, add to your project or global instructions file:

`markdown

`Browser Automation`

Use agent-browser for web automation. Run agent-browser --help for all commands.

Core workflow: 1.agent-browser open - Navigate to page 2.agent-browser snapshot -i- Get interactive elements with refs (@e1, @e2) 3.agent-browser click @e1 / fill @e2 "text"- Interact using refs 4. Re-snapshot after page changes`

`$3`

For Claude Code, a skill provides richer context:

`bash cp -r node_modules/agent-browser/skills/agent-browser .claude/skills/`

Or download:

`bash mkdir -p .claude/skills/agent-browser curl -o .claude/skills/agent-browser/SKILL.md \ https://raw.githubusercontent.com/vercel-labs/agent-browser/main/skills/agent-browser/SKILL.md``

License

Apache-2.0

agent-browser

Headless browser automation CLI for AI agents. Fast Rust CLI with Node.js fallback.

Installation

$3

``bash npm install -g agent-browser agent-browser install # Download Chromium`

`$3`

On Linux, install system dependencies:

`bash agent-browser install --with-deps

`or manually: npx playwright install-deps chromium`


Quick Start

`$3`

`bash agent-browser click "#submit" agent-browser fill "#email" "test@example.com" agent-browser find role button click --name "Submit"`

`Commands`

`$3`

 [px]       # Scroll (up/down/left/right)
agent-browser scrollintoview     # Scroll element into view (alias: scrollinto)
agent-browser drag          # Drag and drop
agent-browser upload      # Upload files
agent-browser screenshot [path]       # Take screenshot (--full for full page)
agent-browser pdf               # Save as PDF
agent-browser snapshot                # Accessibility tree with refs (best for AI)
agent-browser eval                # Run JavaScript
agent-browser close                   # Close browser (aliases: quit, exit)

$3

`$3`

`bash agent-browser is visible # Check if visible agent-browser is enabled # Check if enabled agent-browser is checked # Check if checked`

`$3`

Actions: click, fill, check, hover, text

`$3`

Load states: load, domcontentloaded, networkidle

`$3`

# HTTP basic auth agent-browser set media [dark|light] # Emulate color scheme`

`$3`

`bash agent-browser cookies # Get all cookies agent-browser cookies set # Set cookie agent-browser cookies clear # Clear cookies

agent-browser storage local # Get all localStorage agent-browser storage local # Get specific key agent-browser storage local set # Set value agent-browser storage local clear # Clear all

agent-browser storage session # Same for sessionStorage`

`$3`

`bash

`CDP Network Capture (full traffic with response bodies)`


agent-browser network capture start           # Enable capture
agent-browser network capture stop            # Disable capture
agent-browser network list                    # List captured traffic
agent-browser network list --url "api"        # Filter by URL pattern
agent-browser network get          # Get request/response details
agent-browser network body         # Get response body
agent-browser network search "pattern"        # Search traffic
agent-browser network fetch              # Make request with browser session
agent-browser network clear                   # Clear captured traffic
Truncation options (reduce output for LLM agents)

agent-browser network list --truncatejsonvalues 200  # Custom truncation threshold
agent-browser network body  --full               # Full untruncated data
Playwright Interception (for mocking/blocking)

agent-browser network route               # Intercept requests
agent-browser network route  --abort      # Block requests
agent-browser network route  --body   # Mock response
agent-browser network unroute [url]            # Remove routes
agent-browser network requests                 # View tracked requests
agent-browser network requests --filter api    # Filter requests

$3

`$3`

`bash agent-browser frame # Switch to iframe agent-browser frame main # Back to main frame`

`$3`

`bash agent-browser dialog accept [text] # Accept (with optional prompt text) agent-browser dialog dismiss # Dismiss`

`$3`

`bash agent-browser back # Go back agent-browser forward # Go forward agent-browser reload # Reload page`

`$3`

`bash agent-browser install # Download Chromium browser agent-browser install --with-deps # Also install system deps (Linux)`

`Sessions`

Run multiple isolated browser instances:

`bash

`Different sessions`


agent-browser --session agent1 open site-a.com
agent-browser --session agent2 open site-b.com
Or via environment variable

AGENT_BROWSER_SESSION=agent1 agent-browser click "#btn"
List active sessions

agent-browser session list
Output:

Active sessions:

-> default

   agent1
Show current session

agent-browser session


Each session has its own:
- Browser instance
- Cookies and storage
- Navigation history
- Authentication state
Snapshot Options

The snapshot command supports filtering to reduce output size:

`Options`

`Selectors`

`$3`

Refs provide deterministic element selection from snapshots:

`bash

`1. Get snapshot with refs`


agent-browser snapshot
Output:

- heading "Example Domain" [ref=e1] [level=1]

- button "Submit" [ref=e2]

- textbox "Email" [ref=e3]

- link "Learn more" [ref=e4]
2. Use refs to interact

agent-browser click @e2                   # Click the button
agent-browser fill @e3 "test@example.com" # Fill the textbox
agent-browser get text @e1                # Get heading text
agent-browser hover @e4                   # Hover the link


Why use refs?
- Deterministic: Ref points to exact element from snapshot
- Fast: No DOM re-query needed
- AI-friendly: Snapshot + ref workflow is optimal for LLMs
$3

`bash agent-browser click "#id" agent-browser click ".class" agent-browser click "div > button"`

`$3`

`bash agent-browser click "text=Submit" agent-browser click "xpath=//button"`

`$3`

`bash agent-browser find role button click --name "Submit" agent-browser find label "Email" fill "test@test.com"`

`Agent Mode`

Use --json for machine-readable output:

`bash agent-browser snapshot --json

`Returns: {"success":true,"data":{"snapshot":"...","refs":{"e1":{"role":"heading","name":"Title"},...}}}`

agent-browser get text @e1 --json agent-browser is visible @e2 --json`

`$3`

`bash

`1. Navigate and get snapshot`


agent-browser open example.com
agent-browser snapshot -i --json   # AI parses tree and refs
2. AI identifies target refs from snapshot

3. Execute actions using refs

agent-browser click @e2
agent-browser fill @e3 "input text"
4. Get new snapshot if page changed

agent-browser snapshot -i --json


Headed Mode
Show the browser window for debugging:

`bash agent-browser open example.com --headed`

This opens a visible browser window instead of running headless.

`Authenticated Sessions`

Use --headers to set HTTP headers for a specific origin, enabling authentication without login flows:

`bash

`Headers are scoped to api.example.com only`


agent-browser open api.example.com --headers '{"Authorization": "Bearer "}'
Requests to api.example.com include the auth header

agent-browser snapshot -i --json
agent-browser click @e2
Navigate to another domain - headers are NOT sent (safe!)

agent-browser open other-site.com


This is useful for:
- Skipping login flows - Authenticate via headers instead of UI
- Switching users - Start new sessions with different auth tokens
- API testing - Access protected endpoints directly
- Security - Headers are scoped to the origin, not leaked to other domains

To set headers for multiple origins, use --headers with each open command:

`bash agent-browser open api.example.com --headers '{"Authorization": "Bearer token1"}' agent-browser open api.acme.com --headers '{"Authorization": "Bearer token2"}'`

For global headers (all domains), use set headers:

`bash agent-browser set headers '{"X-Custom-Header": "value"}'`

`Custom Browser Executable`

`$3`

`bash

`Via flag`


agent-browser --executable-path /path/to/chromium open example.com
Via environment variable

AGENT_BROWSER_EXECUTABLE_PATH=/path/to/chromium agent-browser open example.com

$3

`typescript import chromium from '@sparticuz/chromium'; import { BrowserManager } from 'agent-browser';

export async function handler() { const browser = new BrowserManager(); await browser.launch({ executablePath: await chromium.executablePath(), headless: true, }); // ... use browser }`

`CDP Mode`

Connect to an existing browser via Chrome DevTools Protocol:

`bash

`Connect to Electron app`


agent-browser --cdp 9222 snapshot
Connect to Chrome with remote debugging

(Start Chrome with: google-chrome --remote-debugging-port=9222)

agent-browser --cdp 9222 open about:blank


This enables control of:
- Electron apps
- Chrome/Chromium instances with remote debugging
- WebView2 applications
- Any browser exposing a CDP endpoint
Streaming (Browser Preview)
Stream the browser viewport via WebSocket for live preview or "pair browsing" where a human can watch and interact alongside an AI agent.
$3

Set the AGENT_BROWSER_STREAM_PORT environment variable:

`bash AGENT_BROWSER_STREAM_PORT=9223 agent-browser open example.com`

This starts a WebSocket server on the specified port that streams the browser viewport and accepts input events.

`$3`

Connect to ws://localhost:9223 to receive frames and send input:

Receive frames:`json { "type": "frame", "data": "", "metadata": { "deviceWidth": 1280, "deviceHeight": 720, "pageScaleFactor": 1, "offsetTop": 0, "scrollOffsetX": 0, "scrollOffsetY": 0 } }`

Send mouse events:`json { "type": "input_mouse", "eventType": "mousePressed", "x": 100, "y": 200, "button": "left", "clickCount": 1 }`

Send keyboard events:`json { "type": "input_keyboard", "eventType": "keyDown", "key": "Enter", "code": "Enter" }`

Send touch events:`json { "type": "input_touch", "eventType": "touchStart", "touchPoints": [{ "x": 100, "y": 200 }] }`

`$3`

For advanced use, control streaming directly via the protocol:

`typescript import { BrowserManager } from 'agent-browser';

const browser = new BrowserManager(); await browser.launch({ headless: true }); await browser.navigate('https://example.com');

// Inject mouse events await browser.injectMouseEvent({ type: 'mousePressed', x: 100, y: 200, button: 'left', });

// Inject keyboard events await browser.injectKeyboardEvent({ type: 'keyDown', key: 'Enter', code: 'Enter', });

// Stop when done await browser.stopScreencast();`

`Architecture`

agent-browser uses a client-daemon architecture:

The daemon starts automatically on first command and persists between commands for fast subsequent operations.

Browser Engine: Uses Chromium by default. The daemon also supports Firefox and WebKit via the Playwright protocol.

`Platforms`

`Usage with AI Agents`

`$3`

The simplest approach - just tell your agent to use it:

`Use agent-browser to test the login flow. Run agent-browser --help to see available commands.`

The --help output is comprehensive and most agents can figure it out from there.

`$3`

For more consistent results, add to your project or global instructions file:

`markdown

`Browser Automation`

Use agent-browser for web automation. Run agent-browser --help for all commands.

`$3`

For Claude Code, a skill provides richer context:

`bash cp -r node_modules/agent-browser/skills/agent-browser .claude/skills/`

Or download:

`bash mkdir -p .claude/skills/agent-browser curl -o .claude/skills/agent-browser/SKILL.md \ https://raw.githubusercontent.com/vercel-labs/agent-browser/main/skills/agent-browser/SKILL.md``

License

Apache-2.0

agent-browser-cdpnetwork

agent-browser

Installation

$3

$3

$3

or manually: npx playwright install-deps chromium

Quick Start

$3

Commands

$3

$3

$3

$3

$3

$3

$3

$3

$3

CDP Network Capture (full traffic with response bodies)

Truncation options (reduce output for LLM agents)

Playwright Interception (for mocking/blocking)

$3

$3

$3

$3

$3

$3

Sessions

Different sessions

Or via environment variable

List active sessions

Output:

Active sessions:

-> default

agent1

Show current session

Snapshot Options

Options

Selectors

$3

1. Get snapshot with refs

Output:

- heading "Example Domain" [ref=e1] [level=1]

- button "Submit" [ref=e2]

- textbox "Email" [ref=e3]

- link "Learn more" [ref=e4]

2. Use refs to interact

$3

$3

$3

Agent Mode

Returns: {"success":true,"data":{"snapshot":"...","refs":{"e1":{"role":"heading","name":"Title"},...}}}

$3

1. Navigate and get snapshot

2. AI identifies target refs from snapshot

3. Execute actions using refs

4. Get new snapshot if page changed

Headed Mode

Authenticated Sessions

Headers are scoped to api.example.com only

Requests to api.example.com include the auth header

Navigate to another domain - headers are NOT sent (safe!)

Custom Browser Executable

$3

Via flag

Via environment variable

$3

CDP Mode

Connect to Electron app

Connect to Chrome with remote debugging

(Start Chrome with: google-chrome --remote-debugging-port=9222)

Streaming (Browser Preview)

$3

$3

$3

Architecture

Platforms

Usage with AI Agents

$3

`$3`

`$3`

`or manually: npx playwright install-deps chromium`

`$3`

`Commands`

`$3`

`$3`

`$3`

`$3`

`$3`

`$3`

`$3`

`$3`

`CDP Network Capture (full traffic with response bodies)`

`$3`

`$3`

`$3`

`$3`

`$3`

`Sessions`

`Different sessions`

`Options`

`Selectors`

`$3`

`1. Get snapshot with refs`

`$3`

`$3`

`Agent Mode`

`Returns: {"success":true,"data":{"snapshot":"...","refs":{"e1":{"role":"heading","name":"Title"},...}}}`

`$3`

`1. Navigate and get snapshot`

`Authenticated Sessions`

`Headers are scoped to api.example.com only`

`Custom Browser Executable`

`$3`

`Via flag`

`CDP Mode`

`Connect to Electron app`

`$3`

`$3`

`Architecture`

`Platforms`

`Usage with AI Agents`

`$3`

`$3`

`Browser Automation`

`$3`

`$3`

`$3`

`or manually: npx playwright install-deps chromium`

`$3`

`Commands`

`$3`

`$3`

`$3`

`$3`

`$3`

`$3`

`$3`

`$3`

`CDP Network Capture (full traffic with response bodies)`

`$3`

`$3`

`$3`

`$3`

`$3`

`Sessions`

`Different sessions`

`Options`

`Selectors`

`$3`

`1. Get snapshot with refs`

`$3`

`$3`

`Agent Mode`

`Returns: {"success":true,"data":{"snapshot":"...","refs":{"e1":{"role":"heading","name":"Title"},...}}}`

`$3`

`1. Navigate and get snapshot`

`Authenticated Sessions`

`Headers are scoped to api.example.com only`