Chrome Browser Agent

Browser automation toolkit for Chrome extensions using CDP (Chrome DevTools Protocol). Powers AI agents that interact with web pages.

Features

- CDP Integration: Full Chrome DevTools Protocol support for reliable browser automation
- Accessibility Tree: Semantic page representation for AI navigation (not CSS selectors)
- Reference-based Targeting: Elements are tracked by refs (ref_1, ref_2) that survive page changes
- Tool Definitions: Ready-to-use tool schemas for LLM tool calling (Claude, GPT, etc.)
- Screenshot Support: High-quality screenshots with DPR scaling

Installation

```bash
npm install @hanzili/chrome-browser-agent
```

Setup

Add the permissions and content scripts to your extension's `manifest.json`:

```json
{
  "permissions": ["debugger", "scripting", "tabs", "activeTab"],
  "host_permissions": ["<all_urls>"],
  "content_scripts": [
    {
      "matches": ["<all_urls>"],
      "js": [
        "node_modules/@hanzili/chrome-browser-agent/src/content/accessibility-tree.js",
        "node_modules/@hanzili/chrome-browser-agent/src/content/content.js"
      ]
    }
  ]
}
```

Then call the toolkit from your service worker:

```javascript
import { cdpHelper, executeTool, TOOL_DEFINITIONS } from '@hanzili/chrome-browser-agent';

// callLLM, messages, and currentTabId are placeholders for your own code.

// Pass TOOL_DEFINITIONS to your LLM
const response = await callLLM(messages, { tools: TOOL_DEFINITIONS });

// Execute tools returned by LLM
for (const toolUse of response.tool_calls) {
  const result = await executeTool(toolUse.name, toolUse.input, {
    tabId: currentTabId,
    sendToContent: (tabId, type, payload) =>
      chrome.tabs.sendMessage(tabId, { type, payload })
  });
}
```
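In an agent loop the tool results usually go back to the model, and one failing tool should not abort the whole turn. The following is a minimal sketch of such a loop, assuming the `executeTool(name, input, context)` call shape shown above. `runToolCalls`, the `id` field on each call, and the shape of the result objects are hypothetical and not part of the package.

```javascript
// Hypothetical helper: runs each tool call in order and collects the outcomes.
// `execute` is expected to behave like executeTool(name, input, context).
async function runToolCalls(toolCalls, execute, context) {
  const results = [];
  for (const call of toolCalls) {
    try {
      const output = await execute(call.name, call.input, context);
      results.push({ id: call.id, name: call.name, ok: true, output });
    } catch (err) {
      // Record the failure so it can be reported to the model instead of throwing.
      const message = err && err.message ? err.message : String(err);
      results.push({ id: call.id, name: call.name, ok: false, error: message });
    }
  }
  return results;
}
```

Passing `executeTool` as the `execute` argument keeps the helper easy to test with a stub.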

Core Concepts

Instead of fragile CSS selectors, this toolkit uses an accessibility tree representation:

```
button "Submit Application" [ref_1]
textbox "Email" [ref_2] placeholder="Enter email"
combobox "Country" [ref_3]
  option "United States" value="us"
  option "Canada" value="ca" (selected)
```

The LLM sees semantic roles and can reference elements by ref_1, ref_2, etc.
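To make the text format concrete, here is a small sketch that renders a tree of nodes into lines like the ones above. The node shape (`role`, `name`, `ref`, `attrs`, `selected`, `children`) and the two-space indentation are assumptions for illustration; the package's actual serializer may differ.

```javascript
// Hypothetical node shape: { role, name, ref?, attrs?, selected?, children? }.
function renderNode(node, depth = 0) {
  const parts = [node.role, JSON.stringify(node.name)];
  if (node.ref) parts.push(`[${node.ref}]`);
  for (const [key, value] of Object.entries(node.attrs ?? {})) {
    parts.push(`${key}=${JSON.stringify(value)}`);
  }
  if (node.selected) parts.push('(selected)');
  // Children are indented one level deeper than their parent.
  const line = '  '.repeat(depth) + parts.join(' ');
  const childLines = (node.children ?? []).map((child) => renderNode(child, depth + 1));
  return [line, ...childLines].join('\n');
}

const tree = [
  { role: 'button', name: 'Submit Application', ref: 'ref_1' },
  { role: 'textbox', name: 'Email', ref: 'ref_2', attrs: { placeholder: 'Enter email' } },
  {
    role: 'combobox', name: 'Country', ref: 'ref_3',
    children: [
      { role: 'option', name: 'United States', attrs: { value: 'us' } },
      { role: 'option', name: 'Canada', attrs: { value: 'ca' }, selected: true }
    ]
  }
];

const text = tree.map((node) => renderNode(node)).join('\n');
console.log(text);
```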

The available tools:

| Tool | Description |
|------|-------------|
| `read_page` | Get accessibility tree of current page |
| `computer` | Click, type, scroll, screenshot |
| `form_input` | Fill form fields by reference |
| `navigate` | Go to URL, back, forward |
| `find` | Natural language element search |
| `file_upload` | Upload files to inputs |
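Models occasionally return a tool name that does not exist or an input that is not an object, so it can help to check a call before dispatching it. This sketch assumes the table above lists every tool name. `validateToolCall` is a hypothetical helper, not something the package exports.

```javascript
// Tool names copied from the table above (assumed to be the complete set).
const KNOWN_TOOLS = new Set([
  'read_page', 'computer', 'form_input', 'navigate', 'find', 'file_upload'
]);

// Hypothetical guard: returns an error string for a malformed call,
// or null when the call looks safe to dispatch.
function validateToolCall(call) {
  if (!call || typeof call.name !== 'string') return 'tool call has no name';
  if (!KNOWN_TOOLS.has(call.name)) return `unknown tool: ${call.name}`;
  if (call.input === null || typeof call.input !== 'object') {
    return `tool ${call.name} needs an object input`;
  }
  return null;
}
```

Returning a message rather than throwing lets the caller send the problem back to the model as a tool result.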

API Reference

`cdpHelper` drives a tab directly over CDP:

```javascript
// Attach debugger to tab
await cdpHelper.attachDebugger(tabId);

// Take screenshot
const base64 = await cdpHelper.takeScreenshot(tabId);

// Click at coordinates
await cdpHelper.click(tabId, x, y);

// Type text
await cdpHelper.type(tabId, "Hello world");
```
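For readers unfamiliar with CDP, helpers like these typically map onto `chrome.debugger` calls. The sketch below is not the package's implementation. The function names `attach`, `clickAt`, and `screenshot` are hypothetical, and `dbg` stands in for the promise-based `chrome.debugger` API so the code can run against a fake outside the browser.

```javascript
// Sketch only. In an extension, pass chrome.debugger as `dbg`.
async function attach(dbg, tabId) {
  // The second argument is the required CDP protocol version.
  await dbg.attach({ tabId }, '1.3');
}

async function clickAt(dbg, tabId, x, y) {
  // A click is a press followed by a release at the same coordinates.
  const base = { x, y, button: 'left', clickCount: 1 };
  await dbg.sendCommand({ tabId }, 'Input.dispatchMouseEvent', { type: 'mousePressed', ...base });
  await dbg.sendCommand({ tabId }, 'Input.dispatchMouseEvent', { type: 'mouseReleased', ...base });
}

async function screenshot(dbg, tabId) {
  const { data } = await dbg.sendCommand({ tabId }, 'Page.captureScreenshot', { format: 'png' });
  return data; // base64-encoded PNG
}
```

Taking the debugger API as a parameter is a design choice made for this example only; it keeps the functions testable without a browser.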

`executeTool` runs a single tool call:

```javascript
const result = await executeTool('click', { ref: 'ref_1' }, {
  tabId,
  sendToContent,
  cdpHelper
});
```

Content Scripts

The content scripts must be injected into pages. They provide:

- `accessibility-tree.js`: Generates the semantic tree, manages element refs
- `content.js`: Handles messages from the service worker (form fill, click, etc.)
- `agent-visual-indicator.js`: Shows visual feedback during automation
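How the package manages refs internally is not described here. One plausible design, sketched below, keeps a two-way mapping so that an element receives the same ref every time the tree is regenerated. `RefRegistry` and its methods are hypothetical, and plain objects stand in for DOM elements so the code runs outside a browser.

```javascript
// Hypothetical registry, not the package's code.
class RefRegistry {
  constructor() {
    this.nextId = 1;
    this.byRef = new Map();         // "ref_N" -> WeakRef(element)
    this.byElement = new WeakMap(); // element -> "ref_N"
  }

  // Returns the element's existing ref, or assigns the next free one.
  refFor(element) {
    let ref = this.byElement.get(element);
    if (!ref) {
      ref = `ref_${this.nextId++}`;
      this.byElement.set(element, ref);
      this.byRef.set(ref, new WeakRef(element));
    }
    return ref;
  }

  // Returns the element for a ref, or null if it is unknown or was garbage-collected.
  resolve(ref) {
    const element = this.byRef.get(ref)?.deref();
    return element ?? null;
  }
}
```

Holding elements weakly means the registry does not keep removed nodes alive; a ref whose element is gone simply resolves to `null`.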

License

MIT
