Browser automation toolkit for Chrome extensions using CDP (Chrome DevTools Protocol). Powers AI agents that interact with web pages.
npm install @hanzili/chrome-browser-agentBrowser automation toolkit for Chrome extensions using CDP (Chrome DevTools Protocol). Powers AI agents that interact with web pages.
- CDP Integration: Full Chrome DevTools Protocol support for reliable browser automation
- Accessibility Tree: Semantic page representation for AI navigation (not CSS selectors)
- Reference-based Targeting: Elements are tracked by refs (ref_1, ref_2) that survive page changes
- Tool Definitions: Ready-to-use tool schemas for LLM tool calling (Claude, GPT, etc.)
- Screenshot Support: High-quality screenshots with DPR scaling
``bash`
npm install @hanzili/chrome-browser-agent
`json`
{
"permissions": ["debugger", "scripting", "tabs", "activeTab"],
"host_permissions": ["
"content_scripts": [
{
"matches": ["
"js": [
"node_modules/@hanzili/chrome-browser-agent/src/content/accessibility-tree.js",
"node_modules/@hanzili/chrome-browser-agent/src/content/content.js"
]
}
]
}
`javascript
import {
cdpHelper,
executeTool,
TOOL_DEFINITIONS
} from '@hanzili/chrome-browser-agent';
// Pass TOOL_DEFINITIONS to your LLM
const response = await callLLM(messages, { tools: TOOL_DEFINITIONS });
// Execute tools returned by LLM
for (const toolUse of response.tool_calls) {
const result = await executeTool(toolUse.name, toolUse.input, {
tabId: currentTabId,
sendToContent: (tabId, type, payload) => chrome.tabs.sendMessage(tabId, { type, payload })
});
}
`
Instead of fragile CSS selectors, this toolkit uses an accessibility tree representation:
``
button "Submit Application" [ref_1]
textbox "Email" [ref_2] placeholder="Enter email"
combobox "Country" [ref_3]
option "United States" value="us"
option "Canada" value="ca" (selected)
The LLM sees semantic roles and can reference elements by ref_1, ref_2, etc.
| Tool | Description |
|------|-------------|
| read_page | Get accessibility tree of current page |computer
| | Click, type, scroll, screenshot |form_input
| | Fill form fields by reference |navigate
| | Go to URL, back, forward |find
| | Natural language element search |file_upload
| | Upload files to inputs |
`javascript
// Attach debugger to tab
await cdpHelper.attachDebugger(tabId);
// Take screenshot
const base64 = await cdpHelper.takeScreenshot(tabId);
// Click at coordinates
await cdpHelper.click(tabId, x, y);
// Type text
await cdpHelper.type(tabId, "Hello world");
`
`javascript`
const result = await executeTool('click', { ref: 'ref_1' }, {
tabId,
sendToContent,
cdpHelper
});
The content scripts must be injected into pages. They provide:
- accessibility-tree.js: Generates the semantic tree, manages element refscontent.js
- : Handles messages from service worker (form fill, click, etc.)agent-visual-indicator.js`: Shows visual feedback during automation
-
MIT