Browser automation library for Chrome Extensions - LLM-powered browser agent
Browser automation library for Chrome Extensions - extracted from Nanobrowser.
> Standalone Package: This is an extracted automation core that can be used as a dependency in other Chrome extensions.
This library provides AI-driven browser automation capabilities specifically designed for Chrome Extensions. It uses LLM (Large Language Model) agents to interpret natural language commands and execute browser actions.
Requirements

- Chrome Extension Manifest V3
- Required Permissions:
  - debugger - For CDP (Chrome DevTools Protocol) access
  - tabs - For tab management
  - scripting - For DOM injection
  - activeTab - For current tab access
- Host Permissions for target sites
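
One way to catch a misconfigured manifest early is to compare it against the list above. The following is a minimal sketch under that assumption: `REQUIRED_PERMISSIONS`, `ManifestLike`, and `checkManifest` are hypothetical names for illustration and are not part of this library.

```typescript
// Hypothetical helper, not part of @riruru/automation-core.
// The permission names are copied from the requirements list above.
const REQUIRED_PERMISSIONS = ["debugger", "tabs", "scripting", "activeTab"] as const;

// Minimal shape of the manifest fields this check reads.
interface ManifestLike {
  manifest_version: number;
  permissions?: string[];
  host_permissions?: string[];
}

// Returns human-readable problems; an empty array means nothing was flagged.
function checkManifest(manifest: ManifestLike): string[] {
  const problems: string[] = [];
  if (manifest.manifest_version !== 3) {
    problems.push("manifest_version must be 3");
  }
  const granted = new Set<string>(manifest.permissions ?? []);
  for (const permission of REQUIRED_PERMISSIONS) {
    if (!granted.has(permission)) {
      problems.push(`missing permission: ${permission}`);
    }
  }
  if ((manifest.host_permissions ?? []).length === 0) {
    problems.push("no host_permissions declared for target sites");
  }
  return problems;
}
```

Inside an extension, the object returned by `chrome.runtime.getManifest()` could be passed to a check like this during startup.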
⚠️ Important: This library will NOT work in:
- Node.js scripts
- Web pages
- Firefox, Safari, or other browsers
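
Because of these restrictions, calling code may want to fail fast when it is loaded in the wrong place. The guard below is only a heuristic sketch: `isExtensionContext` is a hypothetical name, and it merely checks that the extension API namespaces named in the requirements are present on the global object.

```typescript
// Hypothetical guard, not part of @riruru/automation-core.
// It only tests for the presence of the chrome.debugger, chrome.tabs and
// chrome.scripting namespaces; it does not verify that permissions are granted.
function isExtensionContext(
  globalObject: Record<string, unknown> = globalThis as unknown as Record<string, unknown>,
): boolean {
  const chromeApi = globalObject["chrome"];
  if (typeof chromeApi !== "object" || chromeApi === null) {
    return false;
  }
  const api = chromeApi as Record<string, unknown>;
  return ["debugger", "tabs", "scripting"].every(
    (name) => typeof api[name] === "object" && api[name] !== null,
  );
}
```

A caller could check `isExtensionContext()` before creating an agent and show a clear error message when it returns `false`.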
```bash
# From npm
npm install @riruru/automation-core puppeteer-core zod
```

Quick Start
```typescript
import { AutomationAgent, BrowserContext } from '@riruru/automation-core';

// Create a browser context attached to the active tab
const context = await BrowserContext.fromActiveTab();

// Create an automation agent
const agent = new AutomationAgent({
  context,
  llm: {
    provider: 'anthropic',
    apiKey: 'sk-ant-...',
    model: 'claude-sonnet-4-20250514'
  }
});

// Subscribe to events (optional)
agent.on('step', (event) => {
  console.log(`Step ${event.step}: ${event.details}`);
});

// Execute a task
const result = await agent.execute("Click the Jobs button in the navigation");
console.log(result);
// {
//   success: true,
//   steps: [...],
//   finalUrl: "https://example.com/jobs",
//   finalAnswer: "Clicked the Jobs button"
// }
```

API Reference
AutomationAgent
The main entry point for browser automation.
```typescript
const agent = new AutomationAgent({
  context: BrowserContext,        // Optional - creates from active tab if not provided
  llm: {
    provider: 'anthropic' | 'openai' | 'gemini' | 'ollama',
    apiKey: string,
    model: string,
    baseUrl?: string,             // For custom endpoints
    temperature?: number,         // Default: 0.1
  },
  options?: {
    maxSteps?: number,            // Default: 50
    maxActionsPerStep?: number,   // Default: 5
    maxFailures?: number,         // Default: 3
    useVision?: boolean,          // Default: false
  }
});

// Execute a task
const result = await agent.execute("Your task description");

// Subscribe to events
agent.on('step' | 'action' | 'error' | 'complete' | 'all', handler);

// Stop execution
await agent.stop();

// Cleanup resources
await agent.cleanup();
```

BrowserContext
Manages browser tabs and pages.
```typescript
// Create from active tab
const context = await BrowserContext.fromActiveTab();

// Or create from a specific tab
const tabContext = await BrowserContext.fromTab(tabId);

// Get current page
const page = await context.getCurrentPage();

// Navigate
await context.navigateTo('https://example.com');

// Tab management
await context.openTab('https://example.com');
await context.switchTab(tabId);
await context.closeTab(tabId);
```

TaskResult
The result of executing a task.
```typescript
interface TaskResult {
  success: boolean;
  error?: string;
  steps: StepRecord[];
  finalUrl: string;
  finalAnswer?: string;
  data?: unknown; // For extraction tasks
}
```

Supported LLM Providers
| Provider | Models |
|----------|--------|
| Anthropic | Claude 3 Opus, Sonnet, Haiku, Claude 3.5 Sonnet |
| OpenAI | GPT-4, GPT-4 Turbo, GPT-3.5 Turbo |
| Google | Gemini Pro, Gemini Ultra |
| Ollama | Any local model |
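
When the provider and model come from user input (an options page, for example), it can help to validate them before constructing the agent. The sketch below mirrors the `llm` fields from the API reference in a local type; `parseLlmConfig` and `PROVIDERS` are hypothetical names, and the assumption that the table's Google row corresponds to the `'gemini'` provider string is inferred from the API reference rather than stated by the library.

```typescript
// Local mirror of the documented `llm` options; not imported from the library.
type Provider = "anthropic" | "openai" | "gemini" | "ollama";

interface LlmConfig {
  provider: Provider;
  apiKey: string;
  model: string;
  baseUrl?: string;
  temperature?: number;
}

const PROVIDERS: readonly Provider[] = ["anthropic", "openai", "gemini", "ollama"];

// Hypothetical helper: narrows loosely typed input to LlmConfig and applies
// the documented default temperature of 0.1 when none is given.
function parseLlmConfig(input: {
  provider: string;
  apiKey: string;
  model: string;
  baseUrl?: string;
  temperature?: number;
}): LlmConfig {
  const provider = PROVIDERS.find((candidate) => candidate === input.provider);
  if (provider === undefined) {
    throw new Error(`Unsupported provider: ${input.provider}`);
  }
  if (input.model.trim() === "") {
    throw new Error("model must not be empty");
  }
  return { ...input, provider, temperature: input.temperature ?? 0.1 };
}
```

The returned object has the same shape as the `llm` field shown in the `AutomationAgent` constructor, so it could be passed there directly.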
Available Actions
The agent can perform these browser actions:
- Navigation: go_to_url, go_back, search_google
- Interaction: click_element, input_text, send_keys
- Scrolling: scroll_to_top, scroll_to_bottom, next_page, previous_page
- Tab Management: open_tab, switch_tab, close_tab
- Dropdowns: get_dropdown_options, select_dropdown_option
- Utility: wait, cache_content, done

Manifest Configuration
Add these permissions to your Chrome Extension manifest:
```json
{
  "manifest_version": 3,
  "permissions": [
    "debugger",
    "tabs",
    "scripting",
    "activeTab"
  ],
  "host_permissions": [
    "<all_urls>"
  ]
}
```

How It Works
1. User provides a natural language task (e.g., "Click the Jobs button")
2. Navigator Agent analyzes the page - extracts interactive elements
3. LLM decides what action to take - based on the task and page state
4. Action is executed via Puppeteer/CDP
5. Loop continues until task is complete or max steps reached
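
The steps above can be sketched as a small loop. This is a simplified model and not the library's actual Executor: `PageState`, `Action`, `runLoop`, and the `decide`/`act` callbacks are hypothetical stand-ins for the LLM call and the Puppeteer/CDP layer, and treating `maxFailures` as a limit on consecutive failures is an assumption.

```typescript
// Hypothetical types standing in for the real page state and action objects.
interface PageState { url: string; elements: string[]; }
interface Action { name: string; done?: boolean; }
interface LoopResult { success: boolean; steps: string[]; finalUrl: string; error?: string; }

type Decide = (task: string, state: PageState) => Promise<Action>; // LLM decision (step 3)
type Act = (action: Action, state: PageState) => Promise<PageState>; // execution (step 4)

async function runLoop(
  task: string,
  initial: PageState,
  decide: Decide,
  act: Act,
  maxSteps = 50,   // documented default for options.maxSteps
  maxFailures = 3, // documented default; consecutive-failure semantics assumed
): Promise<LoopResult> {
  let state = initial;
  const steps: string[] = [];
  let failures = 0;

  for (let step = 0; step < maxSteps; step++) {
    try {
      const action = await decide(task, state);
      steps.push(action.name); // record the chosen action before running it
      if (action.done) {
        return { success: true, steps, finalUrl: state.url };
      }
      state = await act(action, state);
      failures = 0; // a successful step resets the failure counter
    } catch (err) {
      failures++;
      if (failures >= maxFailures) {
        return { success: false, steps, finalUrl: state.url, error: String(err) };
      }
    }
  }
  return { success: false, steps, finalUrl: state.url, error: "max steps reached" };
}
```

Keeping the decision and execution steps behind callbacks makes the loop easy to exercise with stubs, without a browser or an LLM.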
Architecture
```
AutomationAgent
└── Executor
    ├── NavigatorAgent (LLM-driven decision making)
    │   ├── Prompts (system instructions)
    │   ├── Actions (click, input, scroll, etc.)
    │   └── MessageManager (conversation history)
    └── BrowserContext
        ├── Page (Puppeteer wrapper)
        └── DOM Services (element extraction)
```

Development
Build

```bash
pnpm install
pnpm build
```

Type Check
```bash
pnpm type-check
```

Testing
```bash
pnpm test            # Watch mode
pnpm test:run        # Single run
pnpm test:coverage   # With coverage report
```

See TESTING.md for comprehensive testing documentation.
Project Structure
```
automation-core/
├── agent/                  # AI agent logic
│   ├── actions/            # Browser actions (click, input, etc.)
│   ├── messages/           # LLM conversation management
│   └── prompts/            # System prompts for agents
├── browser/                # Browser control layer
│   └── dom/                # DOM extraction and manipulation
├── llm/                    # LLM factory and config
├── utils/                  # Utilities (logger, JSON repair, etc.)
├── test/                   # Test setup and utilities
├── types.ts                # Shared type definitions
├── index.ts                # Main entry point
└── automation-agent.ts     # High-level agent wrapper
```

License: Apache-2.0