automation-core

Browser automation library for Chrome Extensions - extracted from Nanobrowser.

> Standalone Package: This is an extracted automation core that can be used as a dependency in other Chrome extensions.

Overview

This library provides AI-driven browser automation capabilities specifically designed for Chrome Extensions. It uses LLM (Large Language Model) agents to interpret natural language commands and execute browser actions.

Requirements

- Chrome Extension Manifest V3
- Required Permissions:
- debugger - For CDP (Chrome DevTools Protocol) access
- tabs - For tab management
- scripting - For DOM injection
- activeTab - For current tab access
- Host Permissions for target sites

⚠️ Important: This library will NOT work in:
- Node.js scripts
- Web pages
- Firefox, Safari, or other browsers

Installation

``bash

`From npm`


npm install @riruru/automation-core puppeteer-core zod
Or with pnpm

pnpm add @riruru/automation-core puppeteer-core zod


Quick Start

`typescript import { AutomationAgent, BrowserContext } from '@riruru/automation-core';

// Create a browser context attached to the active tab const context = await BrowserContext.fromActiveTab();

// Create an automation agent const agent = new AutomationAgent({ context, llm: { provider: 'anthropic', apiKey: 'sk-ant-...', model: 'claude-sonnet-4-20250514' } });

// Subscribe to events (optional) agent.on('step', (event) => { console.log(Step ${event.step}: ${event.details}); });

// Execute a task const result = await agent.execute("Click the Jobs button in the navigation");

console.log(result); // { // success: true, // steps: [...], // finalUrl: "https://example.com/jobs", // finalAnswer: "Clicked the Jobs button" // }`

`API Reference`

`$3`

The main entry point for browser automation.

`typescript const agent = new AutomationAgent({ context: BrowserContext, // Optional - creates from active tab if not provided llm: { provider: 'anthropic' | 'openai' | 'gemini' | 'ollama', apiKey: string, model: string, baseUrl?: string, // For custom endpoints temperature?: number, // Default: 0.1 }, options?: { maxSteps?: number, // Default: 50 maxActionsPerStep?: number, // Default: 5 maxFailures?: number, // Default: 3 useVision?: boolean, // Default: false } });

// Execute a task const result = await agent.execute("Your task description");

// Subscribe to events agent.on('step' | 'action' | 'error' | 'complete' | 'all', handler);

// Stop execution await agent.stop();

// Cleanup resources await agent.cleanup();`

`$3`

Manages browser tabs and pages.

`typescript // Create from active tab const context = await BrowserContext.fromActiveTab();

// Create from specific tab const context = await BrowserContext.fromTab(tabId);

// Get current page const page = await context.getCurrentPage();

// Navigate await context.navigateTo('https://example.com');

// Tab management await context.openTab('https://example.com'); await context.switchTab(tabId); await context.closeTab(tabId);`

`$3`

The result of executing a task.

`typescript interface TaskResult { success: boolean; error?: string; steps: StepRecord[]; finalUrl: string; finalAnswer?: string; data?: unknown; // For extraction tasks }`

`Supported LLM Providers`

| Provider | Models | |----------|--------| | Anthropic | Claude 3 Opus, Sonnet, Haiku, Claude 3.5 Sonnet | | OpenAI | GPT-4, GPT-4 Turbo, GPT-3.5 Turbo | | Google | Gemini Pro, Gemini Ultra | | Ollama | Any local model |

`Available Actions`

The agent can perform these browser actions:

- Navigation: go_to_url, go_back, search_google- Interaction:click_element, input_text, send_keys- Scrolling:scroll_to_top, scroll_to_bottom, next_page, previous_page- Tab Management:open_tab, switch_tab, close_tab- Dropdowns:get_dropdown_options, select_dropdown_option- Utility:wait, cache_content, done

`Manifest Configuration`

Add these permissions to your Chrome Extension manifest:

`json { "manifest_version": 3, "permissions": [ "debugger", "tabs", "scripting", "activeTab" ], "host_permissions": [ "" ] }`

`How It Works`

1. User provides a natural language task (e.g., "Click the Jobs button") 2. Navigator Agent analyzes the page - extracts interactive elements 3. LLM decides what action to take - based on the task and page state 4. Action is executed via Puppeteer/CDP 5. Loop continues until task is complete or max steps reached

`Architecture`

`AutomationAgent └── Executor ├── NavigatorAgent (LLM-driven decision making) │ ├── Prompts (system instructions) │ ├── Actions (click, input, scroll, etc.) │ └── MessageManager (conversation history) └── BrowserContext ├── Page (Puppeteer wrapper) └── DOM Services (element extraction)`

`Development`

`$3`

`bash pnpm install pnpm build`

`$3`

`bash pnpm type-check`

`$3`

`bash pnpm test # Watch mode pnpm test:run # Single run pnpm test:coverage # With coverage report`

See TESTING.md for comprehensive testing documentation.

`Project Structure`

`automation-core/ ├── agent/ # AI agent logic │ ├── actions/ # Browser actions (click, input, etc.) │ ├── messages/ # LLM conversation management │ └── prompts/ # System prompts for agents ├── browser/ # Browser control layer │ └── dom/ # DOM extraction and manipulation ├── llm/ # LLM factory and config ├── utils/ # Utilities (logger, JSON repair, etc.) ├── test/ # Test setup and utilities ├── types.ts # Shared type definitions ├── index.ts # Main entry point └── automation-agent.ts # High-level agent wrapper``

License

Apache-2.0

automation-core

Browser automation library for Chrome Extensions - extracted from Nanobrowser.

> Standalone Package: This is an extracted automation core that can be used as a dependency in other Chrome extensions.

Overview

Requirements

⚠️ Important: This library will NOT work in:
- Node.js scripts
- Web pages
- Firefox, Safari, or other browsers

Installation

``bash

`From npm`


npm install @riruru/automation-core puppeteer-core zod
Or with pnpm

pnpm add @riruru/automation-core puppeteer-core zod


Quick Start

`typescript import { AutomationAgent, BrowserContext } from '@riruru/automation-core';

// Create a browser context attached to the active tab const context = await BrowserContext.fromActiveTab();

// Create an automation agent const agent = new AutomationAgent({ context, llm: { provider: 'anthropic', apiKey: 'sk-ant-...', model: 'claude-sonnet-4-20250514' } });

// Subscribe to events (optional) agent.on('step', (event) => { console.log(Step ${event.step}: ${event.details}); });

// Execute a task const result = await agent.execute("Click the Jobs button in the navigation");

console.log(result); // { // success: true, // steps: [...], // finalUrl: "https://example.com/jobs", // finalAnswer: "Clicked the Jobs button" // }`

`API Reference`

`$3`

The main entry point for browser automation.

// Execute a task const result = await agent.execute("Your task description");

// Subscribe to events agent.on('step' | 'action' | 'error' | 'complete' | 'all', handler);

// Stop execution await agent.stop();

// Cleanup resources await agent.cleanup();`

`$3`

Manages browser tabs and pages.

`typescript // Create from active tab const context = await BrowserContext.fromActiveTab();

// Create from specific tab const context = await BrowserContext.fromTab(tabId);

// Get current page const page = await context.getCurrentPage();

// Navigate await context.navigateTo('https://example.com');

// Tab management await context.openTab('https://example.com'); await context.switchTab(tabId); await context.closeTab(tabId);`

`$3`

The result of executing a task.

`typescript interface TaskResult { success: boolean; error?: string; steps: StepRecord[]; finalUrl: string; finalAnswer?: string; data?: unknown; // For extraction tasks }`

`Supported LLM Providers`

`Available Actions`

The agent can perform these browser actions:

`Manifest Configuration`

Add these permissions to your Chrome Extension manifest:

`json { "manifest_version": 3, "permissions": [ "debugger", "tabs", "scripting", "activeTab" ], "host_permissions": [ "" ] }`

`How It Works`

`Architecture`

`Development`

`$3`

`bash pnpm install pnpm build`

`$3`

`bash pnpm type-check`

`$3`

`bash pnpm test # Watch mode pnpm test:run # Single run pnpm test:coverage # With coverage report`

See TESTING.md for comprehensive testing documentation.

`Project Structure`

License

Apache-2.0