A smart AI-driven browser automation library and REST API server built on MCP (Model Context Protocol) and LangChain. It executes complex multi-step browser automation tasks, either through programmatic library usage or through an HTTP API server for remote automation.
- AI-Powered: Smart task execution using LangChain and LLM integration
- Multi-step Automation: Execute complex browser workflows
- MCP Integration: Model Context Protocol for advanced browser control
- Multiple LLM Support: HuggingFace, Ollama, and extensible architecture
- Flexible Configuration: Easy setup and customization options
```bash
npm install @ejazullah/smart-browser-automation
```
```javascript
import { SmartBrowserAutomation, HuggingFaceConfig } from '@ejazullah/smart-browser-automation';

// Configuration
const llmConfig = new HuggingFaceConfig("your_huggingface_token");
const mcpEndpoint = 'http://your-mcp-endpoint';
const cdpEndpoint = 'wss://your-cdp-endpoint';

// Create automation instance
const automation = new SmartBrowserAutomation({
  maxSteps: 10,
  temperature: 0.0
});

try {
  // Initialize
  await automation.initialize(llmConfig, mcpEndpoint, cdpEndpoint);

  // Execute task
  const result = await automation.executeTask(
    "go to https://duckduckgo.com/ and search for 'AI tools'",
    { verbose: true }
  );

  console.log("Task completed:", result);
} finally {
  // Clean up
  await automation.close();
}
```

## 📚 Examples
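```javascript
// examples/search-example.js
import { SmartBrowserAutomation, HuggingFaceConfig } from '../index.js';

// Placeholder endpoints; replace with your own MCP and CDP URLs
const mcpEndpoint = 'http://your-mcp-endpoint';
const cdpEndpoint = 'wss://your-cdp-endpoint';

async function searchExample() {
  const automation = new SmartBrowserAutomation({ maxSteps: 10 });
  const llmConfig = new HuggingFaceConfig("your_token");

  try {
    await automation.initialize(llmConfig, mcpEndpoint, cdpEndpoint);
    const result = await automation.executeTask(
      "go to https://example.com and find the contact information"
    );
    console.log(result);
  } finally {
    // Always release the browser connection
    await automation.close();
  }
}
```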
This package includes full TypeScript declarations for a better development experience:

```typescript
import {
  SmartBrowserAutomation,
  HuggingFaceConfig,
  type TaskExecutionOptions,
  type TaskExecutionResult
} from '@ejazullah/smart-browser-automation';

// Placeholder endpoints; replace with your own MCP and CDP URLs
const mcpEndpoint = 'http://your-mcp-endpoint';
const cdpEndpoint = 'wss://your-cdp-endpoint';

async function typedExample() {
  // Configuration with type checking
  const config = new HuggingFaceConfig('your-api-key');
  const automation = new SmartBrowserAutomation({
    maxSteps: 10,
    temperature: 0.1
  });

  // Options with proper typing
  const options: TaskExecutionOptions = {
    verbose: true,
    onProgress: (update) => {
      console.log(`Step ${update.step}: ${update.message}`);
    }
  };

  await automation.initialize(config, mcpEndpoint, cdpEndpoint);

  // Result with proper typing
  const result: TaskExecutionResult = await automation.executeTask(
    "Navigate to Google and search for TypeScript tutorials",
    options
  );

  console.log(`Completed ${result.steps} steps, success: ${result.success}`);

  await automation.close();
}
```
```javascript
import { HuggingFaceConfig, OllamaConfig } from '@ejazullah/smart-browser-automation';

// HuggingFace
const hfConfig = new HuggingFaceConfig("hf_token", {
  model: "microsoft/DialoGPT-medium",
  temperature: 0.0
});

// Ollama
const ollamaConfig = new OllamaConfig("ollama_endpoint", {
  model: "llama2",
  temperature: 0.1
});
```
1. Log in to npm:
   ```bash
   npm login
   ```
2. Use the publishing script:
   ```bash
   ./publish.sh
   ```
3. Or publish manually:
   ```bash
   npm version patch # or minor/major
   npm publish
   ```
- Web Scraping: Automated data extraction from websites
- E2E Testing: End-to-end testing automation
- Form Automation: Automated form filling and submission
- Social Media Management: Automated posting and interactions
- Website Monitoring: Change detection and monitoring
- Data Entry: Bulk data processing and entry tasks
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Issues: GitHub Issues
- Documentation: Check the examples/ directory
---
Made with ❤️ by Ejaz Ullah
- 🤖 AI-Driven: Uses advanced language models to understand and execute complex browser tasks
- 🔄 Multi-Step Execution: Automatically performs sequences of actions to complete tasks
- 🧠 Smart Decision Making: Analyzes page content and decides next actions intelligently
- 🔌 Multiple LLM Support: Works with Hugging Face, Ollama, OpenAI, and other providers
- 🎯 Task Completion Detection: Knows when a task is fully completed
- 📊 Detailed Logging: Provides comprehensive execution logs and results
```bash
npm install @ejazullah/smart-browser-automation
```
```javascript
import { SmartBrowserAutomation, HuggingFaceConfig } from '@ejazullah/smart-browser-automation';

// Configure your LLM
const llmConfig = new HuggingFaceConfig("your-hugging-face-api-key");

// MCP and WebDriver configuration
const mcpEndpoint = 'http://your-mcp-server:8006/mcp';
const driverUrl = 'wss://your-webdriver-endpoint';

// Create automation instance
const automation = new SmartBrowserAutomation({
  maxSteps: 10,
  temperature: 0.0
});

// Initialize and execute task
await automation.initialize(llmConfig, mcpEndpoint, driverUrl);
const result = await automation.executeTask(
  "go to https://example.com and fill out the contact form"
);

console.log(result);
await automation.close();
```
#### Hugging Face
```javascript
import { HuggingFaceConfig } from '@ejazullah/smart-browser-automation';

const config = new HuggingFaceConfig(
  "your-api-key",
  "Qwen/Qwen3-Coder-480B-A35B-Instruct" // optional model
);
```
#### Ollama
```javascript
import { OllamaConfig } from '@ejazullah/smart-browser-automation';

const config = new OllamaConfig(
  "http://localhost:11434", // optional base URL
  "llama2" // optional model
);
```
#### OpenAI
```javascript
import { OpenAIConfig } from '@ejazullah/smart-browser-automation';

const config = new OpenAIConfig("your-api-key", "gpt-4");
```
```javascript
const automation = new SmartBrowserAutomation({
  maxSteps: 15,     // Maximum steps to execute
  temperature: 0.1, // LLM temperature (0.0 = deterministic)
});
```
#### Constructor
`new SmartBrowserAutomation(config)`
- `config.maxSteps` (number): Maximum execution steps (default: 10)
- `config.temperature` (number): LLM temperature (default: 0.0)
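
The defaults listed above can be made explicit before constructing an instance. The following is a minimal sketch, not the package's own implementation: `normalizeConfig` is a hypothetical helper, and the validation rules (positive integer steps, non-negative temperature) are assumptions rather than documented behavior.

```javascript
// Hypothetical helper, not part of the package's API.
// Applies the documented defaults (maxSteps: 10, temperature: 0.0)
// and rejects values that are obviously invalid (assumed rules).
function normalizeConfig(config = {}) {
  const maxSteps = config.maxSteps ?? 10;
  const temperature = config.temperature ?? 0.0;

  if (!Number.isInteger(maxSteps) || maxSteps < 1) {
    throw new RangeError(`maxSteps must be a positive integer, got ${maxSteps}`);
  }
  if (typeof temperature !== 'number' || Number.isNaN(temperature) || temperature < 0) {
    throw new RangeError(`temperature must be a non-negative number, got ${temperature}`);
  }
  return { maxSteps, temperature };
}

// Only maxSteps is overridden; temperature falls back to its default.
console.log(normalizeConfig({ maxSteps: 15 }));
```

The returned object could then be passed to `new SmartBrowserAutomation(...)`, so a typo in a config value fails early instead of partway through a task.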
#### Methods
##### initialize(llmConfig, mcpEndpoint, driverUrl)
Initialize the automation system.
- `llmConfig`: LLM configuration object
- `mcpEndpoint`: MCP server endpoint URL
- `driverUrl`: WebDriver WebSocket URL
##### executeTask(taskDescription, options)
Execute an automation task.
- `taskDescription` (string): Natural language description of the task
- `options.verbose` (boolean): Enable detailed logging (default: true)
- `options.systemPrompt` (string): Custom system prompt for the AI
Returns:
```javascript
{
  success: boolean,
  steps: number,
  results: Array,
  completed: boolean
}
```
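
To illustrate how the returned object might be consumed, here is a small sketch that turns it into a one-line summary. `summarizeResult` is a hypothetical helper, and the sample values are made up; only the four field names come from the shape described above.

```javascript
// Hypothetical helper: formats the documented result shape
// { success, steps, results, completed } into a one-line summary.
function summarizeResult(result) {
  const status = result.success ? 'succeeded' : 'failed';
  const completion = result.completed ? 'task completed' : 'task not completed';
  const count = Array.isArray(result.results) ? result.results.length : 0;
  return `${status} after ${result.steps} step(s), ${completion}, ${count} result(s)`;
}

// Sample object with made-up values that follows the documented shape.
const sample = { success: true, steps: 3, results: ['a', 'b', 'c'], completed: true };
console.log(summarizeResult(sample));
// prints "succeeded after 3 step(s), task completed, 3 result(s)"
```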
##### close()
Clean up and close connections.
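
Because `close()` should run whether or not a task succeeds, it can help to wrap the call in a small utility. The sketch below assumes only that the instance exposes an async `close()` method; `withCleanup` and `fakeAutomation` are hypothetical names used for illustration, and the fake object stands in for a real `SmartBrowserAutomation` instance.

```javascript
// Hypothetical helper: runs `work(automation)` and always calls
// `automation.close()` afterwards, even if `work` throws.
async function withCleanup(automation, work) {
  try {
    return await work(automation);
  } finally {
    await automation.close();
  }
}

// Stand-in object for demonstration; a real SmartBrowserAutomation
// instance would be passed here instead.
const fakeAutomation = {
  closed: false,
  async executeTask(description) {
    return { success: true, steps: 1, results: [description], completed: true };
  },
  async close() {
    this.closed = true;
  },
};

withCleanup(fakeAutomation, (a) => a.executeTask('demo task'))
  .then((result) => console.log(result.success, fakeAutomation.closed));
```

The `finally` block mirrors the error-handling pattern used elsewhere in this README, so callers do not need to repeat it around every task.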
```javascript
const result = await automation.executeTask(
  "go to https://duckduckgo.com/ and search for 'AI tools'"
);
```

```javascript
const result = await automation.executeTask(
  "navigate to the contact page, fill out the form with name 'John Doe' and email 'john@example.com', then submit it"
);
```

```javascript
const result = await automation.executeTask(
  "go to the online store, search for 'laptop', filter by price under $1000, and add the first result to cart"
);
```

## Error Handling
```javascript
try {
  await automation.initialize(llmConfig, mcpEndpoint, driverUrl);
  const result = await automation.executeTask("your task here");

  if (!result.success) {
    console.error("Task failed:", result);
  }
} catch (error) {
  console.error("Automation error:", error);
} finally {
  await automation.close();
}
```

- Node.js 18+
- A running MCP server with browser capabilities
- Access to a WebDriver endpoint
- API key for your chosen LLM provider
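
The Node.js version requirement above can be checked at startup. This is a minimal sketch under the assumption that a warning is enough; `meetsNodeRequirement` is a hypothetical helper, while `process.versions.node` is the standard Node.js property holding the running version.

```javascript
// Hypothetical helper: checks a Node.js version string such as
// "20.11.1" or "v18.0.0" against a minimum major version.
function meetsNodeRequirement(version, minMajor = 18) {
  const major = Number.parseInt(version.replace(/^v/, '').split('.')[0], 10);
  return Number.isInteger(major) && major >= minMajor;
}

// Warn early instead of failing later with a less obvious error.
if (!meetsNodeRequirement(process.versions.node)) {
  console.warn(`Node.js 18+ is required, found ${process.versions.node}`);
}
```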
MIT
Contributions are welcome! Please read our contributing guidelines and submit pull requests to our repository.
For issues and questions, please visit our GitHub Issues page.