Advanced web crawling, data extraction, and interaction nodes for n8n with LLM capabilities.

```bash
npm install n8n-nodes-crawl4ai_naf
```

- Basic Crawling: Simple web page crawling with markdown/HTML extraction
- CSS Extraction: Extract structured data using CSS selectors
- LLM Extraction: Use LLM for complex data extraction
- Batch Processing: Process multiple URLs concurrently
- Anti-Detection: Undetected browser mode, stealth mode, CAPTCHA bypass
- Element Interaction: Click buttons, fill forms, handle dropdowns
- Authentication: Login form handling and session management
- LLM Prompts: Automate interactions using natural language prompts
- Multi-Step Workflows: Complex interaction sequences

Basic crawl:

```json
{
  "nodes": [
    {
      "parameters": {
        "operation": "basic_crawl",
        "urlConfig": {
          "urls": [
            {
              "url": "https://example.com"
            }
          ]
        },
        "browserConfig": {
          "settings": {
            "headless": true,
            "viewportWidth": 1920,
            "viewportHeight": 1080
          }
        }
      },
      "name": "Crawl4ai",
      "type": "n8n-nodes-crawl4ai_naf.crawl4ai",
      "typeVersion": 1,
      "position": [250, 300]
    }
  ]
}
```
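
The same node definition can also be generated programmatically, for example when a workflow file is assembled by a script. The following is a minimal sketch, not part of the package: the `CrawlNode` interface and the `buildBasicCrawlNode` helper are hypothetical names that only mirror the JSON shape shown in this example.

```typescript
// Hypothetical types mirroring the example JSON; these are not the package's own typings.
interface CrawlNode {
  parameters: {
    operation: "basic_crawl" | "css_extraction" | "llm_extraction";
    urlConfig: { urls: { url: string }[] };
    browserConfig?: {
      settings: { headless: boolean; viewportWidth?: number; viewportHeight?: number };
    };
  };
  name: string;
  type: string;
  typeVersion: number;
  position: [number, number];
}

// Build a basic_crawl node for one or more URLs, using the values from the example.
function buildBasicCrawlNode(urls: string[], name = "Crawl4ai"): CrawlNode {
  return {
    parameters: {
      operation: "basic_crawl",
      // Each URL is wrapped in an object, matching the "urls" array in the example.
      urlConfig: { urls: urls.map((url) => ({ url })) },
      browserConfig: {
        settings: { headless: true, viewportWidth: 1920, viewportHeight: 1080 },
      },
    },
    name,
    type: "n8n-nodes-crawl4ai_naf.crawl4ai",
    typeVersion: 1,
    position: [250, 300],
  };
}

// Serialize a one-node workflow in the same layout as the example.
const basicWorkflow = { nodes: [buildBasicCrawlNode(["https://example.com"])] };
console.log(JSON.stringify(basicWorkflow, null, 2));
```

Passing several URLs to `buildBasicCrawlNode` produces one entry per URL in `urlConfig.urls`.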

CSS extraction with form authentication and anti-detection:

```json
{
  "nodes": [
    {
      "parameters": {
        "operation": "css_extraction",
        "urlConfig": {
          "urls": [
            {
              "url": "https://protected.example.com/dashboard"
            },
            {
              "url": "https://protected.example.com/reports"
            }
          ]
        },
        "browserConfig": {
          "settings": {
            "headless": true,
            "viewportWidth": 1920,
            "viewportHeight": 1080
          }
        },
        "antiDetection": {
          "settings": {
            "undetected": true,
            "stealth": true,
            "captchaBypass": "2captcha"
          }
        },
        "authConfig": {
          "authSettings": {
            "enableAuth": true,
            "authType": "form",
            "username": "your_username",
            "password": "your_password",
            "loginUrl": "https://protected.example.com/login"
          }
        },
        "advancedConfig": {
          "advancedSettings": {
            "maxRetries": 3,
            "timeout": 30000,
            "concurrentRequests": 2,
            "debugMode": true
          }
        }
      },
      "name": "Crawl4ai",
      "type": "n8n-nodes-crawl4ai_naf.crawl4ai",
      "typeVersion": 1,
      "position": [250, 300]
    }
  ]
}
```

LLM extraction:

```json
{
  "nodes": [
    {
      "parameters": {
        "operation": "llm_extraction",
        "urlConfig": {
          "urls": [
            {
              "url": "https://complex-data.example.com"
            }
          ]
        },
        "browserConfig": {
          "settings": {
            "headless": true
          }
        }
      },
      "name": "Crawl4ai",
      "type": "n8n-nodes-crawl4ai_naf.crawl4ai",
      "typeVersion": 1,
      "position": [250, 300]
    }
  ]
}
```

LLM prompt interaction:

```json
{
  "nodes": [
    {
      "parameters": {
        "interactionType": "llm_prompt",
        "llmPromptConfig": {
          "promptSettings": {
            "promptText": "Find the login form, fill username with 'testuser' and password with 'testpass', then click the submit button",
            "provider": "openai/gpt-4",
            "maxTokens": 1000
          }
        }
      },
      "name": "Crawl4aiInteraction",
      "type": "n8n-nodes-crawl4ai_naf.crawl4aiInteraction",
      "typeVersion": 1,
      "position": [250, 300]
    }
  ]
}
```

Element click:

```json
{
  "nodes": [
    {
      "parameters": {
        "interactionType": "element_click",
        "elementConfig": {
          "clickSettings": {
            "selector": "#submit-button",
            "waitAfterClick": 2000
          }
        }
      },
      "name": "Crawl4aiInteraction",
      "type": "n8n-nodes-crawl4ai_naf.crawl4aiInteraction",
      "typeVersion": 1,
      "position": [450, 300]
    }
  ]
}
```

Multi-step workflow (crawl, authenticate, extract):

```json
{
  "nodes": [
    {
      "parameters": {
        "operation": "basic_crawl",
        "urlConfig": {
          "urls": [
            {
              "url": "https://example.com/login"
            }
          ]
        }
      },
      "name": "Crawl4ai",
      "type": "n8n-nodes-crawl4ai_naf.crawl4ai",
      "typeVersion": 1,
      "position": [250, 300]
    },
    {
      "parameters": {
        "interactionType": "authentication",
        "authConfig": {
          "authSettings": {
            "username": "user@example.com",
            "password": "password123",
            "loginUrl": "https://example.com/login"
          }
        }
      },
      "name": "Crawl4aiInteraction",
      "type": "n8n-nodes-crawl4ai_naf.crawl4aiInteraction",
      "typeVersion": 1,
      "position": [450, 300]
    },
    {
      "parameters": {
        "operation": "css_extraction",
        "urlConfig": {
          "urls": [
            {
              "url": "https://example.com/dashboard"
            }
          ]
        }
      },
      "name": "Crawl4ai2",
      "type": "n8n-nodes-crawl4ai_naf.crawl4ai",
      "typeVersion": 1,
      "position": [650, 300]
    }
  ],
  "connections": {
    "Crawl4ai": {
      "main": [
        [
          {
            "node": "Crawl4aiInteraction",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Crawl4aiInteraction": {
      "main": [
        [
          {
            "node": "Crawl4ai2",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}
```
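
In the multi-step workflow, the `connections` object links each node to the next by name. For longer chains it can be less error-prone to derive that object from an ordered list of node names. The sketch below assumes a purely linear chain on the first `main` output; `connectLinear` and the `Connections` type are hypothetical helpers written for illustration, not part of n8n or this package.

```typescript
// Shape of the "connections" object as it appears in the multi-step example.
type Connections = Record<
  string,
  { main: { node: string; type: "main"; index: number }[][] }
>;

// Link each node to the one that follows it; the last node gets no outgoing entry.
function connectLinear(nodeNames: string[]): Connections {
  const connections: Connections = {};
  for (let i = 0; i < nodeNames.length - 1; i++) {
    connections[nodeNames[i]] = {
      // One output ("main"), one target, feeding the target's input 0.
      main: [[{ node: nodeNames[i + 1], type: "main", index: 0 }]],
    };
  }
  return connections;
}

const chain = connectLinear(["Crawl4ai", "Crawl4aiInteraction", "Crawl4ai2"]);
console.log(JSON.stringify(chain, null, 2));
```

For the three node names used in the example, the result has entries for `Crawl4ai` and `Crawl4aiInteraction` only, matching the `connections` block in the workflow JSON.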

Browser configuration:

- Headless Mode: Run browser in headless mode (default: true)
- Viewport: Set browser viewport dimensions (default: 1920x1080)
- User Agent: Custom user agent string
- Proxy Support: Configure proxy settings

Anti-detection:

- Undetected Mode: Enable undetected browser mode
- Stealth Mode: Enable stealth mode with fingerprint masking
- CAPTCHA Bypass: Configure CAPTCHA bypass strategies (2Captcha, Anti-Captcha, Custom)
- Behavioral Simulation: Simulate human-like interactions

Authentication:

- Basic Auth: Username/password authentication
- Form Auth: Form-based authentication with login URL
- OAuth2: OAuth2 token-based authentication
- API Key: API key authentication
- Session Cookie: Session cookie authentication

Advanced settings:

- Max Retries: Maximum number of retry attempts (default: 3)
- Timeout: Request timeout in milliseconds (default: 30000)
- Concurrent Requests: Number of concurrent requests (default: 5)
- Debug Mode: Enable debug logging (default: false)
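
The documented defaults for retries, timeout, concurrency, and debug logging can be pictured as a base object that user-supplied values override. The following sketch shows that idea only; `AdvancedSettings`, `ADVANCED_DEFAULTS`, and `resolveAdvancedSettings` are hypothetical names, and the package's internal handling may differ.

```typescript
// Field names follow the "advancedSettings" keys used in the workflow examples.
interface AdvancedSettings {
  maxRetries: number;
  timeout: number; // milliseconds
  concurrentRequests: number;
  debugMode: boolean;
}

// Defaults as listed in the option descriptions.
const ADVANCED_DEFAULTS: AdvancedSettings = {
  maxRetries: 3,
  timeout: 30000,
  concurrentRequests: 5,
  debugMode: false,
};

// Fill in any omitted (or undefined) field with its documented default.
function resolveAdvancedSettings(
  overrides: Partial<AdvancedSettings> = {},
): AdvancedSettings {
  return {
    maxRetries: overrides.maxRetries ?? ADVANCED_DEFAULTS.maxRetries,
    timeout: overrides.timeout ?? ADVANCED_DEFAULTS.timeout,
    concurrentRequests:
      overrides.concurrentRequests ?? ADVANCED_DEFAULTS.concurrentRequests,
    debugMode: overrides.debugMode ?? ADVANCED_DEFAULTS.debugMode,
  };
}

// Only the two overridden fields differ from the defaults.
const resolved = resolveAdvancedSettings({ concurrentRequests: 2, debugMode: true });
console.log(resolved);
```

Using `??` per field rather than an object spread means an explicit `undefined` falls back to the default instead of overwriting it, while `0` and `false` are kept as given.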

Requirements:

- Node.js 18+
- npm 9+
- TypeScript 5+

Build:

```bash
npm install
npm run build
```

Test:

```bash
npm run test
```

Publish:

```bash
npm publish
```

Both nodes include comprehensive error handling and validation:
- Input data validation
- URL format validation
- Configuration parameter validation
- Authentication credential validation
- Proper error messages and timestamps
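
As an illustration of the URL format validation and timestamped error messages listed above, here is a minimal sketch built on the standard `URL` constructor. It is not the package's actual validation code; `ValidationError` and `validateUrls` are hypothetical names, and restricting input to `http`/`https` is an assumption made for this example.

```typescript
interface ValidationError {
  message: string;
  timestamp: string; // ISO 8601
}

// Return one error per problem found; an empty array means the input is valid.
function validateUrls(urls: unknown): ValidationError[] {
  const errors: ValidationError[] = [];
  const fail = (message: string): void => {
    errors.push({ message, timestamp: new Date().toISOString() });
  };

  // Input data validation: a non-empty array is required.
  if (!Array.isArray(urls) || urls.length === 0) {
    fail("At least one URL is required");
    return errors;
  }

  for (const candidate of urls) {
    if (typeof candidate !== "string") {
      fail(`URL must be a string, got ${typeof candidate}`);
      continue;
    }
    try {
      // The URL constructor throws a TypeError on malformed input.
      const parsed = new URL(candidate);
      if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
        fail(`Unsupported protocol in URL: ${candidate}`);
      }
    } catch {
      fail(`Invalid URL format: ${candidate}`);
    }
  }
  return errors;
}

// The second entry is malformed, so a single error is reported for it.
console.log(validateUrls(["https://example.com", "not a url"]));
```

Collecting all errors instead of throwing on the first one lets a caller report every bad URL in a batch at once.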

For issues, questions, or contributions, contact contact@nafer.ru.

License: MIT