# mcp-crawl4ai-ts

TypeScript MCP server for Crawl4AI - web crawling and content extraction.

Install: `npm install mcp-crawl4ai-ts`

> Note: Tested with Crawl4AI version 0.7.4




TypeScript implementation of an MCP server for Crawl4AI. Provides tools for web crawling, content extraction, and browser automation.
- Prerequisites
- Quick Start
- Configuration
- Client-Specific Instructions
- Available Tools
  - 1. get_markdown
  - 2. capture_screenshot
  - 3. generate_pdf
  - 4. execute_js
  - 5. batch_crawl
  - 6. smart_crawl
  - 7. get_html
  - 8. extract_links
  - 9. crawl_recursive
  - 10. parse_sitemap
  - 11. crawl
  - 12. manage_session
  - 13. extract_with_llm
- Advanced Configuration
- Development
- License
## Prerequisites

- Node.js 18+ and npm
- A running Crawl4AI server:

```bash
docker run -d -p 11235:11235 --name crawl4ai --shm-size=1g unclecode/crawl4ai:0.7.4
```
## Quick Start

This MCP server works with any MCP-compatible client (Claude Desktop, Claude Code, Cursor, LMStudio, etc.).
#### Using npx (Recommended)
```json
{
  "mcpServers": {
    "crawl4ai": {
      "command": "npx",
      "args": ["mcp-crawl4ai-ts"],
      "env": {
        "CRAWL4AI_BASE_URL": "http://localhost:11235"
      }
    }
  }
}
```
#### Using local installation
```json
{
  "mcpServers": {
    "crawl4ai": {
      "command": "node",
      "args": ["/path/to/mcp-crawl4ai-ts/dist/index.js"],
      "env": {
        "CRAWL4AI_BASE_URL": "http://localhost:11235"
      }
    }
  }
}
```
#### With all optional variables
```json
{
  "mcpServers": {
    "crawl4ai": {
      "command": "npx",
      "args": ["mcp-crawl4ai-ts"],
      "env": {
        "CRAWL4AI_BASE_URL": "http://localhost:11235",
        "CRAWL4AI_API_KEY": "your-api-key",
        "SERVER_NAME": "custom-name",
        "SERVER_VERSION": "1.0.0"
      }
    }
  }
}
```
## Configuration

```env
# Required
CRAWL4AI_BASE_URL=http://localhost:11235
```
## Client-Specific Instructions
### Claude Desktop
Add the configuration to `~/Library/Application Support/Claude/claude_desktop_config.json`.

### Claude Code
```bash
claude mcp add crawl4ai -e CRAWL4AI_BASE_URL=http://localhost:11235 -- npx mcp-crawl4ai-ts
```

### Other MCP Clients
Consult your client's documentation for MCP server configuration. The key details:
- Command: `npx mcp-crawl4ai-ts` or `node /path/to/dist/index.js`
- Required env: `CRAWL4AI_BASE_URL`
- Optional env: `CRAWL4AI_API_KEY`, `SERVER_NAME`, `SERVER_VERSION`

## Available Tools
### 1. get_markdown

```typescript
{
  url: string,                        // Required: URL to extract markdown from
  filter?: 'raw'|'fit'|'bm25'|'llm',  // Filter type (default: 'fit')
  query?: string,                     // Query for bm25/llm filters
  cache?: string                      // Cache-bust parameter (default: '0')
}
```

Extracts content as markdown with various filtering options. Use the 'bm25' or 'llm' filter with a query for specific content extraction.
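For example, the default call returns the whole page, while a query-driven filter narrows the output (URL and query are illustrative):

```typescript
// Whole page with the default 'fit' filter
{ url: 'https://example.com/docs' }

// Only content matching a query, ranked with BM25
{ url: 'https://example.com/docs', filter: 'bm25', query: 'installation requirements' }
```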
### 2. capture_screenshot

```typescript
{
  url: string,                  // Required: URL to capture
  screenshot_wait_for?: number  // Seconds to wait before screenshot (default: 2)
}
```

Returns a base64-encoded PNG. Note: this tool is stateless; for screenshots after JS execution, use `crawl` with `screenshot: true`.
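Example call, giving a slow page extra time to render (URL and wait value are illustrative):

```typescript
{ url: 'https://example.com/dashboard', screenshot_wait_for: 5 }
```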
### 3. generate_pdf

```typescript
{
  url: string  // Required: URL to convert to PDF
}
```

Returns a base64-encoded PDF. This tool is stateless; for PDFs after JS execution, use `crawl` with `pdf: true`.
### 4. execute_js

```typescript
{
  url: string,                // Required: URL to load
  scripts: string | string[]  // Required: JavaScript to execute
}
```

Executes JavaScript and returns the results. Each script can use 'return' to send values back. This tool is stateless; for persistent JS execution, use `crawl` with `js_code`.
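Example: run two scripts against a page and return their values (URL and scripts are illustrative):

```typescript
{
  url: 'https://example.com',
  scripts: [
    'return document.title',
    'return document.querySelectorAll("a").length'
  ]
}
```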
### 5. batch_crawl

```typescript
{
  urls: string[],           // Required: List of URLs to crawl
  max_concurrent?: number,  // Parallel request limit (default: 5)
  remove_images?: boolean,  // Remove images from output (default: false)
  bypass_cache?: boolean,   // Bypass cache for all URLs (default: false)
  configs?: Array<{         // Optional: Per-URL configurations (v3.0.0+)
    url: string,
    [key: string]: any      // Any crawl parameters for this specific URL
  }>
}
```

Efficiently crawls multiple URLs in parallel. Each URL gets a fresh browser instance. With the `configs` array, you can specify different parameters for each URL.
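Example: crawl two URLs in parallel and give one of them its own settings via `configs` (URLs and the per-URL parameter are illustrative):

```typescript
{
  urls: ['https://example.com/a', 'https://example.com/b'],
  max_concurrent: 2,
  configs: [
    { url: 'https://example.com/b', screenshot: true }
  ]
}
```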
### 6. smart_crawl

```typescript
{
  url: string,             // Required: URL to crawl
  max_depth?: number,      // Maximum depth for recursive crawling (default: 2)
  follow_links?: boolean,  // Follow links in content (default: true)
  bypass_cache?: boolean   // Bypass cache (default: false)
}
```

Intelligently detects content type (HTML/sitemap/RSS) and processes accordingly.
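Example: hand it a sitemap URL and let it choose how to process it (URL is illustrative):

```typescript
{ url: 'https://example.com/sitemap.xml', max_depth: 1 }
```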
### 7. get_html

```typescript
{
  url: string  // Required: URL to extract HTML from
}
```

Returns preprocessed HTML optimized for structure analysis. Use for building schemas or analyzing patterns.
### 8. extract_links

```typescript
{
  url: string,          // Required: URL to extract links from
  categorize?: boolean  // Group by type (default: true)
}
```

Extracts all links and groups them by type: internal, external, social media, documents, images.
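Example (URL is illustrative):

```typescript
// Returns links grouped into internal, external, social media, documents, images
{ url: 'https://example.com', categorize: true }
```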
### 9. crawl_recursive

```typescript
{
  url: string,               // Required: Starting URL
  max_depth?: number,        // Maximum depth to crawl (default: 3)
  max_pages?: number,        // Maximum pages to crawl (default: 50)
  include_pattern?: string,  // Regex pattern for URLs to include
  exclude_pattern?: string   // Regex pattern for URLs to exclude
}
```

Crawls a website following internal links up to the specified depth. Returns content from all discovered pages.
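Example: crawl a docs subtree while skipping auth pages (URL and regex patterns are illustrative):

```typescript
{
  url: 'https://example.com/docs',
  max_depth: 2,
  max_pages: 20,
  include_pattern: '.*/docs/.*',
  exclude_pattern: '.*(login|signup).*'
}
```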
### 10. parse_sitemap

```typescript
{
  url: string,             // Required: Sitemap URL (e.g., /sitemap.xml)
  filter_pattern?: string  // Optional: Regex pattern to filter URLs
}
```

Extracts all URLs from XML sitemaps. Supports regex filtering for specific URL patterns.
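Example: list only blog URLs from a sitemap (URL and pattern are illustrative):

```typescript
{
  url: 'https://example.com/sitemap.xml',
  filter_pattern: '.*/blog/.*'
}
```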
### 11. crawl
```typescript
{
  url: string,                       // URL to crawl
  // Browser Configuration
  browser_type?: 'chromium'|'firefox'|'webkit'|'undetected',  // Browser engine (undetected = stealth mode)
  viewport_width?: number,           // Browser width (default: 1080)
  viewport_height?: number,          // Browser height (default: 600)
  user_agent?: string,               // Custom user agent
  proxy_server?: string | {          // Proxy URL (string or object format)
    server: string,
    username?: string,
    password?: string
  },
  proxy_username?: string,           // Proxy auth (if using string format)
  proxy_password?: string,           // Proxy password (if using string format)
  cookies?: Array<{name, value, domain}>,  // Pre-set cookies
  headers?: Record<string, string>,  // Custom headers
  // Crawler Configuration
  word_count_threshold?: number,     // Min words per block (default: 200)
  excluded_tags?: string[],          // HTML tags to exclude
  remove_overlay_elements?: boolean, // Remove popups/modals
  js_code?: string | string[],       // JavaScript to execute
  wait_for?: string,                 // Wait condition (selector or JS)
  wait_for_timeout?: number,         // Wait timeout (default: 30000)
  delay_before_scroll?: number,      // Pre-scroll delay
  scroll_delay?: number,             // Between-scroll delay
  process_iframes?: boolean,         // Include iframe content
  exclude_external_links?: boolean,  // Remove external links
  screenshot?: boolean,              // Capture screenshot
  pdf?: boolean,                     // Generate PDF
  session_id?: string,               // Reuse browser session (only works with crawl tool)
  cache_mode?: 'ENABLED'|'BYPASS'|'DISABLED',  // Cache control
  // New in v3.0.0 (Crawl4AI 0.7.3/0.7.4)
  css_selector?: string,             // CSS selector to filter content
  delay_before_return_html?: number, // Delay in seconds before returning HTML
  include_links?: boolean,           // Include extracted links in response
  resolve_absolute_urls?: boolean,   // Convert relative URLs to absolute
  // LLM Extraction (REST API only supports 'llm' type)
  extraction_type?: 'llm',           // Only 'llm' extraction is supported via REST API
  extraction_schema?: object,        // Schema for structured extraction
  extraction_instruction?: string,   // Natural language extraction prompt
  extraction_strategy?: {            // Advanced extraction configuration
    provider?: string,
    api_key?: string,
    model?: string,
    [key: string]: any
  },
  table_extraction_strategy?: {      // Table extraction configuration
    enable_chunking?: boolean,
    thresholds?: object,
    [key: string]: any
  },
  markdown_generator_options?: {     // Markdown generation options
    include_links?: boolean,
    preserve_formatting?: boolean,
    [key: string]: any
  },
  timeout?: number,                  // Overall timeout (default: 60000)
  verbose?: boolean                  // Detailed logging
}
```
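For instance, a session-based call that clicks a button via JS, waits for the result, and captures a screenshot in the same browser state (URL, selectors, and session name are illustrative):

```typescript
{
  url: 'https://example.com/products',
  session_id: 'my-session',   // reuse a session created with manage_session
  js_code: 'document.querySelector("#load-more")?.click()',
  wait_for: '#product-grid',
  screenshot: true,
  cache_mode: 'BYPASS'
}
```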
### 12. manage_session

```typescript
{
  action: 'create' | 'clear' | 'list',  // Required: Action to perform
  session_id?: string,                  // For 'create' and 'clear' actions
  initial_url?: string,                 // For 'create' action: URL to load
  browser_type?: 'chromium' | 'firefox' | 'webkit' | 'undetected'  // For 'create' action
}
```

Unified tool for managing browser sessions. Supports three actions:
- create: Start a persistent browser session
- clear: Remove a session from local tracking
- list: Show all active sessions
Examples:
```typescript
// Create a new session
{ action: 'create', session_id: 'my-session', initial_url: 'https://example.com' }

// Clear a session
{ action: 'clear', session_id: 'my-session' }

// List all sessions
{ action: 'list' }
```

### 13. extract_with_llm
```typescript
{
  url: string,   // URL to extract data from
  query: string  // Natural language extraction instructions
}
```

Uses AI to extract structured data from webpages. Returns results immediately without any polling or job management. This is the recommended way to extract specific information, since CSS/XPath extraction is not supported via the REST API.
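Example (URL and query are illustrative):

```typescript
{
  url: 'https://example.com/pricing',
  query: 'List each plan name with its monthly price'
}
```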
## Advanced Configuration
For detailed information about all available configuration options, extraction strategies, and advanced features, please refer to the official Crawl4AI documentation:
- Crawl4AI Documentation
- Crawl4AI GitHub Repository
## Changelog
See CHANGELOG.md for detailed version history and recent updates.
## Development
### Local Development
```bash
# 1. Start the Crawl4AI server
docker run -d -p 11235:11235 --name crawl4ai --shm-size=1g unclecode/crawl4ai:latest

# 2. Install the MCP server
git clone https://github.com/omgwtfwow/mcp-crawl4ai-ts.git
cd mcp-crawl4ai-ts
npm install
cp .env.example .env

# 3. Development commands
npm run dev    # Development mode
npm test       # Run tests
npm run lint   # Check code quality
npm run build  # Production build

# 4. Add to your MCP client (see "Using local installation")
```

### Integration Tests
Integration tests require a running Crawl4AI server. Configure your environment:
```bash
# Required for integration tests
export CRAWL4AI_BASE_URL=http://localhost:11235
export CRAWL4AI_API_KEY=your-api-key  # If authentication is required

# Optional: For LLM extraction tests
export LLM_PROVIDER=openai/gpt-4o-mini
export LLM_API_TOKEN=your-llm-api-key
export LLM_BASE_URL=https://api.openai.com/v1  # If using custom endpoint

# Run integration tests (ALWAYS use the npm script; don't call jest directly)
npm run test:integration

# Run a single integration test file
npm run test:integration -- src/__tests__/integration/extract-links.integration.test.ts
```

> IMPORTANT: Do NOT run `npx jest` directly for integration tests. The npm script injects `NODE_OPTIONS=--experimental-vm-modules`, which is required for ESM + ts-jest. Running Jest directly will produce `SyntaxError: Cannot use import statement outside a module` and hang.

Integration tests cover:
- Dynamic content and JavaScript execution
- Session management and cookies
- Content extraction (LLM-based only)
- Media handling (screenshots, PDFs)
- Performance and caching
- Content filtering
- Bot detection avoidance
- Error handling
### Pre-flight Checklist
1. Docker container healthy:

   ```bash
   docker ps --filter name=crawl4ai --format '{{.Names}} {{.Status}}'
   curl -sf http://localhost:11235/health || echo "Health check failed"
   ```
2. Env vars loaded (either exported or in `.env`): `CRAWL4AI_BASE_URL` (required); optional: `CRAWL4AI_API_KEY`, `LLM_PROVIDER`, `LLM_API_TOKEN`, `LLM_BASE_URL`.
3. Use `npm run test:integration` (never raw `jest`).
4. To target one file, add it after `--` (see the example above).
5. Expect a total runtime of ~2–3 minutes; a longer run or an immediate hang usually means a missing `NODE_OPTIONS` flag or the wrong Jest version.

### Troubleshooting
| Symptom | Likely Cause | Fix |
|---------|--------------|-----|
| `SyntaxError: Cannot use import statement outside a module` | Ran Jest directly without the script flags | Re-run with `npm run test:integration` |
| Hangs on first test (`RUNS ...`) | Missing experimental VM modules flag | Use the npm script / ensure `NODE_OPTIONS=--experimental-vm-modules` |
| Network timeouts | Crawl4AI container not healthy / DNS blocked | Restart the container: `docker restart crawl4ai` |
| LLM tests skipped | Missing `LLM_PROVIDER` or `LLM_API_TOKEN` | Export the required LLM vars |
| New Jest major upgrade breaks tests | Version mismatch with ts-jest | Keep Jest 29.x unless ts-jest is upgraded accordingly |
### Test Dependencies

Current stack: `jest@29.x` + `ts-jest@29.x` + ESM (`"type": "module"`). Updating Jest to 30+ requires upgrading ts-jest and revisiting `jest.config.cjs`. Keep versions aligned to avoid parse errors.

## License

MIT