JavaScript/TypeScript SDK for Deepcrawl API

TypeScript SDK for the Deepcrawl API - web scraping and crawling with comprehensive error handling.



- 🏗️ oRPC-Powered: Built on oRPC framework for type-safe RPC
- 🔒 Type-Safe: End-to-end TypeScript with error handling
- 🖥️ Server-Side Only: Designed for Node.js, Cloudflare Workers, and Next.js Server Actions
- 🪶 Lightweight: Minimal bundle size with tree-shaking support
- 🛡️ Error Handling: Comprehensive, typed errors with context
- 🔄 Retry Logic: Built-in exponential backoff for transient failures
- ⚡ Connection Pooling: Automatic HTTP connection reuse (Node.js)
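The retry behavior is handled inside the SDK, but for readers unfamiliar with exponential backoff, the delay schedule typically looks like the sketch below. The base and cap values are illustrative assumptions, not the SDK's actual constants:

```typescript
// Illustrative exponential backoff schedule; the SDK computes this internally.
// baseMs and capMs are assumed example values, not the SDK's real constants.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30_000): number {
  // Delay doubles on every retry attempt, capped to avoid unbounded waits.
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// attempt 0 → 500 ms, attempt 1 → 1000 ms, attempt 6 → 30000 ms (capped)
```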
```bash
npm install deepcrawl
# or
yarn add deepcrawl
# or
pnpm add deepcrawl
```
> Zod v4 ships with the SDK as a runtime dependency and is mirrored as a peer dependency.
> If your app already provides Zod ≥4.1, your package manager will dedupe it; otherwise,
> the bundled copy means no extra install step.
```typescript
import { DeepcrawlApp } from 'deepcrawl';

const deepcrawl = new DeepcrawlApp({
  apiKey: process.env.DEEPCRAWL_API_KEY,
});

const result = await deepcrawl.readUrl('https://example.com');
console.log(result.markdown);
```
The SDK uses dedicated export paths for better tree-shaking and organization:
```typescript
import { DeepcrawlApp } from 'deepcrawl';
```
```typescript
import type {
  // Configuration
  DeepcrawlConfig,
  // API Types
  ReadUrlOptions,
  ReadUrlResponse,
  GetMarkdownOptions,
  GetMarkdownResponse,
  ExtractLinksOptions,
  ExtractLinksResponse,
  GetLinksOptions,
  GetLinksResponse,
  // Activity Logs
  ActivityLogEntry,
  ListLogsOptions,
  ListLogsResponse,
  GetOneLogOptions,
  // Metadata & Metrics
  Metadata,
  MetricsOptions,
  Metrics,
  // Links
  LinksTree,
  LinkItem,
  SocialMediaLink,
  // Errors
  DeepcrawlError,
  DeepcrawlReadError,
  DeepcrawlLinksError,
  DeepcrawlRateLimitError,
  DeepcrawlAuthError,
  DeepcrawlValidationError,
  DeepcrawlNotFoundError,
  DeepcrawlServerError,
  DeepcrawlNetworkError,
} from 'deepcrawl/types';
```
```typescript
import {
  z,
  // Request Schemas
  ReadUrlOptionsSchema,
  GetMarkdownOptionsSchema,
  ExtractLinksOptionsSchema,
  GetLinksOptionsSchema,
  ListLogsOptionsSchema,
  GetOneLogOptionsSchema,
  // Response Schemas
  ReadUrlResponseSchema,
  GetMarkdownResponseSchema,
  ExtractLinksResponseSchema,
  GetLinksResponseSchema,
  ListLogsResponseSchema,
  // Metadata & Metrics
  MetadataSchema,
  MetricsOptionsSchema,
  MetricsSchema,
  // Links
  LinksTreeSchema,
  // Services
  CacheOptionsSchema,
} from 'deepcrawl/schemas';
```
> Importing `z` from `deepcrawl/zod/v4` (or from `deepcrawl/schemas`) reuses the
> SDK's Zod runtime, so schema composition works even if your app already has its
> own Zod installation.
```typescript
import { z } from 'deepcrawl/zod/v4';
import { ReadUrlOptionsSchema } from 'deepcrawl/schemas';

const CustomSchema = ReadUrlOptionsSchema.extend({
  customFlag: z.boolean().default(false),
});
```
Use this import when composing your own Zod schemas with the SDK's public
schemas or utils; it avoids instance-mismatch issues in projects that install
multiple copies of Zod.
```typescript
import {
  // Zod schema helper
  OptionalBoolWithDefault,
  // Pagination normalization
  normalizeListLogsPagination,
} from 'deepcrawl/types/utils';

// Example: create an optional boolean schema with a default
const schema = OptionalBoolWithDefault(true);

// Example: normalize pagination input
const normalized = normalizeListLogsPagination({ limit: 150, offset: -5 });
// Returns: { limit: 100, offset: 0 } (clamped to valid ranges)
```
Extract clean content and metadata from any URL.
```typescript
import { DeepcrawlApp } from 'deepcrawl';
import type { ReadUrlOptions } from 'deepcrawl/types';

const deepcrawl = new DeepcrawlApp({ apiKey: process.env.DEEPCRAWL_API_KEY });

const result = await deepcrawl.readUrl('https://example.com', {
  metadata: true,
  markdown: true,
  cleanedHtml: true,
  metricsOptions: { enabled: true },
});

console.log(result.markdown);
console.log(result.metadata?.title);
console.log(result.metrics?.readableDuration);
```
Simplified method to get just markdown content.
```typescript
import { DeepcrawlApp } from 'deepcrawl';

const deepcrawl = new DeepcrawlApp({ apiKey: process.env.DEEPCRAWL_API_KEY });

const result = await deepcrawl.getMarkdown('https://example.com', {
  metricsOptions: { enable: true },
});

console.log(result.markdown);
```
Extract all links from a page with powerful filtering options.
```typescript
import { DeepcrawlApp } from 'deepcrawl';
import type { ExtractLinksOptions } from 'deepcrawl/types';

const deepcrawl = new DeepcrawlApp({ apiKey: process.env.DEEPCRAWL_API_KEY });

const result = await deepcrawl.extractLinks('https://example.com', {
  includeInternal: true,
  includeExternal: false,
  includeEmails: false,
  includePhoneNumbers: false,
  includeSocialMedia: false,
  metricsOptions: { enable: true },
});

console.log(result.tree.internal);
console.log(result.tree.socialMedia);
```
Retrieve activity logs with paginated results and filtering.
```typescript
import { DeepcrawlApp } from 'deepcrawl';
import type { ListLogsOptions } from 'deepcrawl/types';

const deepcrawl = new DeepcrawlApp({ apiKey: process.env.DEEPCRAWL_API_KEY });

const result = await deepcrawl.listLogs({
  limit: 50,
  offset: 0,
  path: 'read-getMarkdown',
  success: true,
  startDate: '2025-01-01T00:00:00Z',
  endDate: '2025-12-31T23:59:59Z',
  orderBy: 'requestTimestamp',
  orderDir: 'desc',
});

console.log(result.logs);
console.log(result.meta.hasMore);
```
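To walk every page, you can loop on `meta.hasMore`. A minimal sketch, where `fetchPage` stands in for `deepcrawl.listLogs` (only the `limit`/`offset` options and the `logs`/`meta.hasMore` response shape shown above are assumed):

```typescript
type LogsPage<T> = { logs: T[]; meta: { hasMore: boolean } };

async function listAllLogs<T>(
  fetchPage: (opts: { limit: number; offset: number }) => Promise<LogsPage<T>>,
  limit = 50,
): Promise<T[]> {
  const all: T[] = [];
  let offset = 0;
  for (;;) {
    const page = await fetchPage({ limit, offset });
    all.push(...page.logs);
    if (!page.meta.hasMore) break;
    offset += limit; // advance by page size, matching the API's offset semantics
  }
  return all;
}
```

With the real client this would be `listAllLogs((opts) => deepcrawl.listLogs(opts))`.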
Get a single activity log entry by ID.
```typescript
import { DeepcrawlApp } from 'deepcrawl';
import type { GetOneLogOptions } from 'deepcrawl/types';

const deepcrawl = new DeepcrawlApp({ apiKey: process.env.DEEPCRAWL_API_KEY });

const log = await deepcrawl.getOneLog({ id: 'request-id-123' });

console.log(log.path);
console.log(log.response);
```
```typescript
import { DeepcrawlApp } from 'deepcrawl';
import type { ReadUrlOptions } from 'deepcrawl/types';

const deepcrawl = new DeepcrawlApp({ apiKey: process.env.DEEPCRAWL_API_KEY });

async function monitorProduct(productUrl: string) {
  try {
    const result = await deepcrawl.readUrl(productUrl, {
      metadata: true,
      cleanedHtml: true,
    });
    return {
      title: result.metadata?.title,
      lastChecked: new Date().toISOString(),
    };
  } catch (error) {
    if (error.isRateLimit?.()) {
      console.log(`Rate limited. Retry after ${error.retryAfter}s`);
      await new Promise(r => setTimeout(r, error.retryAfter * 1000));
      return monitorProduct(productUrl);
    }
    throw error;
  }
}
```
```typescript
import { DeepcrawlApp } from 'deepcrawl';
import type { ReadUrlResponse } from 'deepcrawl/types';

const deepcrawl = new DeepcrawlApp({ apiKey: process.env.DEEPCRAWL_API_KEY });

async function aggregateArticles(urls: string[]) {
  const results = await Promise.allSettled(
    urls.map(url => deepcrawl.readUrl(url, {
      metadata: true,
      markdown: true,
    }))
  );

  return results.map((result, index) => ({
    url: urls[index],
    success: result.status === 'fulfilled',
    data: result.status === 'fulfilled' ? result.value : null,
    error: result.status === 'rejected' ? result.reason.message : null,
  }));
}
```
```typescript
// app/actions/scrape.ts
'use server';

import { DeepcrawlApp } from 'deepcrawl';
import { headers } from 'next/headers';
import { revalidatePath } from 'next/cache';

export async function scrapeUrlAction(url: string) {
  const deepcrawl = new DeepcrawlApp({
    apiKey: process.env.DEEPCRAWL_API_KEY,
    headers: await headers(),
  });

  try {
    const result = await deepcrawl.readUrl(url, {
      metadata: true,
      markdown: true,
    });

    revalidatePath('/dashboard');

    return {
      success: true,
      data: {
        title: result.metadata?.title,
        content: result.markdown,
      },
    };
  } catch (error) {
    return {
      success: false,
      error: {
        message: error.message,
        retryable: error.isRateLimit?.() || error.isNetwork?.(),
      },
    };
  }
}
```
```typescript
import { useState, useCallback, useMemo } from 'react';
import { DeepcrawlApp } from 'deepcrawl';
import type { ReadUrlResponse } from 'deepcrawl/types';

export function useScraping(apiKey: string) {
  const [data, setData] = useState<ReadUrlResponse | null>(null);
  const [loading, setLoading] = useState(false);
  const [error, setError] = useState<string | null>(null);

  // Memoize so the client isn't recreated on every render
  const deepcrawl = useMemo(() => new DeepcrawlApp({ apiKey }), [apiKey]);

  const scrape = useCallback(async (url: string) => {
    setLoading(true);
    setError(null);
    try {
      const result = await deepcrawl.readUrl(url, { metadata: true });
      setData(result);
    } catch (err) {
      setError(err.message);
    } finally {
      setLoading(false);
    }
  }, [deepcrawl]);

  return { data, loading, error, scrape };
}
```
```typescript
// app/actions/logs.ts
'use server';

import { DeepcrawlApp } from 'deepcrawl';
import type { ListLogsResponse } from 'deepcrawl/types';

const deepcrawl = new DeepcrawlApp({ apiKey: process.env.DEEPCRAWL_API_KEY });

export async function getActivityLogs() {
  try {
    const logs = await deepcrawl.listLogs({
      limit: 50,
      offset: 0,
    });
    return { success: true, data: logs };
  } catch (error) {
    return {
      success: false,
      error: error instanceof Error ? error.message : 'Failed to fetch logs',
    };
  }
}
```
```typescript
import type {
  DeepcrawlError,
  DeepcrawlReadError,
  DeepcrawlLinksError,
  DeepcrawlRateLimitError,
  DeepcrawlAuthError,
  DeepcrawlValidationError,
  DeepcrawlNotFoundError,
  DeepcrawlServerError,
  DeepcrawlNetworkError,
} from 'deepcrawl/types';
```
```typescript
import { DeepcrawlApp } from 'deepcrawl';

const deepcrawl = new DeepcrawlApp({ apiKey: process.env.DEEPCRAWL_API_KEY });

try {
  const result = await deepcrawl.readUrl(url);
} catch (error) {
  if (error.isRateLimit?.()) {
    console.log(`Retry after ${error.retryAfter}s`);
  } else if (error.isRead?.()) {
    console.log(`Failed to read: ${error.message}`);
  }
}
```
```typescript
import { DeepcrawlApp } from 'deepcrawl';

const deepcrawl = new DeepcrawlApp({ apiKey: process.env.DEEPCRAWL_API_KEY });

try {
  const result = await deepcrawl.readUrl(url);
} catch (error) {
  // Check error type using instance methods
  if (error.isAuth?.()) {
    console.log('Authentication failed');
  } else if (error.isValidation?.()) {
    console.log('Invalid request parameters');
  }
}
```
All errors include:
- `code: string` - oRPC error code
- `status: number` - HTTP status
- `message: string` - User-friendly error message
- `data: any` - Raw error data from API

Rate limit errors include:
- `retryAfter: number` - Seconds to wait
- `operation: string` - What operation was rate limited

Read/Links errors include:
- `targetUrl: string` - URL that failed
- `success: false` - Always false for errors
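As an illustration of how these fields combine in practice, here is a small retry-decision helper. The `SdkErrorLike` shape is a local mock built from the field lists above, not the SDK's actual error classes, and the backoff values are assumptions:

```typescript
// Local mock of the documented error fields; not the SDK's error classes.
interface SdkErrorLike {
  code: string;
  status: number;
  message: string;
  retryAfter?: number; // present on rate-limit errors
  targetUrl?: string;  // present on read/links errors
}

function retryDelayMs(error: SdkErrorLike): number | null {
  // Rate-limited: honor the server-provided wait time.
  if (error.retryAfter !== undefined) return error.retryAfter * 1000;
  // 5xx: retry after a short fixed backoff; 4xx: don't retry.
  return error.status >= 500 ? 2000 : null;
}

console.log(retryDelayMs({ code: 'RATE_LIMITED', status: 429, message: 'slow down', retryAfter: 30 })); // 30000
console.log(retryDelayMs({ code: 'BAD_REQUEST', status: 400, message: 'invalid url' })); // null
```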
```typescript
import { DeepcrawlApp } from 'deepcrawl';
import type { DeepcrawlConfig } from 'deepcrawl/types';

const deepcrawl = new DeepcrawlApp({
  apiKey: process.env.DEEPCRAWL_API_KEY,
  baseUrl: 'https://api.deepcrawl.dev',
  headers: {
    'User-Agent': 'MyApp/1.0',
  },
  fetch: customFetch,
  fetchOptions: {
    timeout: 30000,
  },
});
```
Automatic HTTP connection pooling in Node.js:
```typescript
// Automatic configuration
{
  keepAlive: true,
  maxSockets: 10,
  maxFreeSockets: 5,
  timeout: 60000,
  keepAliveMsecs: 30000
}
```
Benefits:
- ⚡ Faster for concurrent requests
- 🔄 Connection reuse reduces handshake overhead
- 🎯 Auto-cleanup of idle connections
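Since the pool defaults to 10 sockets, fanning out more requests than that just queues them. A bounded-concurrency helper (a sketch, not part of the SDK) keeps in-flight work aligned with the pool size:

```typescript
// Runs `worker` over `items` with at most `concurrency` calls in flight,
// preserving result order. `concurrency = 10` mirrors the default maxSockets.
async function mapWithConcurrency<T, R>(
  items: T[],
  worker: (item: T) => Promise<R>,
  concurrency = 10,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  async function run(): Promise<void> {
    // Each runner pulls the next unclaimed index until items are exhausted.
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i]);
    }
  }
  await Promise.all(Array.from({ length: Math.min(concurrency, items.length) }, run));
  return results;
}
```

With the real client this could be used as `mapWithConcurrency(urls, url => deepcrawl.readUrl(url), 10)`.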
```typescript
// ✅ SECURE: lib/deepcrawl.ts
'use server';

import { DeepcrawlApp } from 'deepcrawl';

export const deepcrawlClient = new DeepcrawlApp({
  apiKey: process.env.DEEPCRAWL_API_KEY,
});
```
```typescript
// ✅ SECURE: app/actions/scrape.ts
'use server';

import { deepcrawlClient } from '@/lib/deepcrawl';

export async function scrapeAction(url: string) {
  return deepcrawlClient.readUrl(url);
}
```
```typescript
// ✅ SECURE: Client component
'use client';

import { scrapeAction } from '@/app/actions/scrape';

export function ScrapeButton() {
  const handleClick = async () => {
    const result = await scrapeAction('https://example.com');
    console.log(result);
  };

  return <button onClick={handleClick}>Scrape</button>;
}
```
```typescript
// ❌ INSECURE: Direct SDK usage in client components
'use client';

import { DeepcrawlApp } from 'deepcrawl';

export function BadComponent() {
  const deepcrawl = new DeepcrawlApp({
    apiKey: process.env.DEEPCRAWL_API_KEY // ❌ Exposes the API key to the browser!
  });
}
```
⚠️ Server-Side Only: The Deepcrawl SDK is designed for server-side use:
- ✅ Node.js (18+) with connection pooling
- ✅ Cloudflare Workers
- ✅ Vercel Edge Runtime
- ✅ Next.js Server Actions (recommended)
- ✅ Deno, Bun, and other modern runtimes
- ❌ Browser environments (use Server Actions instead)
MIT - see LICENSE for details.
- 📖 Documentation
- 🐛 Issues
- 💬 Community Discord
---
Built with ❤️ by @felixLu