# Upcrawl SDK

Official Node.js/Browser SDK for the Upcrawl API. Extract data from any website with a single API call.

## Installation
```bash
npm install @upcrawl/sdk
```

Or with yarn:

```bash
yarn add @upcrawl/sdk
```
## Quick Start

```typescript
import Upcrawl from '@upcrawl/sdk';

// Set your API key (get one at https://upcrawl.dev)
Upcrawl.setApiKey('uc-your-api-key');

// Scrape a webpage
const result = await Upcrawl.scrape({
  url: 'https://example.com',
  type: 'markdown'
});

console.log(result.markdown);
```
## Authentication

The API key must be set before making any requests:

```typescript
import Upcrawl from '@upcrawl/sdk';

Upcrawl.setApiKey('uc-your-api-key');
```

Or using named imports:

```typescript
import { setApiKey } from '@upcrawl/sdk';

setApiKey('uc-your-api-key');
```
## Scraping

```typescript
import Upcrawl from '@upcrawl/sdk';

Upcrawl.setApiKey('uc-your-api-key');

const result = await Upcrawl.scrape({
  url: 'https://example.com',
  type: 'markdown',      // 'markdown' or 'html'
  onlyMainContent: true, // Remove nav, ads, footers
  extractMetadata: true  // Get title, description, etc.
});

console.log(result.markdown);
console.log(result.metadata?.title);
```
## Batch Scraping

Scrape multiple URLs in a single request:

```typescript
const result = await Upcrawl.batchScrape({
  urls: [
    'https://example.com/page1',
    'https://example.com/page2',
    // You can also pass detailed options per URL:
    { url: 'https://example.com/page3', type: 'html' }
  ],
  type: 'markdown'
});

console.log(`Scraped ${result.successful} of ${result.total} pages`);

result.results.forEach(page => {
  if (page.success) {
    console.log(`${page.url}: ${page.markdown?.length} chars`);
  } else {
    console.log(`${page.url}: Failed - ${page.error}`);
  }
});
```
## Search

Search the web and get structured results:

```typescript
const result = await Upcrawl.search({
  queries: ['latest AI news 2025'],
  limit: 10,
  location: 'US'
});

result.results.forEach(queryResult => {
  console.log(`Query: ${queryResult.query}`);
  queryResult.results.forEach(item => {
    console.log(`- ${item.title}`);
    console.log(`  ${item.url}`);
  });
});
```
## PDF Generation

Generate a PDF from HTML content:

```typescript
const result = await Upcrawl.generatePdf({
  html: '<h1>Invoice</h1><p>Total: $500</p>',
  title: 'invoice-123',
  pageSize: 'A4',
  printBackground: true,
  margin: { top: '20mm', right: '20mm', bottom: '20mm', left: '20mm' }
});

console.log(result.url); // Download URL for the PDF
```
Generate a PDF from any webpage:

```typescript
const result = await Upcrawl.generatePdfFromUrl({
  url: 'https://example.com/report',
  title: 'report',
  pageSize: 'Letter',
  landscape: true
});

console.log(result.url); // Download URL for the PDF
```
## Code Execution

Run code in an isolated sandbox environment:

```typescript
const result = await Upcrawl.executeCode({
  code: 'print("Hello, World!")',
  language: 'python'
});

console.log(result.stdout);          // "Hello, World!\n"
console.log(result.exitCode);        // 0
console.log(result.executionTimeMs); // 95.23
console.log(result.memoryUsageMb);   // 8.45
```
Each execution runs in its own isolated subprocess inside a Kata micro-VM with no network access. Code is cleaned up immediately after execution.
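The response fields make the failure modes easy to tell apart. As an illustration (the `summarizeExecution` helper below is not part of the SDK; it relies only on the documented response fields, including the exit-code-124 timeout convention):

```typescript
// Illustrative helper (not an SDK export): classify an executeCode result
// using only the documented response fields.
interface ExecutionResult {
  stdout: string;
  stderr: string;
  exitCode: number;
  timedOut: boolean;
}

function summarizeExecution(result: ExecutionResult): string {
  // Exit code 124 signals a timeout, matching the documented convention.
  if (result.timedOut || result.exitCode === 124) {
    return 'timed out';
  }
  if (result.exitCode !== 0) {
    return `failed (exit ${result.exitCode}): ${result.stderr.trim()}`;
  }
  return `ok: ${result.stdout.trim()}`;
}

// summarizeExecution({ stdout: 'Hello, World!\n', stderr: '', exitCode: 0, timedOut: false })
//   → 'ok: Hello, World!'
```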
```typescript
// Multi-line code with imports
const result = await Upcrawl.executeCode({
  code: `
import json

data = {"name": "Upcrawl", "version": 1}
print(json.dumps(data, indent=2))
`
});

console.log(result.stdout);
// {
//   "name": "Upcrawl",
//   "version": 1
// }
```
Filter search results by domain:

```typescript
// Only include specific domains
const result = await Upcrawl.search({
  queries: ['machine learning tutorials'],
  includeDomains: ['medium.com', 'towardsdatascience.com']
});

// Or exclude domains
const result2 = await Upcrawl.search({
  queries: ['javascript frameworks'],
  excludeDomains: ['pinterest.com', 'quora.com']
});
```
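The option reference notes that includeDomains adds site: and excludeDomains adds -site: to the query. A rough sketch of that rewriting (the `applyDomainFilters` function is illustrative, not an SDK export, and the exact combination rules server-side may differ):

```typescript
// Illustrative sketch (not an SDK export): how includeDomains / excludeDomains
// could be folded into the query via site: / -site: operators. The actual
// API may combine them differently; this only shows the idea.
function applyDomainFilters(
  query: string,
  includeDomains: string[] = [],
  excludeDomains: string[] = []
): string {
  const include = includeDomains.map(d => `site:${d}`).join(' OR ');
  const exclude = excludeDomains.map(d => `-site:${d}`).join(' ');
  return [query, include && `(${include})`, exclude]
    .filter(Boolean)
    .join(' ');
}

// applyDomainFilters('javascript frameworks', [], ['pinterest.com', 'quora.com'])
//   → 'javascript frameworks -site:pinterest.com -site:quora.com'
```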
Ask the API to summarize scraped content:

```typescript
const result = await Upcrawl.scrape({
  url: 'https://example.com/product',
  type: 'markdown',
  summary: {
    query: 'Extract the product name, price, and key features in JSON format'
  }
});

console.log(result.content); // Summarized content
```
## Configuration

For self-hosted instances or testing:

```typescript
Upcrawl.setBaseUrl('https://your-instance.com/v1');
```

Set a custom timeout (in milliseconds):

```typescript
Upcrawl.setTimeout(60000); // 60 seconds
```

Or configure everything at once:

```typescript
Upcrawl.configure({
  apiKey: 'uc-your-api-key',
  baseUrl: 'https://api.upcrawl.dev/v1',
  timeout: 120000
});
```
## Error Handling

The SDK throws UpcrawlError for API errors:

```typescript
import Upcrawl, { UpcrawlError } from '@upcrawl/sdk';

try {
  const result = await Upcrawl.scrape({ url: 'https://example.com' });
} catch (error) {
  if (error instanceof UpcrawlError) {
    console.error(`Error ${error.status}: ${error.message}`);
    console.error(`Code: ${error.code}`);
  }
}
```
Common error codes:

| Status | Code | Description |
|--------|------|-------------|
| 401 | UNAUTHORIZED | Invalid or missing API key |
| 403 | FORBIDDEN | Access forbidden |
| 429 | RATE_LIMIT_EXCEEDED | Too many requests |
| 500 | INTERNAL_ERROR | Server error |
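A common client-side response to 429 RATE_LIMIT_EXCEEDED is to retry with exponential backoff. A minimal generic sketch (`retryWithBackoff` is illustrative, not an SDK export):

```typescript
// Illustrative helper (not an SDK export): retry an async operation with
// exponential backoff, e.g. when the API returns RATE_LIMIT_EXCEEDED (429).
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  shouldRetry: (err: unknown) => boolean,
  maxAttempts = 3,
  baseDelayMs = 500
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxAttempts || !shouldRetry(err)) throw err;
      // Exponential backoff: baseDelayMs, 2x, 4x, ...
      await new Promise(r => setTimeout(r, baseDelayMs * 2 ** (attempt - 1)));
    }
  }
}
```

Used with the SDK, `shouldRetry` could check `error instanceof UpcrawlError && error.status === 429`.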
## TypeScript Support

The SDK includes full TypeScript definitions:

```typescript
import Upcrawl, {
  ScrapeOptions,
  ScrapeResponse,
  SearchOptions,
  SearchResponse,
  BatchScrapeOptions,
  BatchScrapeResponse,
  GeneratePdfOptions,
  PdfResponse,
  ExecuteCodeOptions,
  ExecuteCodeResponse
} from '@upcrawl/sdk';

const options: ScrapeOptions = {
  url: 'https://example.com',
  type: 'markdown'
};

const result: ScrapeResponse = await Upcrawl.scrape(options);
```
## API Reference

### Methods

| Method | Description |
|--------|-------------|
| Upcrawl.setApiKey(key) | Set the API key globally |
| Upcrawl.setBaseUrl(url) | Set custom base URL |
| Upcrawl.setTimeout(ms) | Set request timeout |
| Upcrawl.configure(config) | Configure multiple options |
| Upcrawl.scrape(options) | Scrape a single URL |
| Upcrawl.batchScrape(options) | Scrape multiple URLs |
| Upcrawl.search(options) | Search the web |
| Upcrawl.generatePdf(options) | Generate PDF from HTML |
| Upcrawl.generatePdfFromUrl(options) | Generate PDF from a URL |
| Upcrawl.executeCode(options) | Execute code in an isolated sandbox |
### Configuration options

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| apiKey | string | No | Your Upcrawl API key |
| baseUrl | string | No | Custom API base URL |
| timeout | number | No | Request timeout in milliseconds |
### Summary options

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| query | string | Yes | Query/instruction for content summarization |
### Scrape options

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| url | string | Yes | URL to scrape (required) |
| type | "html" \| "markdown" | No | Output format: html or markdown. Defaults to "html" |
| onlyMainContent | boolean | No | Extract only main content (removes nav, ads, footers). Defaults to true |
| extractMetadata | boolean | No | Whether to extract page metadata |
| summary | object | No | Summary query for LLM summarization |
| timeoutMs | number | No | Custom timeout in milliseconds (1000-120000) |
| waitUntil | "load" \| "domcontentloaded" \| "networkidle" | No | Wait strategy for page load |
### Page metadata

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| title | string | No | Page title |
| description | string | No | Page description |
| canonicalUrl | string | No | Canonical URL of the page |
| finalUrl | string | No | Final URL after any redirects |
| contentType | string | No | Content type of the response |
| contentLength | number | No | Content length of the response |
### Scrape response

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| url | string | Yes | Original URL that was scraped |
| html | string \| null | No | Rendered HTML content (when type is html) |
| markdown | string \| null | No | Content converted to Markdown (when type is markdown) |
| statusCode | number \| null | Yes | HTTP status code |
| success | boolean | Yes | Whether scraping was successful |
| error | string | No | Error message if scraping failed |
| timestamp | string | Yes | ISO timestamp when scraping completed |
| loadTimeMs | number | Yes | Time taken to load and render the page in milliseconds |
| metadata | object | No | Additional page metadata |
| retryCount | number | Yes | Number of retry attempts made |
| cost | number | No | Cost in USD for this scrape operation |
| content | string \| null | No | Content after summarization (when summary query provided) |
### Batch scrape options

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| urls | (string \| object)[] | Yes | Array of URLs to scrape (strings or detailed request objects) |
| type | "html" \| "markdown" | No | Output format: html or markdown |
| onlyMainContent | boolean | No | Extract only main content (removes nav, ads, footers) |
| summary | object | No | Summary query for LLM summarization |
| batchTimeoutMs | number | No | Global timeout for the entire batch operation in milliseconds (10000-600000) |
| failFast | boolean | No | Whether to stop on first error |
### Batch scrape response

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| results | object[] | Yes | Array of scrape results |
| total | number | Yes | Total number of URLs processed |
| successful | number | Yes | Number of successful scrapes |
| failed | number | Yes | Number of failed scrapes |
| totalTimeMs | number | Yes | Total time taken for the batch operation in milliseconds |
| timestamp | string | Yes | Timestamp when the batch operation completed |
| cost | number | No | Total cost in USD for all scrape operations |
### Search options

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| queries | string[] | Yes | Array of search queries to execute (1-20) |
| limit | number | No | Number of results per query (1-100). Defaults to 10 |
| location | string | No | Location for search (e.g., "IN", "US") |
| includeDomains | string[] | No | Domains to include (will add site: to query) |
| excludeDomains | string[] | No | Domains to exclude (will add -site: to query) |
### Search result item

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| url | string | Yes | URL of the search result |
| title | string | Yes | Title of the search result |
| description | string | Yes | Description/snippet of the search result |
### Per-query search result

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| query | string | Yes | The search query |
| success | boolean | Yes | Whether the search was successful |
| results | object[] | Yes | Parsed search result links |
| error | string | No | Error message if failed |
| loadTimeMs | number | No | Time taken in milliseconds |
| cost | number | No | Cost in USD for this query |
### Search response

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| results | object[] | Yes | Array of search results per query |
| total | number | Yes | Total number of queries |
| successful | number | Yes | Number of successful searches |
| failed | number | Yes | Number of failed searches |
| totalTimeMs | number | Yes | Total time in milliseconds |
| timestamp | string | Yes | ISO timestamp |
| cost | number | No | Total cost in USD |
### Margin options

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| top | string | No | Top margin (e.g., "20mm") |
| right | string | No | Right margin |
| bottom | string | No | Bottom margin |
| left | string | No | Left margin |
### PDF generation options (from HTML)

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| html | string | Yes | Complete HTML content to convert to PDF (required) |
| title | string | No | Title used for the exported filename |
| pageSize | "A4" \| "Letter" \| "Legal" | No | Page size. Defaults to "A4" |
| landscape | boolean | No | Landscape orientation. Defaults to false |
| margin | object | No | Page margins (e.g., { top: "20mm", right: "20mm", bottom: "20mm", left: "20mm" }) |
| printBackground | boolean | No | Print background graphics and colors. Defaults to true |
| skipChartWait | boolean | No | Skip waiting for chart rendering signal. Defaults to false |
| timeoutMs | number | No | Timeout in milliseconds (5000-120000). Defaults to 30000 |
### PDF generation options (from URL)

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| url | string | Yes | URL to navigate to and convert to PDF (required) |
| title | string | No | Title used for the exported filename |
| pageSize | "A4" \| "Letter" \| "Legal" | No | Page size. Defaults to "A4" |
| landscape | boolean | No | Landscape orientation. Defaults to false |
| margin | object | No | Page margins |
| printBackground | boolean | No | Print background graphics and colors. Defaults to true |
| timeoutMs | number | No | Timeout in milliseconds (5000-120000). Defaults to 30000 |
### PDF response

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| success | boolean | Yes | Whether PDF generation succeeded |
| url | string | No | Public URL of the generated PDF |
| filename | string | No | Generated filename |
| blobName | string | No | Blob storage path |
| error | string | No | Error message on failure |
| durationMs | number | Yes | Total time taken in milliseconds |
### Code execution options

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| code | string | Yes | Code to execute (required) |
| language | "python" | No | Language runtime. Defaults to "python" |
### Code execution response

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| stdout | string | Yes | Standard output from the executed code |
| stderr | string | Yes | Standard error from the executed code |
| exitCode | number | Yes | Process exit code (0 = success, 124 = timeout) |
| executionTimeMs | number | Yes | Execution time in milliseconds |
| timedOut | boolean | Yes | Whether execution was killed due to timeout |
| memoryUsageMb | number | No | Peak memory usage in megabytes |
| error | string | No | Error message if execution infrastructure failed |
| cost | number | No | Cost in USD for this execution |
### Error response

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| error | object | Yes | |
| statusCode | number | No | |
### Browser session options

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| width | number | No | Browser viewport width (800-3840). Defaults to 1280 |
| height | number | No | Browser viewport height (600-2160). Defaults to 720 |
| headless | boolean | No | Run browser in headless mode. Defaults to true |
### Browser session response

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| sessionId | string | Yes | Unique session identifier |
| wsEndpoint | string | Yes | WebSocket URL for connecting with Playwright/Puppeteer |
| vncUrl | string \| null | Yes | VNC URL for viewing the browser (if available) |
| affinityCookie | string | No | Affinity cookie for sticky session routing (format: SCRAPER_AFFINITY=xxx), extracted from response headers |
| createdAt | Date | Yes | Session creation timestamp |
| width | number | Yes | Browser viewport width |
| height | number | Yes | Browser viewport height |
## License

MIT