# Upcrawl SDK

Official Node.js/Browser SDK for the Upcrawl API. Extract data from any website with a single API call.

## Installation
```bash
npm install @upcrawl/sdk
```

Or with yarn:

```bash
yarn add @upcrawl/sdk
```
## Quick Start

```typescript
import Upcrawl from '@upcrawl/sdk';

// Set your API key (get one at https://upcrawl.dev)
Upcrawl.setApiKey('uc-your-api-key');

// Scrape a webpage
const result = await Upcrawl.scrape({
  url: 'https://example.com',
  type: 'markdown'
});

console.log(result.markdown);
```
## Authentication

The API key must be set before making any requests:

```typescript
import Upcrawl from '@upcrawl/sdk';

Upcrawl.setApiKey('uc-your-api-key');
```

Or using named imports:

```typescript
import { setApiKey } from '@upcrawl/sdk';

setApiKey('uc-your-api-key');
```
## Scraping

```typescript
import Upcrawl from '@upcrawl/sdk';

Upcrawl.setApiKey('uc-your-api-key');

const result = await Upcrawl.scrape({
  url: 'https://example.com',
  type: 'markdown',      // 'markdown' or 'html'
  onlyMainContent: true, // Remove nav, ads, footers
  extractMetadata: true  // Get title, description, etc.
});

console.log(result.markdown);
console.log(result.metadata?.title);
```
## Batch Scraping

Scrape multiple URLs in a single request:

```typescript
const result = await Upcrawl.batchScrape({
  urls: [
    'https://example.com/page1',
    'https://example.com/page2',
    // You can also pass detailed options per URL:
    { url: 'https://example.com/page3', type: 'html' }
  ],
  type: 'markdown'
});

console.log(`Scraped ${result.successful} of ${result.total} pages`);

result.results.forEach(page => {
  if (page.success) {
    console.log(`${page.url}: ${page.markdown?.length} chars`);
  } else {
    console.log(`${page.url}: Failed - ${page.error}`);
  }
});
```
## Search

Search the web and get structured results:

```typescript
const result = await Upcrawl.search({
  queries: ['latest AI news 2025'],
  limit: 10,
  location: 'US'
});

result.results.forEach(queryResult => {
  console.log(`Query: ${queryResult.query}`);
  queryResult.results.forEach(item => {
    console.log(`- ${item.title}`);
    console.log(`  ${item.url}`);
  });
});
```
## PDF Generation

Generate a PDF from HTML content:

```typescript
const result = await Upcrawl.generatePdf({
  html: '<h1>Invoice</h1><p>Total: $500</p>',
  title: 'invoice-123',
  pageSize: 'A4',
  printBackground: true,
  margin: { top: '20mm', right: '20mm', bottom: '20mm', left: '20mm' }
});

console.log(result.url); // Download URL for the PDF
```
Generate a PDF from any webpage:

```typescript
const result = await Upcrawl.generatePdfFromUrl({
  url: 'https://example.com/report',
  title: 'report',
  pageSize: 'Letter',
  landscape: true
});

console.log(result.url); // Download URL for the PDF
```
## Code Execution

Run code in an isolated sandbox environment:

```typescript
const result = await Upcrawl.executeCode({
  code: 'print("Hello, World!")',
  language: 'python'
});

console.log(result.stdout);          // "Hello, World!\n"
console.log(result.exitCode);        // 0
console.log(result.executionTimeMs); // 95.23
console.log(result.memoryUsageMb);   // 8.45
```
Each execution runs in its own isolated subprocess inside a Kata micro-VM with no network access. Code is cleaned up immediately after execution.
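The response fields make the failure modes easy to tell apart. As an illustration (the `summarizeExecution` helper below is not part of the SDK; it relies only on the documented response fields, including the exit-code-124 timeout convention):

```typescript
// Illustrative helper (not an SDK export): classify an executeCode result
// using only the documented response fields.
interface ExecutionResult {
  stdout: string;
  stderr: string;
  exitCode: number;
  timedOut: boolean;
}

function summarizeExecution(result: ExecutionResult): string {
  // Exit code 124 signals a timeout, matching the documented convention.
  if (result.timedOut || result.exitCode === 124) {
    return 'timed out';
  }
  if (result.exitCode !== 0) {
    return `failed (exit ${result.exitCode}): ${result.stderr.trim()}`;
  }
  return `ok: ${result.stdout.trim()}`;
}

// summarizeExecution({ stdout: 'Hello, World!\n', stderr: '', exitCode: 0, timedOut: false })
//   → 'ok: Hello, World!'
```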
```typescript
// Multi-line code with imports
const result = await Upcrawl.executeCode({
  code: `
import json

data = {"name": "Upcrawl", "version": 1}
print(json.dumps(data, indent=2))
`
});

console.log(result.stdout);
// {
//   "name": "Upcrawl",
//   "version": 1
// }
```
Filter search results by domain:

```typescript
// Only include specific domains
const result = await Upcrawl.search({
  queries: ['machine learning tutorials'],
  includeDomains: ['medium.com', 'towardsdatascience.com']
});

// Or exclude domains
const result2 = await Upcrawl.search({
  queries: ['javascript frameworks'],
  excludeDomains: ['pinterest.com', 'quora.com']
});
```
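The option reference notes that includeDomains adds site: and excludeDomains adds -site: to the query. A rough sketch of that rewriting (the `applyDomainFilters` function is illustrative, not an SDK export, and the exact combination rules server-side may differ):

```typescript
// Illustrative sketch (not an SDK export): how includeDomains / excludeDomains
// could be folded into the query via site: / -site: operators. The actual
// API may combine them differently; this only shows the idea.
function applyDomainFilters(
  query: string,
  includeDomains: string[] = [],
  excludeDomains: string[] = []
): string {
  const include = includeDomains.map(d => `site:${d}`).join(' OR ');
  const exclude = excludeDomains.map(d => `-site:${d}`).join(' ');
  return [query, include && `(${include})`, exclude]
    .filter(Boolean)
    .join(' ');
}

// applyDomainFilters('javascript frameworks', [], ['pinterest.com', 'quora.com'])
//   → 'javascript frameworks -site:pinterest.com -site:quora.com'
```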
Ask the API to summarize scraped content:

```typescript
const result = await Upcrawl.scrape({
  url: 'https://example.com/product',
  type: 'markdown',
  summary: {
    query: 'Extract the product name, price, and key features in JSON format'
  }
});

console.log(result.content); // Summarized content
```
## Configuration

For self-hosted instances or testing:

```typescript
Upcrawl.setBaseUrl('https://your-instance.com/v1');
```

Set a custom timeout (in milliseconds):

```typescript
Upcrawl.setTimeout(60000); // 60 seconds
```

Or configure everything at once:

```typescript
Upcrawl.configure({
  apiKey: 'uc-your-api-key',
  baseUrl: 'https://api.upcrawl.dev/v1',
  timeout: 120000
});
```
## Error Handling

The SDK throws UpcrawlError for API errors:

```typescript
import Upcrawl, { UpcrawlError } from '@upcrawl/sdk';

try {
  const result = await Upcrawl.scrape({ url: 'https://example.com' });
} catch (error) {
  if (error instanceof UpcrawlError) {
    console.error(`Error ${error.status}: ${error.message}`);
    console.error(`Code: ${error.code}`);
  }
}
```
Common error codes:

| Status | Code | Description |
|--------|------|-------------|
| 401 | UNAUTHORIZED | Invalid or missing API key |
| 403 | FORBIDDEN | Access forbidden |
| 429 | RATE_LIMIT_EXCEEDED | Too many requests |
| 500 | INTERNAL_ERROR | Server error |
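A common client-side response to 429 RATE_LIMIT_EXCEEDED is to retry with exponential backoff. A minimal generic sketch (`retryWithBackoff` is illustrative, not an SDK export):

```typescript
// Illustrative helper (not an SDK export): retry an async operation with
// exponential backoff, e.g. when the API returns RATE_LIMIT_EXCEEDED (429).
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  shouldRetry: (err: unknown) => boolean,
  maxAttempts = 3,
  baseDelayMs = 500
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxAttempts || !shouldRetry(err)) throw err;
      // Exponential backoff: baseDelayMs, 2x, 4x, ...
      await new Promise(r => setTimeout(r, baseDelayMs * 2 ** (attempt - 1)));
    }
  }
}
```

Used with the SDK, `shouldRetry` could check `error instanceof UpcrawlError && error.status === 429`.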
## TypeScript Support

The SDK includes full TypeScript definitions:

```typescript
import Upcrawl, {
  ScrapeOptions,
  ScrapeResponse,
  SearchOptions,
  SearchResponse,
  BatchScrapeOptions,
  BatchScrapeResponse,
  GeneratePdfOptions,
  PdfResponse,
  ExecuteCodeOptions,
  ExecuteCodeResponse
} from '@upcrawl/sdk';

const options: ScrapeOptions = {
  url: 'https://example.com',
  type: 'markdown'
};

const result: ScrapeResponse = await Upcrawl.scrape(options);
```
## API Reference

### Methods

| Method | Description |
|--------|-------------|
| Upcrawl.setApiKey(key) | Set the API key globally |
| Upcrawl.setBaseUrl(url) | Set custom base URL |
| Upcrawl.setTimeout(ms) | Set request timeout |
| Upcrawl.configure(config) | Configure multiple options |
| Upcrawl.scrape(options) | Scrape a single URL |
| Upcrawl.batchScrape(options) | Scrape multiple URLs |
| Upcrawl.search(options) | Search the web |
| Upcrawl.generatePdf(options) | Generate PDF from HTML |
| Upcrawl.generatePdfFromUrl(options) | Generate PDF from a URL |
| Upcrawl.executeCode(options) | Execute code in an isolated sandbox |
### Configuration options

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| apiKey | string | No | Your Upcrawl API key |
| baseUrl | string | No | Custom API base URL |
| timeout | number | No | Request timeout in milliseconds |
### Summary options

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| query | string | Yes | Query/instruction for content summarization |
### Scrape options

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| url | string | Yes | URL to scrape (required) |
| type | "html" \| "markdown" | No | Output format: html or markdown. Defaults to "html" |
| onlyMainContent | boolean | No | Extract only main content (removes nav, ads, footers). Defaults to true |
| extractMetadata | boolean | No | Whether to extract page metadata |
| summary | object | No | Summary query for LLM summarization |
| timeoutMs | number | No | Custom timeout in milliseconds (1000-120000) |
| waitUntil | "load" \| "domcontentloaded" \| "networkidle" | No | Wait strategy for page load |
### Page metadata

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| title | string | No | Page title |
| description | string | No | Page description |
| canonicalUrl | string | No | Canonical URL of the page |
| finalUrl | string | No | Final URL after any redirects |
| contentType | string | No | Content type of the response |
| contentLength | number | No | Content length of the response |
### Scrape response

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| url | string | Yes | Original URL that was scraped |
| html | string \| null | No | Rendered HTML content (when type is html) |
| markdown | string \| null | No | Content converted to Markdown (when type is markdown) |
| statusCode | number \| null | Yes | HTTP status code |
| success | boolean | Yes | Whether scraping was successful |
| error | string | No | Error message if scraping failed |
| timestamp | string | Yes | ISO timestamp when scraping completed |
| loadTimeMs | number | Yes | Time taken to load and render the page in milliseconds |
| metadata | object | No | Additional page metadata |
| retryCount | number | Yes | Number of retry attempts made |
| cost | number | No | Cost in USD for this scrape operation |
| content | string \| null | No | Content after summarization (when summary query provided) |
### Batch scrape options

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| urls | (string \| object)[] | Yes | Array of URLs to scrape (strings or detailed request objects) |
| type | "html" \| "markdown" | No | Output format: html or markdown |
| onlyMainContent | boolean | No | Extract only main content (removes nav, ads, footers) |
| summary | object | No | Summary query for LLM summarization |
| batchTimeoutMs | number | No | Global timeout for the entire batch operation in milliseconds (10000-600000) |
| failFast | boolean | No | Whether to stop on first error |
### Batch scrape response

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| results | object[] | Yes | Array of scrape results |
| total | number | Yes | Total number of URLs processed |
| successful | number | Yes | Number of successful scrapes |
| failed | number | Yes | Number of failed scrapes |
| totalTimeMs | number | Yes | Total time taken for the batch operation in milliseconds |
| timestamp | string | Yes | Timestamp when the batch operation completed |
| cost | number | No | Total cost in USD for all scrape operations |
### Search options

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| queries | string[] | Yes | Array of search queries to execute (1-20) |
| limit | number | No | Number of results per query (1-100). Defaults to 10 |
| location | string | No | Location for search (e.g., "IN", "US") |
| includeDomains | string[] | No | Domains to include (will add site: to query) |
| excludeDomains | string[] | No | Domains to exclude (will add -site: to query) |
### Search result item

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| url | string | Yes | URL of the search result |
| title | string | Yes | Title of the search result |
| description | string | Yes | Description/snippet of the search result |
### Per-query search result

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| query | string | Yes | The search query |
| success | boolean | Yes | Whether the search was successful |
| results | object[] | Yes | Parsed search result links |
| error | string | No | Error message if failed |
| loadTimeMs | number | No | Time taken in milliseconds |
| cost | number | No | Cost in USD for this query |
### Search response

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| results | object[] | Yes | Array of search results per query |
| total | number | Yes | Total number of queries |
| successful | number | Yes | Number of successful searches |
| failed | number | Yes | Number of failed searches |
| totalTimeMs | number | Yes | Total time in milliseconds |
| timestamp | string | Yes | ISO timestamp |
| cost | number | No | Total cost in USD |
### Margin options

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| top | string | No | Top margin (e.g., "20mm") |
| right | string | No | Right margin |
| bottom | string | No | Bottom margin |
| left | string | No | Left margin |
### PDF generation options (from HTML)

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| html | string | Yes | Complete HTML content to convert to PDF (required) |
| title | string | No | Title used for the exported filename |
| pageSize | "A4" \| "Letter" \| "Legal" | No | Page size. Defaults to "A4" |
| landscape | boolean | No | Landscape orientation. Defaults to false |
| margin | object | No | Page margins (e.g., { top: "20mm", right: "20mm", bottom: "20mm", left: "20mm" }) |
| printBackground | boolean | No | Print background graphics and colors. Defaults to true |
| skipChartWait | boolean | No | Skip waiting for chart rendering signal. Defaults to false |
| timeoutMs | number | No | Timeout in milliseconds (5000-120000). Defaults to 30000 |
### PDF generation options (from URL)

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| url | string | Yes | URL to navigate to and convert to PDF (required) |
| title | string | No | Title used for the exported filename |
| pageSize | "A4" \| "Letter" \| "Legal" | No | Page size. Defaults to "A4" |
| landscape | boolean | No | Landscape orientation. Defaults to false |
| margin | object | No | Page margins |
| printBackground | boolean | No | Print background graphics and colors. Defaults to true |
| timeoutMs | number | No | Timeout in milliseconds (5000-120000). Defaults to 30000 |
### PDF response

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| success | boolean | Yes | Whether PDF generation succeeded |
| url | string | No | Public URL of the generated PDF |
| filename | string | No | Generated filename |
| blobName | string | No | Blob storage path |
| error | string | No | Error message on failure |
| durationMs | number | Yes | Total time taken in milliseconds |
### Code execution options

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| code | string | Yes | Code to execute (required) |
| language | "python" | No | Language runtime. Defaults to "python" |
### Code execution response

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| stdout | string | Yes | Standard output from the executed code |
| stderr | string | Yes | Standard error from the executed code |
| exitCode | number | Yes | Process exit code (0 = success, 124 = timeout) |
| executionTimeMs | number | Yes | Execution time in milliseconds |
| timedOut | boolean | Yes | Whether execution was killed due to timeout |
| memoryUsageMb | number | No | Peak memory usage in megabytes |
| error | string | No | Error message if execution infrastructure failed |
| cost | number | No | Cost in USD for this execution |
### Error response

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| error | object | Yes | |
| statusCode | number | No | |
### Browser session options

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| width | number | No | Browser viewport width (800-3840). Defaults to 1280 |
| height | number | No | Browser viewport height (600-2160). Defaults to 720 |
| headless | boolean | No | Run browser in headless mode. Defaults to true |
### Browser session response

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| sessionId | string | Yes | Unique session identifier |
| wsEndpoint | string | Yes | WebSocket URL for connecting with Playwright/Puppeteer |
| vncUrl | string \| null | Yes | VNC URL for viewing the browser (if available) |
| affinityCookie | string | No | Affinity cookie for sticky session routing (format: SCRAPER_AFFINITY=xxx), extracted from response headers |
| createdAt | Date | Yes | Session creation timestamp |
| width | number | Yes | Browser viewport width |
| height | number | Yes | Browser viewport height |
## License

MIT