Universal content extraction library with tiered fetching strategies
JavaScript/TypeScript implementation of OmniFetch, a universal content extraction library.
- Universal Extraction: Fetches content from any URL, handling standard sites, SPAs, and paywalls.
- Tiered System:
  1. Light Fetch: Fast, standard HTTP request.
  2. Headless Browser: Handles dynamic JS-heavy sites (requires Netlify endpoint).
  3. Search Fallback: Finds alternative sources for paywalled or blocked content.
- Smart Parsing: Converts HTML to clean Markdown or JSON.
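
The tiered idea above can be sketched as a simple fallback loop. This is only an illustration under stated assumptions, not OmniFetch's actual internals: the `Tier` and `TieredResult` types and the `fetchWithTiers` function are hypothetical names, and "empty content counts as a failure" is an assumed rule.

```typescript
// Hypothetical types for illustration; not part of the OmniFetch API.
type TierName = 'light' | 'headless' | 'search';

interface Tier {
  name: TierName;
  // Resolves with extracted content, or rejects if the tier cannot handle the URL.
  run: (url: string) => Promise<string>;
}

interface TieredResult {
  tier: TierName;
  content: string;
}

// Try each tier in order and return the first non-empty result.
async function fetchWithTiers(url: string, tiers: Tier[]): Promise<TieredResult> {
  const failures: string[] = [];
  for (const tier of tiers) {
    try {
      const content = await tier.run(url);
      if (content.trim().length > 0) {
        return { tier: tier.name, content };
      }
      // Assumption: whitespace-only output means the tier did not really succeed.
      failures.push(`${tier.name}: empty content`);
    } catch (err) {
      failures.push(`${tier.name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All tiers failed for ${url} (${failures.join('; ')})`);
}
```

Ordering the tiers from cheapest to most expensive means the slower strategies only run when the fast path fails.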
```bash
npm install omnifetch
```

```typescript
import { omniFetch } from 'omnifetch';

// Text extraction (Markdown)
const result = await omniFetch('https://example.com', { mode: 'TEXT' });
console.log(result.content);

// JSON extraction (Structured Data)
const jsonResult = await omniFetch('https://example.com', { mode: 'JSON' });
console.log(jsonResult.content.title);
```
The `omniFetch` function accepts a configuration object as its second argument:
```typescript
interface OmniFetchConfig {
  /** Output mode: 'JSON' for structured data, 'TEXT' for plain text (Markdown) */
  mode: 'JSON' | 'TEXT';
  /** Request timeout in milliseconds (default: 30000) */
  timeout?: number;
  /** Custom Netlify endpoint for headless browser fallback (Tier 2) */
  netlifyEndpoint?: string;
  /** Custom headers to include in requests */
  headers?: Record<string, string>;
  /** Whether to skip Tier 2 (headless) and go directly to search fallback */
  skipHeadless?: boolean;
  /** Whether to skip Tier 3 (search fallback) entirely */
  skipSearch?: boolean;
  /** Override title for search fallback (useful for opaque URLs like x.com/status/ID) */
  forceTitle?: string;
}
```
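
To make the interaction between `netlifyEndpoint`, `skipHeadless`, and `skipSearch` concrete, here is a minimal sketch of which tiers would be attempted for a given configuration. The `planTiers` function and the `TierOptions` and `PlannedTier` types are hypothetical, and the sketch assumes Tier 2 is skipped when no endpoint is configured; the library's real selection logic may differ.

```typescript
// Subset of OmniFetchConfig fields that influence tier selection.
interface TierOptions {
  netlifyEndpoint?: string;
  skipHeadless?: boolean;
  skipSearch?: boolean;
}

// Hypothetical labels for the three tiers.
type PlannedTier = 'light' | 'headless' | 'search';

// Return the tiers that would be tried, in order (illustrative only).
function planTiers(options: TierOptions): PlannedTier[] {
  const plan: PlannedTier[] = ['light'];
  // Assumption: Tier 2 needs an endpoint and must not be explicitly skipped.
  if (options.netlifyEndpoint && !options.skipHeadless) {
    plan.push('headless');
  }
  if (!options.skipSearch) {
    plan.push('search');
  }
  return plan;
}

console.log(planTiers({ skipSearch: true })); // [ 'light' ]
```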
#### Handling Blocked Domains (e.g., X/Twitter)
Some domains block direct scraping. OmniFetch automatically handles this by falling back to search (Tier 3). For opaque URLs, you can provide a `forceTitle` to improve search results.
```typescript
const result = await omniFetch('https://x.com/someuser/status/12345', {
  mode: 'TEXT',
  forceTitle: 'Specific Tweet Content Title' // Helps find the content via search
});
```
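
Why a title helps can be seen from a rough sketch of how a search query might be derived. The `searchQueryFor` helper below is hypothetical and is not how OmniFetch necessarily builds its queries; it only shows that an opaque URL yields few useful search terms, while an explicit title gives the search something to match.

```typescript
// Hypothetical helper: choose the text to search for in a Tier 3 style fallback.
function searchQueryFor(url: string, forceTitle?: string): string {
  // An explicit title, when given, is the best available query.
  if (forceTitle && forceTitle.trim().length > 0) {
    return forceTitle.trim();
  }
  // Otherwise fall back to words recovered from the URL itself.
  const { hostname, pathname } = new URL(url);
  const words = pathname
    .split('/')
    .filter(Boolean)
    .map((segment) => segment.replace(/[-_]+/g, ' '));
  return [hostname.replace(/^www\./, ''), ...words].join(' ');
}

console.log(searchQueryFor('https://x.com/someuser/status/12345'));
// prints "x.com someuser status 12345"
console.log(searchQueryFor('https://x.com/someuser/status/12345', 'Specific Tweet Content Title'));
// prints "Specific Tweet Content Title"
```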
#### Headless Browser Support
To enable Tier 2 (Headless Browser) for dynamic sites, deploy the provided Netlify function and pass its endpoint as `netlifyEndpoint`.
```typescript
const result = await omniFetch('https://dynamic-site.com', {
  mode: 'TEXT', // `mode` is required by OmniFetchConfig
  netlifyEndpoint: 'https://your-site.netlify.app/.netlify/functions/headless-fetch'
});
```
```bash
npm install
npm run build
```

```bash
npm run dev # Watch mode
```
See the main README.md for full documentation.