A simple library for scraping web data and extracting structured information using LLMs.
Install with npm: `npm install @sebastianappelberg/data-scraper`
The Scraper class allows you to navigate web pages and extract HTML content.
```typescript
import { Scraper, convertToMarkdown } from '@sebastianappelberg/data-scraper';

async function runScrape() {
  const scraper = new Scraper({ headless: true, debug: false });
  try {
    const htmlContent = await scraper.scrape({
      url: 'https://example.com',
      // You must provide a getContent function.
      // `page` is left unannotated so its type comes from the scrape options.
      getContent: async (page) => {
        return await page.content();
      },
    });
    console.log(htmlContent);

    // Convert the scraped HTML to Markdown.
    const markdownContent = convertToMarkdown(htmlContent);
    console.log(markdownContent);
  } finally {
    // Always close the scraper, even if scraping fails.
    await scraper.close();
  }
}

runScrape().catch(console.error);
```
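If you need several pages, one scraper instance can probably be reused across them. The following is a minimal sketch of that idea, not part of the library: `scrapeAll`, `ScraperLike`, and `PageLike` are hypothetical names, and the two interfaces only mirror the calls used in the example above (`scrape`, `close`, `page.content()`). The library's real types may differ.

```typescript
// Minimal structural types mirroring only the calls used in the example above.
// These are assumptions for this sketch, not the library's published types.
interface PageLike {
  content(): Promise<string>;
}

interface ScraperLike {
  scrape(options: {
    url: string;
    getContent: (page: PageLike) => Promise<string>;
  }): Promise<string>;
  close(): Promise<void>;
}

// Hypothetical helper: scrape each URL in turn with one scraper instance
// and always close the scraper, even if one of the pages fails.
async function scrapeAll(
  scraper: ScraperLike,
  urls: string[],
): Promise<Map<string, string>> {
  const results = new Map<string, string>();
  try {
    for (const url of urls) {
      const html = await scraper.scrape({
        url,
        getContent: (page) => page.content(),
      });
      results.set(url, html);
    }
  } finally {
    await scraper.close();
  }
  return results;
}
```

Pages are scraped sequentially here to keep the sketch simple. Whether a single `Scraper` supports concurrent `scrape` calls is not stated above, so the sketch does not assume it.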
The extractData function uses AI to parse and extract structured data from text.
```typescript
import { extractData } from '@sebastianappelberg/data-scraper';

async function runExtraction() {
  // The input is plain text; here, a single line describing a product.
  const text = 'Product: Laptop, Price: $1200, Brand: ExampleTech';
  const jsonData = await extractData(text, {
    dataStructure: 'JSON',
    prompt: 'Extract product name, price, and brand.',
  });
  console.log(jsonData);
}

runExtraction().catch(console.error);
```
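The two examples can be chained: scrape a page, convert the HTML to Markdown, then pass the Markdown to the extraction step. The sketch below shows only that flow. `scrapeAndExtract` and `PipelineDeps` are hypothetical names, and the three steps are passed in as functions so the sketch makes no claims about the library's exact signatures or return types (the extraction result is typed as `unknown` for that reason).

```typescript
// The three steps are injected, so this sketch runs without a browser or an LLM.
// In real use they would wrap scraper.scrape, convertToMarkdown, and extractData.
interface PipelineDeps {
  fetchHtml: (url: string) => Promise<string>;
  toMarkdown: (html: string) => string;
  extract: (
    text: string,
    options: { dataStructure: string; prompt: string },
  ) => Promise<unknown>;
}

// Hypothetical helper: URL in, structured data out.
async function scrapeAndExtract(
  url: string,
  prompt: string,
  deps: PipelineDeps,
): Promise<unknown> {
  const html = await deps.fetchHtml(url);
  // Markdown is usually shorter and less noisy than raw HTML.
  const markdown = deps.toMarkdown(html);
  return deps.extract(markdown, { dataStructure: 'JSON', prompt });
}
```

With the library, `fetchHtml` could be a small wrapper around `scraper.scrape({ url, getContent })`, `toMarkdown` could be `convertToMarkdown`, and `extract` could be `extractData`, assuming their signatures match the usage shown in the examples above.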
This project is licensed under the MIT License. See the LICENSE file for details.