A tiny, fast, and customizable Node.js library to crawl websites and save all pages as compact, AI-ready PDFs. Use it from the command line or as a module in your Node.js scripts. Perfect for data archiving, offline analysis, and feeding content to AI tools.
- Blazing Fast: Optimized for speed and performance.
- Lightweight: Minimal resource usage for crawling and PDF generation.
- Customizable: Full control over PDF formatting and crawling behavior.
- AI-Optimized PDFs: Compact and structured for AI consumption.
- Dual Usage: Use via CLI or integrate into Node.js scripts.
> Star this repository and share it with your friends.
Install using pnpm, npm, or yarn:

```bash
pnpm add e2pdf
```

_or_

```bash
npm install e2pdf
```

_or_

```bash
yarn add e2pdf
```

To use e2pdf from the command line:
```bash
e2pdf
```

For example:

```bash
e2pdf https://example.com
```

This will crawl the website and save all pages as PDFs in the current directory.
Here’s an example of using e2pdf in a Node.js script:
```javascript
import e2pdf from "e2pdf";

(async () => {
  // Crawl the site from the start URL and save each crawled page as a PDF.
  await e2pdf("https://example.com", {
    out: "./pdfs", // output directory
    pdf: {
      format: "A4",
      printBackground: true,
      margin: { top: "20px", bottom: "20px" },
    },
    // Limit the crawl to at most 100 requests.
    crawlerOptions: { maxRequestsPerCrawl: 100 },
  });
  console.log("Crawling completed! PDFs saved to ./pdfs");
})();
```

The e2pdf function accepts two arguments:
1. `startUrl` (`string`): The URL to start crawling from.
2. `options` (`E2PdfOptions`): Configuration object for crawling and PDF generation.
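
Because a crawl can run for a while, it can help to check both arguments before starting. The sketch below is one possible way to do that. `prepareE2pdfArgs` is a hypothetical helper, not part of e2pdf, and the `./pdfs` fallback is an assumption chosen for the example (the library's own default for `out` is described below).

```javascript
// Hypothetical helper (not part of e2pdf): validates the start URL and
// makes the output directory explicit before the crawl is started.
function prepareE2pdfArgs(startUrl, options = {}) {
  // new URL() throws a TypeError on malformed input.
  const url = new URL(startUrl);
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${url.protocol}`);
  }
  // Caller-supplied options win over the illustrative "./pdfs" fallback.
  return [url.href, { out: "./pdfs", ...options }];
}

const [startUrl, options] = prepareE2pdfArgs("https://example.com", {
  crawlerOptions: { maxRequestsPerCrawl: 10 },
});
console.log(startUrl); // https://example.com/
console.log(options.out); // ./pdfs
```

The returned pair can then be passed straight through, for example `await e2pdf(startUrl, options)`.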
#### out
- Type: `string`
- Default: `process.cwd()`
- Directory to save the generated PDFs.
#### pdf
PDF generation options (compatible with Playwright’s PDF options):
- `displayHeaderFooter`: Display header and footer. Defaults to `false`.
- `footerTemplate`: HTML template for the footer.
- `format`: Paper format (e.g., `A4`, `Letter`). Defaults to `Letter`.
- `headerTemplate`: HTML template for the header.
- `landscape`: Paper orientation. Defaults to `false`.
- `margin`: Margins for the PDF (`top`, `right`, `bottom`, `left`).
- `printBackground`: Print background graphics. Defaults to `false`.
- ...and many more options for fine-tuning PDFs.
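
As an illustration of how these options combine, here is a minimal sketch that builds a `pdf` object with page numbers in the footer. `buildPdfOptions` is a hypothetical helper, not part of e2pdf. The `pageNumber` and `totalPages` classes are the placeholders Playwright fills in for header and footer templates; the font size and margins are arbitrary choices for the example.

```javascript
// Hypothetical helper (not part of e2pdf): returns a `pdf` options object
// that prints "current / total" page numbers in the footer.
function buildPdfOptions({ format = "A4", landscape = false } = {}) {
  return {
    format,
    landscape,
    printBackground: true,
    displayHeaderFooter: true,
    // Empty header so only the footer is rendered.
    headerTemplate: "<span></span>",
    // Playwright injects values into elements with these class names.
    footerTemplate:
      '<div style="font-size:8px; width:100%; text-align:center;">' +
      '<span class="pageNumber"></span> / <span class="totalPages"></span>' +
      "</div>",
    // Leave room at the bottom so the footer does not overlap the content.
    margin: { top: "20px", right: "20px", bottom: "40px", left: "20px" },
  };
}

const pdf = buildPdfOptions({ landscape: true });
console.log(pdf.format, pdf.landscape);
```

The result would be passed as the `pdf` key of the options object, for example `{ out: "./pdfs", pdf }`.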
#### crawlerOptions
Options for the Crawlee PlaywrightCrawler.
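
For instance, a conservative crawl might be configured as in the sketch below. The four option names come from Crawlee's crawler options; the values are illustrative only, and it is an assumption that e2pdf forwards all of them unchanged to the crawler.

```javascript
// Illustrative values only; assumes e2pdf passes these through to Crawlee.
const crawlerOptions = {
  maxRequestsPerCrawl: 50, // stop after 50 requests in total
  maxConcurrency: 5, // run at most 5 requests in parallel
  maxRequestRetries: 2, // retry a failed request up to 2 times
  requestHandlerTimeoutSecs: 60, // give each page up to 60 seconds
};

console.log(Object.keys(crawlerOptions).join(", "));
```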
#### crawlerConfig
Settings for Crawlee’s `Configuration` object.
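
A rough sketch of what this might look like follows. It assumes `crawlerConfig` accepts a plain object of Crawlee configuration options, which the description above does not confirm. `persistStorage` and `purgeOnStart` are Crawlee configuration options; the values are illustrative.

```javascript
// Assumption: crawlerConfig takes a plain object of Crawlee configuration options.
const crawlerConfig = {
  persistStorage: false, // keep request queues and datasets in memory only
  purgeOnStart: true, // clear leftover default storages from earlier runs
};

console.log(crawlerConfig);
```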
We welcome contributions! Please fork the repository and submit a pull request.
This library is licensed under the MPL-2.0 open-source license.
If you encounter any issues or have suggestions, please open an issue or contact us. We’d love to hear from you!
> Please enroll in our courses or sponsor our work.
with 💖 by Mayank Kumar Chaudhari