Another simple Node.js wrapper for the popular `pdftotext` command-line utility.
This one supports _parse-until-time-limit_ and _parse-until-maximum-text-size_.
It also automatically installs pdftotext if it runs as the root user on a Debian/Ubuntu/Mint system, which is pretty nice.
No intermediate files are used.
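
For readers curious how limits like these can work without intermediate files, here is a minimal sketch. It is not this package's actual implementation. It assumes a hypothetical helper, `runWithLimits`, that pipes a buffer into a child process on stdin and collects stdout until either limit is reached; the names `runWithLimits`, `timelimitMs`, and `sizelimitBytes` are invented for illustration.

```javascript
const { spawn } = require("node:child_process");

// Hypothetical helper (not part of @rocknerve/pdftotext): run a command,
// feed it `input` on stdin, and collect stdout until a time or size limit is hit.
function runWithLimits(command, args, input, { timelimitMs = 10_000, sizelimitBytes = 65535 } = {}) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, { stdio: ["pipe", "pipe", "ignore"] });
    const chunks = [];
    let size = 0;
    let settled = false;

    // Stop the child and resolve with whatever text has been collected so far.
    const finish = () => {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      child.kill();
      resolve(Buffer.concat(chunks).toString("utf8"));
    };

    // Time limit: give up waiting and return the partial output.
    const timer = setTimeout(finish, timelimitMs);

    // Size limit: keep only as many bytes as still fit, then stop.
    child.stdout.on("data", (chunk) => {
      if (settled) return;
      const room = sizelimitBytes - size;
      chunks.push(chunk.subarray(0, room));
      size += Math.min(chunk.length, room);
      if (size >= sizelimitBytes) finish();
    });

    // Spawn failures (for example, the command is not installed) reject the promise.
    child.on("error", (err) => {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      reject(err);
    });

    // Normal completion: the child exited and its stdout has closed.
    child.on("close", finish);

    // Ignore EPIPE if the child exits before reading all of its input.
    child.stdin.on("error", () => {});
    child.stdin.end(input);
  });
}
```

With Poppler's `pdftotext`, which reads from stdin and writes to stdout when both file arguments are `-`, a call might look like `runWithLimits("pdftotext", ["-", "-"], pdfBuffer, { timelimitMs: 10_000 })`. Cutting the output at a byte boundary can split a multi-byte UTF-8 character at the very end, so a production version would probably trim to the last complete character.
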
```sh
npm i @rocknerve/pdftotext
```

```js
const { readFile } = require("node:fs/promises");
const ConvertPDFToText = require("@rocknerve/pdftotext");

// Run this inside an async function: CommonJS modules do not allow top-level await.

// Option 1: read the PDF from disk
let your_pdf_data_buffer = await readFile("example.pdf");
// Option 2: download the PDF
your_pdf_data_buffer = await (await fetch("https://example.com/example.pdf")).arrayBuffer();

const your_plain_text_string = await ConvertPDFToText({
  input: { body: your_pdf_data_buffer },
  timelimit_ms: 10_000, // optional; limit processing to 10 seconds
  sizelimit_bytes: 65535, // optional; limit text output to 64 KB
  logger: (line) => console.log(`--- PDF parsing status: ${line}`), // optional; you can also pass false to avoid default logging to stdout
});
```
Planned improvements:

- Allow streaming IO
- Allow PDF URLs to be passed and fetched automatically
- Make nice for Deno 2
- Support DOCX or other types too
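
Until PDF URLs are fetched automatically, callers can do the download themselves. The sketch below is one way to wrap that step. `convertPdfUrlToText` is a hypothetical helper, not part of this package; it assumes the converter accepts a `Buffer` in `input.body` (as in the `readFile` case) and takes the converter and the fetch function as parameters so it can be tried without network access.

```javascript
// Hypothetical helper (not part of @rocknerve/pdftotext): download a PDF and
// pass its bytes to a converter that takes `{ input: { body }, ...options }`.
async function convertPdfUrlToText(url, convert, options = {}, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    // Fail early instead of handing an HTML error page to the PDF parser.
    throw new Error(`Failed to fetch ${url}: HTTP ${response.status}`);
  }
  // Convert the ArrayBuffer to a Node.js Buffer before handing it over.
  const body = Buffer.from(await response.arrayBuffer());
  return convert({ ...options, input: { body } });
}
```

A call might look like `await convertPdfUrlToText("https://example.com/example.pdf", ConvertPDFToText, { timelimit_ms: 10_000 })`, which relies on the global `fetch` available in Node.js 18 and later.
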