A pure JavaScript, cross-platform module designed for extracting text from PDF files using pdf.js.
- Extract text from PDF files.
- Supports both browser and Node.js environments.
- Easy to use, with a promise-based API.
```bash
npm install pdf-parse2
```

Or

```bash
yarn add pdf-parse2
```
```javascript
const fs = require('fs');
const PDFParse = require('pdf-parse2');

(async () => {
  const dataBuffer = fs.readFileSync('path/to/your/document.pdf');
  // Give the instance its own name so it does not shadow the PDFParse class.
  const parser = new PDFParse();
  try {
    const pdfData = await parser.loadPDF(dataBuffer);
    console.log('Text:', pdfData.text);
  } catch (error) {
    console.error(error);
  }
})();
```
Make sure the pdf.js library is included in your project. You can then use PDFParse as in the Node.js example, loading the PDF file with the Fetch API or XMLHttpRequest instead of reading it from disk.
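
For browser use, the flow described above might look like the following minimal sketch. It assumes `parser` is a PDFParse instance created as in the Node.js example; `extractTextFromUrl` and its `fetchImpl` parameter are hypothetical names introduced here for illustration and are not part of pdf-parse2.

```javascript
// Sketch: fetch a PDF over HTTP and pass its bytes to loadPDF.
// `parser` is assumed to be a PDFParse instance; `fetchImpl` is a
// hypothetical parameter that defaults to the global fetch.
async function extractTextFromUrl(url, parser, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`Failed to fetch ${url}: HTTP ${response.status}`);
  }
  // loadPDF is documented to accept an ArrayBuffer as its source.
  const arrayBuffer = await response.arrayBuffer();
  const pdfData = await parser.loadPDF(arrayBuffer);
  return pdfData.text;
}
```

Passing the parser and fetch function in as arguments keeps the helper independent of how pdf.js and PDFParse are loaded on the page.
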
- `loadPDF(src, options)`: Loads a PDF file and extracts text. `src` can be a `Buffer` or `ArrayBuffer`. `options` is optional.
- `renderPage(pageData, options)`: A helper function for rendering a single page. This function is used internally by `loadPDF`.
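
Since `loadPDF` accepts either a `Buffer` or an `ArrayBuffer`, no conversion should be needed before calling it. If you prefer a single code path shared between Node.js and the browser, a small normalizing helper is one option. `toArrayBuffer` below is a hypothetical helper written for illustration, not part of pdf-parse2.

```javascript
// Sketch: normalize a Node.js Buffer, typed array, or ArrayBuffer into a
// standalone ArrayBuffer. `toArrayBuffer` is a hypothetical helper.
function toArrayBuffer(src) {
  if (src instanceof ArrayBuffer) {
    return src;
  }
  if (ArrayBuffer.isView(src)) {
    // A Buffer may be a view into a larger shared pool, so copy only its own bytes.
    return src.buffer.slice(src.byteOffset, src.byteOffset + src.byteLength);
  }
  throw new TypeError('Expected a Buffer, typed array, or ArrayBuffer');
}
```

The result can then be passed as `src`, for example `parser.loadPDF(toArrayBuffer(dataBuffer))`, where `parser` is a PDFParse instance.
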
Contributions are welcome! Please feel free to submit a Pull Request or open an issue for any bugs or feature requests.
This project is licensed under the MIT License - see the LICENSE file for details.