pdf-parse

Pure JavaScript cross-platform module to extract text from PDFs.

[version](https://www.npmjs.org/package/pdf-parse)
[downloads](https://www.npmjs.org/package/pdf-parse)
[node](https://nodejs.org/)
[status](https://gitlab.com/autokent/pdf-parse/pipelines)

Similar Packages

* pdf2json: buggy, no longer supported, leaks memory, throws non-catchable fatal errors
* j-pdfjson: fork of pdf2json
* pdf-parser: buggy, no tests
* pdfreader: uses pdf2json
* pdf-extract: not cross-platform, uses xpdf

Installation

`npm install pdf-parse`

Basic Usage - Local Files

```js
const fs = require('fs');
const pdf = require('pdf-parse');

let dataBuffer = fs.readFileSync('path to PDF file...');

pdf(dataBuffer).then(function(data) {
    // number of pages
    console.log(data.numpages);
    // number of rendered pages
    console.log(data.numrender);
    // PDF info
    console.log(data.info);
    // PDF metadata
    console.log(data.metadata);
    // PDF.js version
    // check https://mozilla.github.io/pdf.js/getting_started/
    console.log(data.version);
    // PDF text
    console.log(data.text);
});
```

Basic Usage - HTTP

You can use crawler-request, which uses `pdf-parse`.


Exception Handling

```js
const fs = require('fs');
const pdf = require('pdf-parse');

let dataBuffer = fs.readFileSync('path to PDF file...');

pdf(dataBuffer).then(function(data) {
    // use data
})
.catch(function(error) {
    // handle exceptions
});
```
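The same error handling can be written with async/await. The following is a minimal sketch, not part of pdf-parse: `extractText` is a hypothetical helper, and its `parsePdf` argument stands in for the function exported by `pdf-parse`, passed in so the helper can be tried without a real PDF file.

```js
// Hypothetical helper: wraps a promise-returning parser (such as the function
// exported by pdf-parse) and converts a rejection into a plain result object.
async function extractText(parsePdf, dataBuffer) {
    try {
        const data = await parsePdf(dataBuffer);
        return { ok: true, text: data.text };
    } catch (error) {
        // Parse failures end up here instead of becoming unhandled rejections.
        return { ok: false, error: error };
    }
}
```

With the real module, the call would look like `extractText(require('pdf-parse'), dataBuffer)`, and the caller checks `result.ok` before reading `result.text`.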

Extend

* Versions v1.0.9 and above introduce a breaking change to the `pagerender` callback; see the changelog.
* If you need another format, such as JSON, you can change the page render behaviour with a callback.
* Check out https://mozilla.github.io/pdf.js/

```js
const fs = require('fs');
const pdf = require('pdf-parse');

// default render callback
function render_page(pageData) {
    // check documents https://mozilla.github.io/pdf.js/
    let render_options = {
        // replaces all occurrences of whitespace with standard spaces (0x20). The default value is `false`.
        normalizeWhitespace: false,
        // do not attempt to combine same line TextItem's. The default value is `false`.
        disableCombineTextItems: false
    };

    return pageData.getTextContent(render_options)
        .then(function(textContent) {
            let lastY, text = '';
            for (let item of textContent.items) {
                // items that share a Y coordinate are appended to the same line
                if (lastY == item.transform[5] || !lastY) {
                    text += item.str;
                } else {
                    text += '\n' + item.str;
                }
                lastY = item.transform[5];
            }
            return text;
        });
}

let options = {
    pagerender: render_page
};

let dataBuffer = fs.readFileSync('path to PDF file...');

pdf(dataBuffer, options).then(function(data) {
    // use new format
});
```
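As an illustration of the JSON case mentioned above, here is a hedged sketch of a callback that returns one JSON string per page. The name `render_page_json` and the output shape (`str`, `x`, `y`) are assumptions made for this example, not something pdf-parse defines. It relies only on `getTextContent()` and the `transform` array already used by the default callback.

```js
// Hypothetical pagerender callback: returns one JSON string per page instead
// of plain text. Each entry keeps the text and its position on the page.
function render_page_json(pageData) {
    return pageData.getTextContent().then(function (textContent) {
        const items = textContent.items.map(function (item) {
            return {
                str: item.str,
                x: item.transform[4], // horizontal position
                y: item.transform[5]  // vertical position, as used by the default callback
            };
        });
        return JSON.stringify(items);
    });
}

const options = { pagerender: render_page_json };
```

The callback's return value takes the place of the page's text, so the JSON strings would be expected to appear in `data.text`, one per rendered page. How the pages are joined is left to the library, so splitting the result back into pages is an extra step this sketch does not cover.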

Options

```js
const DEFAULT_OPTIONS = {
    // internal page parser callback
    // you can set this option, if you need another format except raw text
    pagerender: render_page,
    // max page number to parse
    max: 0,
    // check https://mozilla.github.io/pdf.js/getting_started/
    version: 'v1.10.100'
};
```

pagerender (callback)

Set this callback if you need an output format other than raw text.

max (number)

Maximum number of pages to parse. If the value is less than or equal to 0, the parser renders all pages.

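The rule for `max` can be sketched as a small function. `pagesToParse` is an illustrative helper, not part of pdf-parse, and capping the result at the document's page count is an assumption about what happens when `max` exceeds the number of pages.

```js
// Illustrative helper (not part of pdf-parse): how many pages get parsed
// for a given `max` option and document length.
function pagesToParse(max, numPages) {
    if (max <= 0) {
        return numPages; // 0 or a negative value means "all pages"
    }
    return Math.min(max, numPages); // assumption: never more than the document has
}

console.log(pagesToParse(0, 12));  // 12
console.log(pagesToParse(3, 12));  // 3
console.log(pagesToParse(50, 12)); // 12
```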
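
version (string)

The pdf.js version to use; see https://mozilla.github.io/pdf.js/getting_started/ for details. Supported values:

* `'default'`
* `'v1.9.426'`
* `'v1.10.100'`
* `'v1.10.88'`
* `'v2.0.550'`

> The default version is `v1.10.100`; see mozilla.github.io/pdf.js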

Test

* Run `mocha` or `npm test`.
* Check the test folder and quickstart.js for more usage examples.

Support

I use this package actively myself, so it has my top priority. You can chat with me on WhatsApp to share any information, ideas, or suggestions.

[WhatsApp](https://api.whatsapp.com/send?phone=905063042480&text=Hi%2C%0ALet%27s%20talk%20about%20pdf-parse)

Submitting an Issue

If you find a bug or a mistake, you can help by submitting an issue to the GitLab repository.

Creating a Merge Request

GitLab calls it a merge request instead of a pull request.

* A Guide for First-Timers
* How to create a merge request
* Check the Contributing Guide

License

MIT licensed; all its dependencies are MIT or BSD licensed.

