PDF-TO-TEXT

pdf-to-text is a tool for extracting text from PDF files. It does not support OCR yet, so it only works with searchable PDF files. This package has no Node.js dependencies.

![Build Status](https://travis-ci.org/zetahernandez/pdf-to-text)

Installation

To install the module, run `npm install pdf-to-text`.

You need to install the following tools to use this module:

- pdftotext
  - used to extract text from searchable PDF documents
- pdfinfo
  - used to obtain information about PDF documents
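Before using the module, it can help to confirm that both tools are reachable from Node.js. The following is a minimal sketch, not part of pdf-to-text: the `commandExists` helper is a hypothetical name, and it only checks whether the binary can be launched, not which version is installed.

```js
var spawnSync = require('child_process').spawnSync;

// Hypothetical helper: returns true if the command could be started, and
// false if it could not be launched (for example ENOENT when the binary
// is not on the PATH). The exit status is deliberately ignored.
function commandExists(command) {
  var result = spawnSync(command, ['-v'], { stdio: 'ignore' });
  return !result.error;
}

// Collect the required tools that could not be launched.
var missing = ['pdftotext', 'pdfinfo'].filter(function (tool) {
  return !commandExists(tool);
});

if (missing.length > 0) {
  console.error('Missing required tools: ' + missing.join(', '));
} else {
  console.log('pdftotext and pdfinfo are available.');
}
```

The check relies on the spawn error rather than the exit code, because different builds of these tools may exit with different codes when asked for their version.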

OSX

To begin on OSX, first make sure you have the Homebrew package manager installed.

pdftotext is included as part of the xpdf utilities library. xpdf can be installed via Homebrew:

```bash
brew install xpdf
```

Ubuntu / Debian

pdftotext is included in the poppler-utils library. To install poppler-utils, run:

```bash
apt-get install poppler-utils
```

Usage


Get PDF info

Obtain info from a PDF file:

```js
var pdfUtil = require('pdf-to-text');
var pdf_path = "absolute_path/to/pdf_file.pdf";

pdfUtil.info(pdf_path, function(err, info) {
  if (err) throw err;
  console.log(info);
});
```

It returns an object with the information read from the PDF file:

```json
{
  "title": "some title",
  "subject": "TeX output 2003.10.17:1908",
  "author": "Fernando Hernandez",
  "creator": "creator name",
  "producer": "Acrobat Distiller 4.0 for Windows",
  "creationdate": 1066428670000,
  "moddate": 1066428687000,
  "tagged": "no",
  "form": "none",
  "pages": 8,
  "encrypted": "no",
  "page_size": "612 x 792 pts (letter)",
  "file_size": "28695 bytes",
  "optimized": "yes",
  "pdf_version": "1.2"
}
```
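The keys in this object resemble the `Key: value` lines that pdfinfo prints, lowercased and with spaces replaced by underscores. The sketch below shows one way such output could be turned into an object. It is an illustration under that assumption, not the module's actual implementation: `parsePdfInfo` is a hypothetical name, the sample text is made up to match the example above, and dates are left as strings rather than converted to timestamps.

```js
// Hypothetical parser for "Key: value" lines such as those printed by pdfinfo.
function parsePdfInfo(output) {
  var info = {};
  output.split(/\r?\n/).forEach(function (line) {
    // Split at the first colon only, so values may contain colons.
    var idx = line.indexOf(':');
    if (idx === -1) return;
    // "Page size" becomes "page_size", "PDF version" becomes "pdf_version".
    var key = line.slice(0, idx).trim().toLowerCase().replace(/\s+/g, '_');
    var value = line.slice(idx + 1).trim();
    if (!key) return;
    // Only the page count is converted to a number in this sketch.
    info[key] = key === 'pages' ? parseInt(value, 10) : value;
  });
  return info;
}

var sampleOutput = [
  'Title:          some title',
  'Subject:        TeX output 2003.10.17:1908',
  'Pages:          8',
  'Page size:      612 x 792 pts (letter)',
  'PDF version:    1.2'
].join('\n');

console.log(parsePdfInfo(sampleOutput));
```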

Extract text

You can extract text from a range of pages by passing an option object with `from` and `to` properties, or simply omit this option to extract all text from the PDF file.

```js
var pdfUtil = require('pdf-to-text');
var pdf_path = "absolute_path/to/pdf_file.pdf";

// Option to extract text from page 0 to 10
var option = {from: 0, to: 10};

pdfUtil.pdfToText(pdf_path, option, function(err, data) {
  if (err) throw err;
  console.log(data); // print text
});

// Omit option to extract all text from the pdf file
pdfUtil.pdfToText(pdf_path, function(err, data) {
  if (err) throw err;
  console.log(data); // print all text
});
```
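The pdftotext command line tool selects a page range with its `-f` (first page) and `-l` (last page) flags, and writes to standard output when the output file is `-`. The sketch below shows how a `from`/`to` option object could be mapped onto those flags. It is an assumption about how such a mapping might look, not the module's actual code, and `buildPdftotextArgs` is a hypothetical name. Note that pdftotext itself numbers pages from 1; how the module treats `from: 0` is not stated here.

```js
// Hypothetical helper: builds the argument list for the pdftotext CLI.
function buildPdftotextArgs(pdfPath, option) {
  var args = [];
  option = option || {};
  // -f sets the first page to convert.
  if (typeof option.from === 'number') {
    args.push('-f', String(option.from));
  }
  // -l sets the last page to convert.
  if (typeof option.to === 'number') {
    args.push('-l', String(option.to));
  }
  // Input file, then "-" so the text is written to stdout.
  args.push(pdfPath, '-');
  return args;
}

console.log(buildPdftotextArgs('absolute_path/to/pdf_file.pdf', { from: 1, to: 10 }));
console.log(buildPdftotextArgs('absolute_path/to/pdf_file.pdf'));
```

An argument array like this could be handed to `child_process.execFile('pdftotext', args, callback)`, which avoids shell quoting issues with file paths that contain spaces.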

Tests

To check that your system satisfies the required dependencies and that the module is functioning correctly, run the following commands to enter the pdf-to-text module folder and start the tests:

```bash
cd /node_modules/pdf-to-text
npm test
```
