A cross-browser, frontend JavaScript PDF parser with a template engine that converts PDF documents into organized data objects.
Live Demo: Click Here
Install with npm:
```bash
npm install pdf-template-parse
```
Install with yarn:
```bash
yarn add pdf-template-parse
```
This module exposes two functions:
1 - pdfParse (character & location extraction)
```js
import { pdfParse } from 'pdf-template-parse';
```
`pdfParse` takes a PDF file and returns a promise. The promise resolves with the character data (character code, text, x, y, width) for every character found in the document, so you can process the raw data yourself.
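As an illustration of processing the raw data yourself, here is a minimal sketch that groups characters into lines of text by their vertical position. The record shape `{ text, x, y, width }`, the `groupIntoLines` helper, and the `yTolerance` parameter are assumptions made for this example; check the actual objects `pdfParse` resolves with before relying on them.

```js
// Hypothetical record shape: { text, x, y, width }. The real objects may differ.
function groupIntoLines(chars, yTolerance = 2) {
  // Sort top to bottom, then left to right, without mutating the input.
  const sorted = [...chars].sort((a, b) => a.y - b.y || a.x - b.x);
  const lines = [];
  for (const ch of sorted) {
    const last = lines[lines.length - 1];
    // Characters whose y is within the tolerance of the line's first
    // character are treated as part of the same line.
    if (last && Math.abs(ch.y - last.y) <= yTolerance) {
      last.chars.push(ch);
    } else {
      lines.push({ y: ch.y, chars: [ch] });
    }
  }
  // Order each line by x and join the characters into a string.
  return lines.map((line) =>
    line.chars
      .sort((a, b) => a.x - b.x)
      .map((ch) => ch.text)
      .join('')
  );
}

// Made-up sample data, not taken from a real PDF.
const sample = [
  { text: 'i', x: 16, y: 100, width: 3 },
  { text: 'H', x: 10, y: 100, width: 6 },
  { text: 'o', x: 16, y: 120.5, width: 6 },
  { text: 'y', x: 10, y: 120, width: 6 },
];
console.log(groupIntoLines(sample)); // logs [ 'Hi', 'yo' ]
```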
2 - pdfTemplateParse (character extraction & templating)
```js
import pdfTemplateParse from 'pdf-template-parse';
```
`pdfTemplateParse` takes a PDF file and a template file and returns a promise. The promise resolves with all the values and tables declared in the template file (see the sample template file in the example below).
Sample PDF download: helloWorldDemo.pdf
```js
import { pdfParse } from 'pdf-template-parse';
import pdf from './samplePdf/helloWorldDemo.pdf';

// pdfParse returns a promise, so log the data once it resolves
pdfParse(pdf).then((characterData) => {
  console.log({ characterData });
});
```
Output: (console screenshot)
!example one console screenshot
Note: the promise will not resolve if the browser tab is not visible.
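One possible way to work around this is to wait until the tab is visible before starting a parse. The sketch below uses the standard Page Visibility API (`document.visibilityState` and the `visibilitychange` event). The `whenVisible` helper is hypothetical and not part of this package, and it does not help if the tab becomes hidden after parsing has already started.

```js
// Hypothetical helper: resolves once the given document is visible.
// `doc` defaults to the browser's global document.
function whenVisible(doc = document) {
  return new Promise((resolve) => {
    if (doc.visibilityState === 'visible') {
      resolve();
      return;
    }
    const onChange = () => {
      if (doc.visibilityState === 'visible') {
        // Clean up so the listener fires only once.
        doc.removeEventListener('visibilitychange', onChange);
        resolve();
      }
    };
    doc.addEventListener('visibilitychange', onChange);
  });
}

// Usage in a browser, with pdfParse and pdf imported as in the example above:
// whenVisible()
//   .then(() => pdfParse(pdf))
//   .then((characterData) => console.log({ characterData }));
```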
Template file: helloWorldDemo.json
```json
{
  "captureList": [
    {
      "name": "1",
      "type": "value",
      "rules": {
        "all": {
          "bounds": {
            "top": 220,
            "left": 70,
            "bottom": 230,
            "right": 140
          }
        }
      }
    },
    {
      "name": "2",
      "type": "value",
      "rules": {
        "all": {
          "bounds": {
            "top": 220,
            "left": 150,
            "bottom": 230,
            "right": 200
          }
        }
      }
    },
    {
      "name": "1+2",
      "type": "value",
      "rules": {
        "all": {
          "bounds": {
            "top": 220,
            "left": 70,
            "bottom": 230,
            "right": 200
          }
        }
      }
    }
  ]
}
```
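To make the `bounds` rule more concrete, here is a rough sketch of how a rectangle like the ones in the template could select characters. It is not the library's actual implementation: the `extractValue` helper, the character shape `{ text, x, y }`, the inclusive comparisons, and the assumption that `y` grows downward (so `top` is less than `bottom`) are all illustrative guesses.

```js
// Hypothetical helper, not part of pdf-template-parse.
// Keeps characters whose (x, y) falls inside the rectangle,
// orders them left to right, and joins their text.
function extractValue(chars, bounds) {
  return chars
    .filter(
      (ch) =>
        ch.x >= bounds.left &&
        ch.x <= bounds.right &&
        ch.y >= bounds.top &&
        ch.y <= bounds.bottom
    )
    .sort((a, b) => a.x - b.x)
    .map((ch) => ch.text)
    .join('');
}

// Made-up characters placed to fall inside the template's rectangles.
const chars = [
  { text: 'B', x: 80, y: 225 },
  { text: 'A', x: 72, y: 225 },
  { text: 'C', x: 152, y: 225 },
  { text: 'D', x: 160, y: 225 },
  { text: 'X', x: 72, y: 300 }, // outside every rectangle
];

const first = { top: 220, left: 70, bottom: 230, right: 140 };
const second = { top: 220, left: 150, bottom: 230, right: 200 };
const both = { top: 220, left: 70, bottom: 230, right: 200 };

console.log(extractValue(chars, first)); // logs AB
console.log(extractValue(chars, second)); // logs CD
console.log(extractValue(chars, both)); // logs ABCD
```

The third rectangle spans the first two, which mirrors how the "1+2" entry in the template covers the areas of "1" and "2".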
Code:
```js
import pdfTemplateParse from 'pdf-template-parse';
import pdf from './samplePdf/helloWorldDemo.pdf';
import template from './sampleFile/helloWorldDemo.json';

// pdfTemplateParse returns a promise, so log the data once it resolves
pdfTemplateParse(pdf, template).then((data) => {
  console.log({ data });
});
```
Output: (console screenshot)
!example two console screenshot
Note: the promise will not resolve if the browser tab is not visible.
- Add tests
- Replace char_offset option with character map detection
- Add value validation.
- Add template validation.
- Add node support (either remove canvas dependency or add node canvas package)
- Thomas J. Herzog - https://github.com/tomrule007
This project is licensed under the MIT License. See the LICENSE file for details.