A robust Node.js library that extracts structured data from PDF invoices using OpenAI's GPT models.
It handles complex layouts, merged table columns, and varying invoice formats by converting the PDF to raw text and then using the model to parse that text into a standardized JSON schema.
- 📄 PDF Parsing: Efficiently extracts raw text layers from PDF documents using pdf2json.
- 🤖 AI-Powered Extraction: Uses OpenAI to intelligently identify, categorize, and normalize fields.
- 📦 Standardized JSON: Outputs a consistent data structure (vendor, taxes, products, etc.) regardless of the invoice's visual layout.
- ⚡ Simple API: Exposes a single asynchronous function for seamless integration.
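
To illustrate the "Standardized JSON" idea, here is a minimal sketch of how a model's reply could be coerced into one fixed shape. The helper name `normalizeInvoice` and the sample reply are hypothetical and are not part of this package's documented API. Only the top-level keys `vendor`, `taxes`, and `products` come from the feature list above; the actual schema may contain more fields.

```javascript
// Hypothetical helper, for illustration only (not the library's implementation).
// It coerces a model's JSON reply into a fixed shape so callers always see
// the same keys, whatever the invoice looked like.
function normalizeInvoice(rawReply) {
  let parsed;
  try {
    parsed = JSON.parse(rawReply);
  } catch {
    parsed = {}; // unparseable reply: fall back to an empty result
  }

  // Guard against valid JSON that is not a plain object (null, array, number).
  if (parsed === null || typeof parsed !== 'object' || Array.isArray(parsed)) {
    parsed = {};
  }

  // Any missing or malformed list becomes an empty array.
  const toList = (value) => (Array.isArray(value) ? value : []);

  return {
    vendor: typeof parsed.vendor === 'string' ? parsed.vendor.trim() : null,
    taxes: toList(parsed.taxes),
    products: toList(parsed.products),
  };
}

// Assumed sample reply: "taxes" is deliberately missing.
const reply = '{"vendor": " Acme Corp ", "products": [{"name": "Widget", "quantity": 2}]}';
console.log(normalizeInvoice(reply));
```

Because the sample reply has no `taxes` key, the result still carries `taxes` as an empty array, so downstream code can iterate over it without checking for its presence.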
Installation
Install the package via npm:
```bash
npm install invoice-parser
```
Usage
`javascript
import { parseInvoice } from 'invoice-parser';