LLM-based page parser (PDF to text)

Use LLMs to convert PDFs to text, markdown, or JSON. By default, Page Piranha uses the Gemini 2.0 Flash model.
- Convert PDFs to plain text, markdown, or JSON
- Support for local files and remote URLs
- Pipe-friendly CLI interface
- Progress indicators and colorful output
- Configurable output directory
- Optional custom prompts for fine-tuned conversions
```bash
npm install page-piranha
```
Page Piranha requires Google Cloud Platform credentials to use Vertex AI. Create a .env file with the following variables:
```bash
GCP_PROJECT=your_gcp_project
GCP_LOCATION=your_gcp_location
GOOGLE_APPLICATION_CREDENTIALS=path_to_your_gcp_credentials_file
```
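A missing credential variable is easy to overlook until the first conversion fails. The sketch below checks the three variables listed above before any client is constructed. `findMissingEnvVars` is a hypothetical helper, not part of page-piranha, and it makes no assumption about how the `.env` file gets loaded into the environment.

```typescript
// Variable names taken from the .env example above.
const REQUIRED_ENV_VARS = [
  "GCP_PROJECT",
  "GCP_LOCATION",
  "GOOGLE_APPLICATION_CREDENTIALS",
] as const;

// Hypothetical helper (not part of page-piranha): returns the names of
// required variables that are unset or blank in the given environment.
function findMissingEnvVars(
  env: Record<string, string | undefined>,
  required: readonly string[] = REQUIRED_ENV_VARS,
): string[] {
  return required.filter((name) => (env[name] ?? "").trim() === "");
}

// Example with a literal object; only GCP_PROJECT is set here.
const missing = findMissingEnvVars({ GCP_PROJECT: "my-project" });
if (missing.length > 0) {
  console.error(`Missing environment variables: ${missing.join(", ")}`);
}
```

In a Node.js script you would typically pass `process.env` instead of the literal object and exit early when the returned list is not empty.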
Basic usage:
```bash
.bin/page-piranha -f input.pdf -m text -o output
```
Options:
- -f, --file - The PDF file to convert (required)
- -m, --mode - Conversion mode: text, markdown, or json (default: text)
- -o, --outDir - Output directory (default: out)
- -t, --tee - Output to both file and stdout
- -v, --verbose - Enable verbose logging
- -p, --prompt - Additional hints for conversion
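When the CLI is driven from a script, it can help to build the argument list from a typed object rather than concatenating strings. The following is a minimal sketch that uses only the flags listed above. `CliOptions` and `buildCliArgs` are hypothetical names introduced for illustration and are not exported by page-piranha.

```typescript
type Mode = "text" | "markdown" | "json";

// Hypothetical options shape mirroring the documented flags.
interface CliOptions {
  file: string;      // -f, --file (required)
  mode?: Mode;       // -m, --mode
  outDir?: string;   // -o, --outDir
  tee?: boolean;     // -t, --tee
  verbose?: boolean; // -v, --verbose
  prompt?: string;   // -p, --prompt
}

// Hypothetical helper: turns the options into an argument array.
// Optional flags are emitted only when they are set.
function buildCliArgs(options: CliOptions): string[] {
  const args = ["--file", options.file];
  if (options.mode) args.push("--mode", options.mode);
  if (options.outDir) args.push("--outDir", options.outDir);
  if (options.prompt) args.push("--prompt", options.prompt);
  if (options.tee) args.push("--tee");
  if (options.verbose) args.push("--verbose");
  return args;
}

const exampleArgs = buildCliArgs({
  file: "document.pdf",
  mode: "markdown",
  outDir: "converted",
});
console.log(exampleArgs.join(" "));
// prints "--file document.pdf --mode markdown --outDir converted"
```

An array like this can be handed to Node's `child_process.execFile`, which passes each element as a separate argument without going through a shell, so a prompt containing spaces or quotes needs no extra escaping.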
Examples:
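```bash
# Convert a PDF to plain text
.bin/page-piranha -f document.pdf -m text

# Convert to markdown and write to a custom output directory
.bin/page-piranha -f document.pdf -m markdown -o converted

# Convert to JSON with extra hints, echo the result to stdout, and pipe it to jq
.bin/page-piranha -f assets/demo.pdf -m json -p "Make sure to use camel case. This is an invoice. Feel free to nest fields" -t | jq
```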
Page Piranha can be used programmatically in your TypeScript/JavaScript projects:
```typescript
import { PagePiranha } from 'page-piranha';
import { JorEl } from 'jorel';

// Initialize
const jorEl = new JorEl({ vertexAi: true });
const piranha = new PagePiranha(jorEl);

// Convert to text
const text = await piranha.toText('document.pdf');

// Convert to markdown with additional prompt
const markdown = await piranha.toMarkdown('document.pdf', 'Focus on headers and lists');

// Convert to JSON
const json = await piranha.toJson('document.pdf');
```
#### Constructor
- constructor(jorEl: JorEl, options?: PagePiranhaOptions)
#### Methods
- toText(fileOrFiles: string | Buffer, additionalPrompt?: string): Promise
- toMarkdown(fileOrFiles: string | Buffer, additionalPrompt?: string): Promise
- toJson(fileOrFiles: string | Buffer, additionalPrompt?: string): Promise
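Because the three methods share the same parameter shape, a caller that receives the mode at runtime can dispatch to them through one small function. This is a sketch under stated assumptions: `PdfConverter` and `convert` are hypothetical names, the input is restricted to string paths to keep the example free of Node types, and the resolved value is typed as `unknown` because the resolved type is not given above.

```typescript
type ConversionMode = "text" | "markdown" | "json";

// Structural type mirroring the three documented methods.
// Assumption: resolved values are typed `unknown`, since the list above
// only says each method returns a Promise.
interface PdfConverter {
  toText(file: string, additionalPrompt?: string): Promise<unknown>;
  toMarkdown(file: string, additionalPrompt?: string): Promise<unknown>;
  toJson(file: string, additionalPrompt?: string): Promise<unknown>;
}

// Hypothetical helper: picks the method that matches the requested mode.
function convert(
  converter: PdfConverter,
  mode: ConversionMode,
  file: string,
  additionalPrompt?: string,
): Promise<unknown> {
  switch (mode) {
    case "text":
      return converter.toText(file, additionalPrompt);
    case "markdown":
      return converter.toMarkdown(file, additionalPrompt);
    case "json":
      return converter.toJson(file, additionalPrompt);
  }
}
```

Typing the first parameter structurally means any object with these three methods can be passed in, which also makes it straightforward to substitute a stub in unit tests without calling Vertex AI.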
MIT