A reliable Model Context Protocol (MCP) server for PDF text extraction using the proven `pdftotext` utility from poppler-utils.


Unlike other PDF MCP servers that suffer from logging interference, complex dependencies, and reliability issues, pdftotext-mcp is:

- **Actually works**: clean JSON-RPC communication without stdout pollution
- **Reliable**: built on the mature, widely used `pdftotext` from poppler-utils
- **Lightweight**: minimal dependencies, maximum compatibility
- **Production tested**: tested with Claude Desktop and other MCP clients
- **Feature complete**: page-specific extraction, layout preservation, encoding options
- **Error handling**: comprehensive validation and helpful error messages
## Features

- Extract text from entire PDF documents or specific pages
- Preserve original layout formatting (optional)
- Multiple text encodings (UTF-8, Latin1, ASCII)
- Comprehensive metadata in responses (word count, file info, etc.)
- File validation and security checks
- Fast processing with configurable timeouts
- Detailed error reporting with troubleshooting hints
## Prerequisites

You must have `pdftotext` installed on your system.

### Ubuntu/Debian

```bash
sudo apt update
sudo apt install poppler-utils
```

### macOS

```bash
brew install poppler
```

### Windows

```bash
# Using Chocolatey
choco install poppler

# Using Scoop
scoop install poppler
```

### Verify installation

```bash
pdftotext -v
```
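If you want to run the same check from Node.js (for example in a setup script), here is a minimal sketch using only the built-in `child_process` module. `checkPdftotext` is a hypothetical helper, not part of pdftotext-mcp. It treats a spawn error such as `ENOENT` as "not installed" instead of relying on the exit code of `pdftotext -v`, which may vary between poppler versions, and it looks for the version banner on both stdout and stderr.

```javascript
// Hypothetical setup helper: checks whether `pdftotext` can be launched.
const { spawnSync } = require("node:child_process");

function checkPdftotext() {
  const result = spawnSync("pdftotext", ["-v"], { encoding: "utf8" });
  if (result.error) {
    // ENOENT means the binary is not on PATH; other errors are reported as-is.
    return { installed: false, reason: result.error.code || result.error.message };
  }
  // poppler prints its version banner to stderr; include stdout to be safe.
  const output = `${result.stdout}${result.stderr}`;
  const match = output.match(/pdftotext version (\S+)/);
  return { installed: true, version: match ? match[1] : "unknown" };
}

console.log(checkPdftotext());
```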
## Installation

### Global installation

```bash
npm install -g pdftotext-mcp
```

### Using npx

```bash
npx pdftotext-mcp
```

### From source

```bash
git clone https://github.com/jpwebb/pdftotext-mcp.git
cd pdftotext-mcp
npm install
npm start
```
## Configuration

Add the server to your MCP client configuration.

### Claude Desktop

Add to `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "pdftotext": {
      "command": "pdftotext-mcp"
    }
  }
}
```

Or with npx:

```json
{
  "mcpServers": {
    "pdftotext": {
      "command": "npx",
      "args": ["pdftotext-mcp"]
    }
  }
}
```
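If `claude_desktop_config.json` already lists other servers, merge the `pdftotext` entry into `mcpServers` instead of overwriting the file. The sketch below assumes you pass in the path to your own config file (its location depends on your OS and is not covered here); `addPdftotextServer` and `updateConfigFile` are hypothetical helpers built on Node's built-in `fs` module.

```javascript
const fs = require("node:fs");

// Returns a copy of `config` with the pdftotext server registered,
// leaving any other entries under "mcpServers" untouched.
function addPdftotextServer(config, useNpx = false) {
  const entry = useNpx
    ? { command: "npx", args: ["pdftotext-mcp"] }
    : { command: "pdftotext-mcp" };
  return {
    ...config,
    mcpServers: { ...(config.mcpServers || {}), pdftotext: entry },
  };
}

// Reads the config (or starts from {}), adds the entry, and writes it back
// with 2-space indentation.
function updateConfigFile(configPath, useNpx = false) {
  const existing = fs.existsSync(configPath)
    ? JSON.parse(fs.readFileSync(configPath, "utf8"))
    : {};
  const updated = addPdftotextServer(existing, useNpx);
  fs.writeFileSync(configPath, JSON.stringify(updated, null, 2) + "\n");
  return updated;
}
```

After changing the file, restart your MCP client completely so it picks up the new server.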
### Other MCP clients

The server works with any MCP-compatible client. Use `pdftotext-mcp` as the command.

## Usage

The server provides a single tool: `read_pdf_text`.

### Examples

#### Extract entire document

```json
{
  "tool": "read_pdf_text",
  "arguments": {
    "path": "./document.pdf"
  }
}
```
#### Extract a specific page

```json
{
  "tool": "read_pdf_text",
  "arguments": {
    "path": "./document.pdf",
    "page": 2
  }
}
```
#### Preserve layout formatting

```json
{
  "tool": "read_pdf_text",
  "arguments": {
    "path": "./document.pdf",
    "layout": true
  }
}
```
#### Custom encoding

```json
{
  "tool": "read_pdf_text",
  "arguments": {
    "path": "./document.pdf",
    "encoding": "Latin1"
  }
}
```
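The `tool`/`arguments` objects above appear to be a simplified notation. Under the MCP specification, a client invokes a tool by sending a JSON-RPC 2.0 request with method `tools/call` and `params` holding the tool `name` and its `arguments`; over the stdio transport, each message is a single line of JSON. A minimal sketch of building such a request follows. `buildReadPdfRequest` and `frameForStdio` are hypothetical helpers, and the `initialize` handshake that a real client performs first is omitted.

```javascript
// Hypothetical helper: wraps read_pdf_text arguments in a JSON-RPC 2.0
// "tools/call" request as used by the Model Context Protocol.
let nextId = 1;

function buildReadPdfRequest(args) {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: {
      name: "read_pdf_text",
      arguments: args,
    },
  };
}

// The stdio transport frames each message as one line of JSON.
function frameForStdio(message) {
  return JSON.stringify(message) + "\n";
}

const request = buildReadPdfRequest({ path: "./document.pdf", page: 2 });
process.stdout.write(frameForStdio(request));
// Writes: {"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"read_pdf_text","arguments":{"path":"./document.pdf","page":2}}}
```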
### Response format

#### Success response

```json
{
  "success": true,
  "file": "document.pdf",
  "path": "/absolute/path/to/document.pdf",
  "extractedText": "Full text content...",
  "pageSpecific": "all",
  "layoutPreserved": false,
  "encoding": "UTF-8",
  "fileSize": 1048576,
  "lastModified": "2024-01-15T10:30:00.000Z",
  "extractedAt": "2024-01-15T10:35:00.000Z",
  "textLength": 5234,
  "wordCount": 892
}
```

#### Error response

```json
{
  "success": false,
  "error": "File not found: ./nonexistent.pdf",
  "errorType": "FILE_NOT_FOUND",
  "file": "./nonexistent.pdf",
  "timestamp": "2024-01-15T10:35:00.000Z"
}
```
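A client receiving these objects can branch on `success` and use `errorType` to show a targeted hint. The sketch below assumes the response has already been parsed from JSON into an object (how it is delivered inside the MCP tool result is not covered here); `extractTextOrThrow` and the hint strings, which paraphrase the Troubleshooting section, are illustrative and not part of the server.

```javascript
// Hypothetical client-side helper for the response objects shown above.
// Assumes the server's JSON payload has already been parsed into an object.
const HINTS = {
  FILE_NOT_FOUND: "use an absolute path and check the server's working directory",
  PERMISSION_DENIED: "make the file readable, e.g. chmod 644 document.pdf",
  INVALID_PDF: "check the file type with: file document.pdf",
  PDFTOTEXT_ERROR: "confirm poppler-utils is installed: pdftotext -v",
};

// Returns the extracted text on success; on failure throws an Error that
// carries errorType and a troubleshooting hint.
function extractTextOrThrow(response) {
  if (response.success) {
    return response.extractedText;
  }
  const known = Object.prototype.hasOwnProperty.call(HINTS, response.errorType);
  const hint = known ? HINTS[response.errorType] : "check the MCP client logs";
  const err = new Error(`${response.errorType}: ${response.error} (hint: ${hint})`);
  err.errorType = response.errorType;
  throw err;
}

// Example usage with the error response shown above.
try {
  extractTextOrThrow({
    success: false,
    error: "File not found: ./nonexistent.pdf",
    errorType: "FILE_NOT_FOUND",
  });
} catch (err) {
  console.error(err.message);
}
```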
## API Reference

### `read_pdf_text`

Extracts text content from PDF files using `pdftotext`.
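The parameters documented below line up with standard `pdftotext` options: `-f` and `-l` set the first and last page, `-layout` keeps the original layout, `-enc` sets the output encoding, and an output file of `-` writes the text to stdout. Here is one plausible mapping, sketched with Node's `child_process`. It is not taken from the pdftotext-mcp source: `buildPdftotextArgs`, `extractText`, and the 30-second default timeout are hypothetical. It also assumes `ASCII` must be passed to `pdftotext` as `ASCII7`, the name that `pdftotext -listenc` normally lists. Check that list on your system.

```javascript
const { execFile } = require("node:child_process");

// read_pdf_text encodings mapped to pdftotext -enc names.
// Assumption: pdftotext calls plain ASCII "ASCII7" (see `pdftotext -listenc`).
const ENCODINGS = { "UTF-8": "UTF-8", Latin1: "Latin1", ASCII: "ASCII7" };

// Validates read_pdf_text-style arguments and returns a pdftotext argv.
function buildPdftotextArgs({ path, page, layout = false, encoding = "UTF-8" }) {
  if (typeof path !== "string" || path.length === 0) {
    throw new TypeError("path must be a non-empty string");
  }
  if (page !== undefined && !(Number.isInteger(page) && page >= 1)) {
    throw new RangeError("page must be a positive integer (1-based)");
  }
  if (!Object.prototype.hasOwnProperty.call(ENCODINGS, encoding)) {
    throw new RangeError(`unsupported encoding: ${encoding}`);
  }
  const args = [];
  if (page !== undefined) {
    args.push("-f", String(page), "-l", String(page)); // first page = last page
  }
  if (layout) {
    args.push("-layout");
  }
  args.push("-enc", ENCODINGS[encoding]);
  args.push(path, "-"); // "-" sends the extracted text to stdout
  return args;
}

// Runs pdftotext with a timeout and resolves with the extracted text.
function extractText(options, timeoutMs = 30000) {
  const args = buildPdftotextArgs(options);
  // Decode stdout to match the requested output encoding.
  const nodeEncoding = options.encoding === "Latin1" ? "latin1" : "utf8";
  return new Promise((resolve, reject) => {
    execFile(
      "pdftotext",
      args,
      { encoding: nodeEncoding, timeout: timeoutMs, maxBuffer: 64 * 1024 * 1024 },
      (error, stdout) => (error ? reject(error) : resolve(stdout)),
    );
  });
}
```

For example, `buildPdftotextArgs({ path: "doc.pdf", page: 2, layout: true })` returns `["-f", "2", "-l", "2", "-layout", "-enc", "UTF-8", "doc.pdf", "-"]`. Using `execFile` instead of a shell string means a file path is never interpreted by a shell.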
#### Parameters

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `path` | string | Yes | - | Path to the PDF file (relative or absolute) |
| `page` | number | No | all pages | Specific page to extract (1-based) |
| `layout` | boolean | No | `false` | Preserve the original text layout |
| `encoding` | string | No | `"UTF-8"` | Output text encoding |

#### Supported encodings

- `UTF-8` (default)
- `Latin1`
- `ASCII`

#### Error types

- `FILE_NOT_FOUND`: the PDF file doesn't exist
- `PERMISSION_DENIED`: the file cannot be read
- `INVALID_PDF`: the file is not a valid PDF
- `PDFTOTEXT_ERROR`: the pdftotext utility reported an error
- `UNKNOWN_ERROR`: an unexpected error occurred

## Troubleshooting
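Three of the cases below (file not found, permission denied, invalid PDF) can be checked before `pdftotext` runs. Here is a minimal preflight sketch that uses only Node's built-in `fs` and `path` modules. `preflightPdf` is a hypothetical helper. Its error mapping mirrors the error types in the API Reference, not the server's actual code, and it relies on the fact that a well-formed PDF begins with the `%PDF-` header.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical preflight check mirroring FILE_NOT_FOUND, PERMISSION_DENIED
// and INVALID_PDF. Returns { ok: true, path } or { ok: false, errorType, error }.
function preflightPdf(filePath) {
  // Relative paths are resolved against process.cwd(), i.e. the working
  // directory of whichever process runs this check.
  const absolute = path.resolve(filePath);
  let fd;
  try {
    fd = fs.openSync(absolute, "r");
  } catch (err) {
    if (err.code === "ENOENT") {
      return { ok: false, errorType: "FILE_NOT_FOUND", error: `File not found: ${absolute}` };
    }
    if (err.code === "EACCES" || err.code === "EPERM") {
      return { ok: false, errorType: "PERMISSION_DENIED", error: `Cannot read file: ${absolute}` };
    }
    if (err.code === "EISDIR") {
      return { ok: false, errorType: "INVALID_PDF", error: `Not a file: ${absolute}` };
    }
    return { ok: false, errorType: "UNKNOWN_ERROR", error: err.message };
  }
  try {
    if (!fs.fstatSync(fd).isFile()) {
      return { ok: false, errorType: "INVALID_PDF", error: `Not a file: ${absolute}` };
    }
    // Read the first five bytes and compare them with the "%PDF-" header.
    const header = Buffer.alloc(5);
    const bytesRead = fs.readSync(fd, header, 0, 5, 0);
    if (bytesRead < 5 || header.toString("latin1") !== "%PDF-") {
      return { ok: false, errorType: "INVALID_PDF", error: `Missing %PDF- header: ${absolute}` };
    }
    return { ok: true, path: absolute };
  } finally {
    fs.closeSync(fd);
  }
}
```

Relative paths resolve against the current working directory, so running this check from a directory other than the MCP server's can give a different result. Absolute paths avoid that ambiguity.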
### pdftotext not found

Solution: install poppler-utils (see Prerequisites).

### File not found

Solutions:

- Use absolute paths, e.g. `/home/user/document.pdf`
- Check that the file exists: `ls -la /path/to/file.pdf`
- Verify the MCP server's working directory

### Permission denied

Solutions:

- Check file permissions: `chmod 644 document.pdf`
- Ensure the directory is readable: `chmod 755 /path/to/directory/`

### Invalid PDF

Solutions:

- Verify the file is actually a PDF: `file document.pdf`
- Check for file corruption
- Try a different PDF file

### Server not connecting

Solutions:

- Restart your MCP client completely
- Check the configuration syntax in the config file
- Verify that `pdftotext-mcp` is on your PATH
- Check the MCP client logs for detailed errors

## Testing
```bash
# Run tests
npm test

# Run tests in watch mode
npm run test:watch

# Run linter
npm run lint
```

## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.

### Development setup

```bash
git clone https://github.com/jpwebb/pdftotext-mcp.git
cd pdftotext-mcp
npm install
```

### Running locally

```bash
npm start
```

### Code style

This project uses ESLint. Run `npm run lint` to check code style.

## License

MIT. See the LICENSE file for details.
## Acknowledgments

- Built for the Model Context Protocol ecosystem
- Uses the `pdftotext` utility from poppler-utils

## Links

- Model Context Protocol Documentation
- Claude Desktop MCP Configuration
- Poppler Utils Documentation
---
Made for the MCP community