# PDFtotext MCP Server

A reliable Model Context Protocol (MCP) server for PDF text extraction using the proven pdftotext utility from poppler-utils.

[npm version](https://badge.fury.io/js/pdftotext-mcp)
[License: MIT](https://opensource.org/licenses/MIT)

## 🚀 Why This Server?

Unlike other PDF MCP servers that suffer from logging interference, complex dependencies, and reliability issues, pdftotext-mcp is:

- ✅ Actually works - Clean JSON-RPC communication without stdout pollution
- ✅ Reliable - Built on mature pdftotext from poppler-utils (used by millions)
- ✅ Lightweight - Minimal dependencies, maximum compatibility
- ✅ Production tested - Successfully tested with Claude Desktop and other MCP clients
- ✅ Feature complete - Page-specific extraction, layout preservation, encoding options
- ✅ Error handling - Comprehensive validation and helpful error messages

## 📋 Features

- 📄 Extract text from entire PDF documents or specific pages
- 🎨 Preserve original layout formatting (optional)
- 🔤 Multiple text encoding support (UTF-8, Latin1, ASCII)
- 📊 Comprehensive metadata in responses (word count, file info, etc.)
- 🛡️ File validation and security checks
- ⚡ Fast processing with configurable timeouts
- 🔍 Detailed error reporting with troubleshooting hints

## 🔧 Prerequisites

You must have pdftotext installed on your system:

### Ubuntu/Debian

```bash
sudo apt update
sudo apt install poppler-utils
```

### macOS

```bash
brew install poppler
```

### Windows

```bash
# Using Chocolatey
choco install poppler

# Using Scoop
scoop install poppler
```

### Verify Installation

```bash
pdftotext -v
```
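The same check can be scripted from Node.js, for example in a setup script. This is a minimal sketch, not part of the package: the helper name `isPdftotextAvailable` is hypothetical, and it only confirms that a `pdftotext` binary can be launched from `PATH`.

```javascript
const { spawnSync } = require("node:child_process");

// Hypothetical helper: returns true when `pdftotext` can be started from PATH.
// `pdftotext -v` only prints version information, so no PDF file is needed.
function isPdftotextAvailable() {
  const result = spawnSync("pdftotext", ["-v"], { encoding: "utf8" });
  // `result.error` is only set when the process could not be spawned
  // (for example ENOENT when the binary is missing).
  return result.error === undefined;
}

if (!isPdftotextAvailable()) {
  console.error("pdftotext was not found on PATH; install poppler first.");
}
```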


## 📦 Installation

### Install globally from npm

```bash
npm install -g pdftotext-mcp
```

### Run with npx

```bash
npx pdftotext-mcp
```

### Install from source

```bash
git clone https://github.com/jpwebb/pdftotext-mcp.git
cd pdftotext-mcp
npm install
npm start
```


## ⚙️ Configuration

Add to your MCP client configuration:

### Claude Desktop

Add to `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "pdftotext": {
      "command": "pdftotext-mcp"
    }
  }
}
```

Or with npx:

```json
{
  "mcpServers": {
    "pdftotext": {
      "command": "npx",
      "args": ["pdftotext-mcp"]
    }
  }
}
```
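If the configuration file already lists other servers, the new entry should be merged in rather than replacing the whole `mcpServers` object. The sketch below illustrates that merge on plain objects; the helper name `withServer` and the `other` server entry are made up for the example, and reading or writing the actual file is left out.

```javascript
// Hypothetical helper: returns a copy of `config` with one server entry
// added or replaced, leaving every other server entry untouched.
function withServer(config, name, entry) {
  return {
    ...config,
    mcpServers: { ...(config.mcpServers ?? {}), [name]: entry },
  };
}

// Example input: a config that already contains an unrelated server.
const existing = { mcpServers: { other: { command: "other-server" } } };

const updated = withServer(existing, "pdftotext", {
  command: "npx",
  args: ["pdftotext-mcp"],
});

console.log(JSON.stringify(updated, null, 2));
```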

### Other MCP Clients

The server works with any MCP-compatible client. Use `pdftotext-mcp` as the command.
## 🎯 Usage

The server provides a single, powerful tool: `read_pdf_text`

### Basic Usage

#### Extract entire document

```javascript
{
  "tool": "read_pdf_text",
  "arguments": {
    "path": "./document.pdf"
  }
}
```

#### Extract specific page

```javascript
{
  "tool": "read_pdf_text",
  "arguments": {
    "path": "./document.pdf",
    "page": 2
  }
}
```

#### Preserve layout formatting

```javascript
{
  "tool": "read_pdf_text",
  "arguments": {
    "path": "./document.pdf",
    "layout": true
  }
}
```

#### Custom encoding

```javascript
{
  "tool": "read_pdf_text",
  "arguments": {
    "path": "./document.pdf",
    "encoding": "Latin1"
  }
}
```
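The examples above use a shorthand. On the wire, an MCP client wraps the tool name and arguments in a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for the stdio transport, where each message is one line of JSON. The helper name `buildToolCall` is hypothetical, and a real session also needs the MCP `initialize` handshake first, which MCP client libraries normally handle.

```javascript
// Hypothetical helper: wraps read_pdf_text arguments in a JSON-RPC 2.0
// "tools/call" request as defined by the Model Context Protocol.
function buildToolCall(id, args) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "read_pdf_text", arguments: args },
  };
}

const request = buildToolCall(1, { path: "./document.pdf", page: 2 });

// The stdio transport sends one JSON message per line.
const line = JSON.stringify(request) + "\n";
process.stdout.write(line);
```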

### Response Format

#### Success Response

```json
{
  "success": true,
  "file": "document.pdf",
  "path": "/absolute/path/to/document.pdf",
  "extractedText": "Full text content...",
  "pageSpecific": "all",
  "layoutPreserved": false,
  "encoding": "UTF-8",
  "fileSize": 1048576,
  "lastModified": "2024-01-15T10:30:00.000Z",
  "extractedAt": "2024-01-15T10:35:00.000Z",
  "textLength": 5234,
  "wordCount": 892
}
```

#### Error Response

```json
{
  "success": false,
  "error": "File not found: ./nonexistent.pdf",
  "errorType": "FILE_NOT_FOUND",
  "file": "./nonexistent.pdf",
  "timestamp": "2024-01-15T10:35:00.000Z"
}
```
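A client can branch on the `success` flag to handle both shapes. The following is a small sketch that assumes the payload has already been parsed into an object with the fields shown above; the function name `describeResult` is hypothetical.

```javascript
// Hypothetical helper: turns a parsed read_pdf_text payload into a
// one-line summary, using only fields shown in the response examples.
function describeResult(payload) {
  if (payload.success) {
    return `${payload.file}: ${payload.wordCount} words, ${payload.textLength} characters`;
  }
  return `${payload.errorType}: ${payload.error}`;
}

const ok = describeResult({
  success: true,
  file: "document.pdf",
  textLength: 5234,
  wordCount: 892,
});

const failed = describeResult({
  success: false,
  error: "File not found: ./nonexistent.pdf",
  errorType: "FILE_NOT_FOUND",
});

console.log(ok);
console.log(failed);
```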

## 📚 API Reference

### `read_pdf_text`

Extracts text content from PDF files using pdftotext.

#### Parameters

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `path` | string | ✅ | - | Path to PDF file (relative or absolute) |
| `page` | number | ❌ | all pages | Specific page to extract (1-based) |
| `layout` | boolean | ❌ | `false` | Preserve original text layout |
| `encoding` | string | ❌ | `"UTF-8"` | Output text encoding |

#### Supported Encodings

- `UTF-8` (default)
- `Latin1`
- `ASCII`
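To illustrate how these parameters could map onto the `pdftotext` command line, here is a hedged sketch. It is not the server's actual implementation, which this README does not show. The function name `buildPdftotextArgs` is hypothetical; the flags `-f`, `-l`, `-layout`, and `-enc` are standard `pdftotext` options, and `-` as the output file sends text to stdout. Whether the server passes encoding names through unchanged is an assumption; `pdftotext -listenc` prints the names the local build accepts.

```javascript
const SUPPORTED_ENCODINGS = ["UTF-8", "Latin1", "ASCII"];

// Hypothetical helper: validates read_pdf_text arguments and returns the
// argument list for a pdftotext invocation that writes to stdout.
function buildPdftotextArgs({ path, page, layout = false, encoding = "UTF-8" }) {
  if (typeof path !== "string" || path.length === 0) {
    throw new TypeError("path must be a non-empty string");
  }
  if (page !== undefined && (!Number.isInteger(page) || page < 1)) {
    throw new RangeError("page must be a positive integer (1-based)");
  }
  if (!SUPPORTED_ENCODINGS.includes(encoding)) {
    throw new RangeError(`unsupported encoding: ${encoding}`);
  }

  const args = [];
  if (page !== undefined) {
    // First and last page set to the same value extracts a single page.
    args.push("-f", String(page), "-l", String(page));
  }
  if (layout) {
    args.push("-layout");
  }
  args.push("-enc", encoding, path, "-");
  return args;
}

console.log(buildPdftotextArgs({ path: "./document.pdf", page: 2, layout: true }));
```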

#### Error Types

- `FILE_NOT_FOUND` - PDF file doesn't exist
- `PERMISSION_DENIED` - Cannot read the file
- `INVALID_PDF` - File is not a valid PDF
- `PDFTOTEXT_ERROR` - pdftotext utility error
- `UNKNOWN_ERROR` - Unexpected error
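A client that wants to show actionable messages could map each `errorType` to a short hint. The sketch below is one possible mapping; the hint texts are paraphrased from the Troubleshooting section, and the names `ERROR_HINTS` and `hintFor` are hypothetical.

```javascript
// Hypothetical lookup table from errorType to a user-facing hint.
const ERROR_HINTS = {
  FILE_NOT_FOUND: "Use an absolute path and confirm the file exists.",
  PERMISSION_DENIED: "Check read permissions on the file and its directory.",
  INVALID_PDF: "Confirm the file is a real PDF and is not corrupted.",
  PDFTOTEXT_ERROR: "Confirm pdftotext is installed (run: pdftotext -v).",
  UNKNOWN_ERROR: "Check the MCP client logs for details.",
};

// Falls back to the UNKNOWN_ERROR hint for unrecognized error types.
function hintFor(errorType) {
  return Object.hasOwn(ERROR_HINTS, errorType)
    ? ERROR_HINTS[errorType]
    : ERROR_HINTS.UNKNOWN_ERROR;
}

console.log(hintFor("FILE_NOT_FOUND"));
```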

## 🔧 Troubleshooting

### pdftotext command not found

Solution: Install poppler-utils (see Prerequisites)
### File not found

Solutions:
- Use absolute paths: `/home/user/document.pdf`
- Check file exists: `ls -la /path/to/file.pdf`
- Verify MCP server working directory
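Relative paths are resolved against the working directory of whichever process handles them, which is why absolute paths are safer. As a rough sketch, a client could resolve and check a path before sending it; the helper name `resolvePdfPath` is hypothetical.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical helper: resolves `inputPath` against `baseDir` (the current
// working directory by default) and reports whether the target exists.
function resolvePdfPath(inputPath, baseDir = process.cwd()) {
  const absolute = path.resolve(baseDir, inputPath);
  return { absolute, exists: fs.existsSync(absolute) };
}

const { absolute, exists } = resolvePdfPath("./document.pdf");
console.log(absolute, exists ? "(found)" : "(missing)");
```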
### Permission denied

Solutions:
- Check file permissions: `chmod 644 document.pdf`
- Ensure directory is readable: `chmod 755 /path/to/directory/`
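To confirm whether the current user can actually read a file, Node's `fs.accessSync` can be used with `fs.constants.R_OK`. This is a small diagnostic sketch with a hypothetical helper name, `isReadable`. It reports readability for the user running the script, which may differ from the user running the MCP server.

```javascript
const fs = require("node:fs");

// Hypothetical helper: true when the current process may read `filePath`.
// Returns false for missing files as well as for permission problems.
function isReadable(filePath) {
  try {
    fs.accessSync(filePath, fs.constants.R_OK);
    return true;
  } catch {
    return false;
  }
}

console.log(isReadable("./document.pdf"));
```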


### Invalid PDF

Solutions:
- Verify file is actually a PDF: `file document.pdf`
- Check for file corruption
- Try with a different PDF file
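Where the `file` command is unavailable, a similar check can be done by reading the first bytes of the file, since PDF files begin with the marker `%PDF-`. The sketch below is a strict version of that check with a hypothetical helper name, `looksLikePdf`. Some PDF readers tolerate a few bytes of leading junk, which this check would reject, and a matching header does not prove the rest of the file is intact.

```javascript
const fs = require("node:fs");

// Hypothetical helper: true when the file starts with the "%PDF-" marker.
function looksLikePdf(filePath) {
  const fd = fs.openSync(filePath, "r");
  try {
    const header = Buffer.alloc(5);
    // Read up to 5 bytes from the start of the file.
    const bytesRead = fs.readSync(fd, header, 0, 5, 0);
    return bytesRead === 5 && header.toString("latin1") === "%PDF-";
  } finally {
    fs.closeSync(fd);
  }
}
```

Calling `looksLikePdf("document.pdf")` throws if the file cannot be opened, so existence and permission problems surface separately from format problems.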
### Server not responding

Solutions:
- Restart your MCP client completely
- Check configuration syntax in config file
- Verify `pdftotext-mcp` is accessible in PATH
- Check MCP client logs for detailed errors

## 🧪 Testing

```bash
# Run tests
npm test

# Run tests with watch mode
npm run test:watch

# Run linter
npm run lint
```

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

### Development Setup

```bash
git clone https://github.com/jpwebb/pdftotext-mcp.git
cd pdftotext-mcp
npm install
```

### Running Locally

```bash
npm start
```

### Code Style

This project uses ESLint. Run `npm run lint` to check code style.

## 📄 License

MIT - see LICENSE file for details.

## 🙏 Acknowledgments

- Built for the Model Context Protocol ecosystem
- Uses the poppler-utils `pdftotext` utility
- Inspired by the need for reliable PDF processing in MCP environments

---

Made for the MCP community
