An MCP server providing tools to read PDF files.

npm install @sylphx/pdf-reader-mcp

> Production-ready PDF processing server for AI agents







5-10x faster parallel processing • Y-coordinate content ordering • 94%+ test coverage • 103 tests passing
---
PDF Reader MCP is a production-ready Model Context Protocol server that empowers AI agents with enterprise-grade PDF processing capabilities. Extract text, images, and metadata with unmatched performance and reliability.
The Problem:

Traditional PDF processing:
- Sequential page processing (slow)
- No natural content ordering
- Complex path handling
- Poor error isolation

The Solution:

PDF Reader MCP:
- 5-10x faster parallel processing
- Y-coordinate based ordering
- Flexible path support (absolute/relative)
- Per-page error resilience
- 94%+ test coverage

Result: Production-ready PDF processing that scales.
---
- 5-10x faster than sequential with automatic parallelization
- 12,933 ops/sec error handling, 5,575 ops/sec text extraction
- Process 50-page PDFs in seconds with multi-core utilization
- Lightweight with minimal dependencies
- Path Flexibility - Absolute & relative paths, Windows/Unix support (v1.3.0)
- Smart Ordering - Y-coordinate based content preserves document layout
- Type Safe - Full TypeScript with strict mode enabled
- Battle-tested - 103 tests, 94%+ coverage, 98%+ function coverage
- Simple API - Single tool handles all operations
---
Real-world performance from production testing:
| Operation | Ops/sec | Performance | Use Case |
|-----------|---------|-------------|----------|
| Error handling | 12,933 | ⚡⚡⚡⚡⚡ | Validation & safety |
| Extract full text | 5,575 | ⚡⚡⚡⚡ | Document analysis |
| Extract page | 5,329 | ⚡⚡⚡⚡ | Single page ops |
| Multiple pages | 5,242 | ⚡⚡⚡⚡ | Batch processing |
| Metadata only | 4,912 | ⚡⚡⚡ | Quick inspection |

| Document | Sequential | Parallel | Speedup |
|----------|-----------|----------|---------|
| 10-page PDF | ~2s | ~0.3s | 5-8x faster |
| 50-page PDF | ~10s | ~1s | 10x faster |
| 100+ pages | ~20s | ~2s | Linear scaling with CPU cores |
Benchmarks vary based on PDF complexity and system resources.
---
Claude Code:

```bash
claude mcp add pdf-reader -- npx @sylphx/pdf-reader-mcp
```

Claude Desktop: add to claude_desktop_config.json:
```json
{
  "mcpServers": {
    "pdf-reader": {
      "command": "npx",
      "args": ["@sylphx/pdf-reader-mcp"]
    }
  }
}
```

Config file locations:
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json
- Linux: ~/.config/Claude/claude_desktop_config.json
VS Code:

```bash
code --add-mcp '{"name":"pdf-reader","command":"npx","args":["@sylphx/pdf-reader-mcp"]}'
```

Cursor:

1. Open Settings → MCP → Add new MCP Server
2. Select Command type
3. Enter: npx @sylphx/pdf-reader-mcp
Add to your Windsurf MCP config:

```json
{
  "mcpServers": {
    "pdf-reader": {
      "command": "npx",
      "args": ["@sylphx/pdf-reader-mcp"]
    }
  }
}
```

Add to Cline's MCP settings:

```json
{
  "mcpServers": {
    "pdf-reader": {
      "command": "npx",
      "args": ["@sylphx/pdf-reader-mcp"]
    }
  }
}
```
Warp:

1. Go to Settings → AI → Manage MCP Servers → Add
2. Command: npx, Args: @sylphx/pdf-reader-mcp

Smithery:

```bash
npx -y @smithery/cli install @sylphx/pdf-reader-mcp --client claude
```

```bash
# Quick start - zero installation
npx @sylphx/pdf-reader-mcp
```
---
## Quick Start

### Basic Usage

```json
{
  "sources": [{
    "path": "documents/report.pdf"
  }],
  "include_full_text": true,
  "include_metadata": true,
  "include_page_count": true
}
```

Result:
- ✅ Full text content extracted
- ✅ PDF metadata (author, title, dates)
- ✅ Total page count
- ✅ Structural sharing - unchanged parts preserved
### Extract Specific Pages

```json
{
  "sources": [{
    "path": "documents/manual.pdf",
    "pages": "1-5,10,15-20"
  }],
  "include_full_text": true
}
```

### Absolute Paths (v1.3.0+)

```json
// Windows - Both formats work!
{
  "sources": [{
    "path": "C:\\Users\\John\\Documents\\report.pdf"
  }],
  "include_full_text": true
}

// Unix/Mac
{
  "sources": [{
    "path": "/home/user/documents/contract.pdf"
  }],
  "include_full_text": true
}
```

No more "Absolute paths are not allowed" errors!

### Extract Images with Natural Ordering
```json
{
  "sources": [{
    "path": "presentation.pdf",
    "pages": [1, 2, 3]
  }],
  "include_images": true,
  "include_full_text": true
}
```

Response includes:
- Text and images in exact document order (Y-coordinate sorted)
- Base64-encoded images with metadata (width, height, format)
- Natural reading flow preserved for AI comprehension
### Batch Processing

```json
{
  "sources": [
    { "path": "C:\\Reports\\Q1.pdf", "pages": "1-10" },
    { "path": "/home/user/Q2.pdf", "pages": "1-10" },
    { "url": "https://example.com/Q3.pdf" }
  ],
  "include_full_text": true
}
```

⚡ All PDFs processed in parallel automatically!
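The batch behaviour described above can be pictured with a short sketch. It is not the server's implementation: `Source`, `SourceResult`, `processAll`, and the `load` callback are hypothetical names introduced here, and the sketch only assumes that each source is read independently and that one failure should not abort the rest.

```typescript
// Hypothetical shapes for illustration; not the server's actual types.
type Source = { path?: string; url?: string; pages?: string | number[] };

type SourceResult =
  | { source: Source; ok: true; text: string }
  | { source: Source; ok: false; error: string };

// `load` stands in for whatever actually reads one PDF.
async function processAll(
  sources: Source[],
  load: (source: Source) => Promise<string>,
): Promise<SourceResult[]> {
  // Every load starts immediately; Promise.all waits for all of them.
  // Each task catches its own error, so a bad file yields an error
  // entry instead of rejecting the whole batch.
  return Promise.all(
    sources.map(async (source): Promise<SourceResult> => {
      try {
        return { source, ok: true, text: await load(source) };
      } catch (err) {
        const error = err instanceof Error ? err.message : String(err);
        return { source, ok: false, error };
      }
    }),
  );
}
```

Results come back in the same order as `sources`, so a caller can match each entry to its request without extra bookkeeping.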
---
## Features

### Core Capabilities
- ✅ Text Extraction - Full document or specific pages with intelligent parsing
- ✅ Image Extraction - Base64-encoded with complete metadata (width, height, format)
- ✅ Content Ordering - Y-coordinate based layout preservation for natural reading flow
- ✅ Metadata Extraction - Author, title, creation date, and custom properties
- ✅ Page Counting - Fast enumeration without loading full content
- ✅ Dual Sources - Local files (absolute or relative paths) and HTTP/HTTPS URLs
- ✅ Batch Processing - Multiple PDFs processed concurrently

### Advanced Features
- 5-10x Performance - Parallel page processing with Promise.all
- Smart Pagination - Extract ranges like "1-5,10-15,20"
- Multi-Format Images - RGB, RGBA, Grayscale with automatic detection
- Path Flexibility - Windows, Unix, and relative paths all supported (v1.3.0)
- Error Resilience - Per-page error isolation with detailed messages
- Large File Support - Efficient streaming and memory management
- Type Safe - Full TypeScript with strict mode enabled

---
## What's New in v1.3.0

### Absolute Paths Now Supported

```json
// ✅ Windows
{ "path": "C:\\Users\\John\\Documents\\report.pdf" }
{ "path": "C:/Users/John/Documents/report.pdf" }

// ✅ Unix/Mac
{ "path": "/home/john/documents/report.pdf" }
{ "path": "/Users/john/Documents/report.pdf" }

// ✅ Relative (still works)
{ "path": "documents/report.pdf" }
```

Other Improvements:
- Fixed Zod validation error handling
- Updated all dependencies to latest versions
- ✅ 103 tests passing, 94%+ coverage maintained

View Full Changelog
v1.2.0 - Content Ordering
- Y-coordinate based text and image ordering
- Natural reading flow for AI models
- Intelligent line grouping
v1.1.0 - Image Extraction & Performance
- Base64-encoded image extraction
- 10x speedup with parallel processing
- Comprehensive test coverage (94%+)
---
## API Reference

The single tool that handles all PDF operations.

#### Parameters

| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| sources | Array | List of PDF sources to process | Required |
| include_full_text | boolean | Extract full text content | false |
| include_metadata | boolean | Extract PDF metadata | true |
| include_page_count | boolean | Include total page count | true |
| include_images | boolean | Extract embedded images | false |

#### Source Object

```typescript
{
  path?: string;             // Local file path (absolute or relative)
  url?: string;              // HTTP/HTTPS URL to PDF
  pages?: string | number[]; // Pages to extract: "1-5,10" or [1,2,3]
}
```

#### Examples
Metadata only (fast):

```json
{
  "sources": [{ "path": "large.pdf" }],
  "include_metadata": true,
  "include_page_count": true,
  "include_full_text": false
}
```

From URL:

```json
{
  "sources": [{
    "url": "https://arxiv.org/pdf/2301.00001.pdf"
  }],
  "include_full_text": true
}
```

Page ranges:

```json
{
  "sources": [{
    "path": "manual.pdf",
    "pages": "1-5,10-15,20" // Pages 1,2,3,4,5,10,11,12,13,14,15,20
  }]
}
```

---
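As a rough illustration of the page-range syntax used in the examples above, the sketch below expands a `pages` value into a sorted list of page numbers. The function name `parsePages` and its validation rules (1-based pages, ascending ranges, duplicates merged) are assumptions made for this example; the server's own parser may behave differently.

```typescript
// Hypothetical helper: expands "1-5,10" or [1, 2, 3] into sorted, unique page numbers.
function parsePages(spec: string | number[]): number[] {
  const pages = new Set<number>();

  if (Array.isArray(spec)) {
    for (const p of spec) pages.add(p);
  } else {
    for (const part of spec.split(",")) {
      // Accept "N" or "N-M", with optional surrounding whitespace.
      const match = /^\s*(\d+)\s*(?:-\s*(\d+)\s*)?$/.exec(part);
      if (!match) throw new Error(`Invalid page range: "${part}"`);

      const start = Number(match[1]);
      const end = match[2] === undefined ? start : Number(match[2]);
      // Assumption: pages are 1-based and ranges must be ascending.
      if (start < 1 || end < start) {
        throw new Error(`Invalid page range: "${part}"`);
      }
      for (let p = start; p <= end; p++) pages.add(p);
    }
  }

  return Array.from(pages).sort((a, b) => a - b);
}
```

With this sketch, `parsePages("1-5,10-15,20")` yields the twelve pages listed in the comment of the page-range example.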
## Advanced Usage

### Y-Coordinate Content Ordering

Content is returned in natural reading order based on Y-coordinates:

```
Document Layout:
┌─────────────────────┐
│ [Title]       Y:100 │
│ [Image]       Y:150 │
│ [Text]        Y:400 │
│ [Photo A]     Y:500 │
│ [Photo B]     Y:550 │
└─────────────────────┘

Response Order:
[
  { type: "text", text: "Title..." },
  { type: "image", data: "..." },
  { type: "text", text: "..." },
  { type: "image", data: "..." },
  { type: "image", data: "..." }
]
```

Benefits:
- AI understands spatial relationships
- Natural document comprehension
- Perfect for vision-enabled models
- Automatic multi-line text grouping
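The ordering idea can be sketched in a few lines. This is an illustration under stated assumptions, not the server's code: `PageItem`, `orderByY`, `groupLines`, and the `tolerance` parameter are hypothetical, and the sketch follows the diagram above in treating a smaller Y as higher on the page (native PDF coordinates grow upward from the bottom-left, so a real implementation has to account for the direction).

```typescript
// Hypothetical item shape: each extracted item carries its vertical position.
type PageItem =
  | { type: "text"; text: string; y: number }
  | { type: "image"; data: string; y: number };

// Interleave text and images by vertical position.
// Array.prototype.sort is stable, so items sharing a Y keep extraction order.
function orderByY(items: PageItem[]): PageItem[] {
  return items.slice().sort((a, b) => a.y - b.y);
}

// Merge text fragments whose Y values are within `tolerance` of the
// first fragment of the current line into a single line of text.
function groupLines(
  fragments: { text: string; y: number }[],
  tolerance = 2,
): string[] {
  const sorted = fragments.slice().sort((a, b) => a.y - b.y);
  const lines: { y: number; parts: string[] }[] = [];

  for (const fragment of sorted) {
    const last = lines[lines.length - 1];
    if (last && Math.abs(fragment.y - last.y) <= tolerance) {
      last.parts.push(fragment.text);
    } else {
      lines.push({ y: fragment.y, parts: [fragment.text] });
    }
  }
  return lines.map((line) => line.parts.join(" "));
}
```

The sketch ignores horizontal position, so fragments on the same line keep the order in which they were extracted.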
### Image Extraction

Enable extraction:

```json
{
  "sources": [{ "path": "manual.pdf" }],
  "include_images": true
}
```

Response format:

```json
{
  "images": [{
    "page": 1,
    "index": 0,
    "width": 1920,
    "height": 1080,
    "format": "rgb",
    "data": "base64-encoded-png..."
  }]
}
```

Supported formats: RGB, RGBA, Grayscale
Auto-detected: JPEG, PNG, and other embedded formats
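Because image data arrives as base64 text, it can help to estimate its decoded size before handing a response to a model. The sketch below mirrors the response fields shown above in a hypothetical `ExtractedImage` type and computes sizes arithmetically; it assumes standard padded base64 and is not part of the server's API.

```typescript
// Hypothetical type mirroring the fields in the response example.
interface ExtractedImage {
  page: number;
  index: number;
  width: number;
  height: number;
  format: string;
  data: string; // base64-encoded image bytes
}

// Every 4 base64 characters encode 3 bytes; each trailing "=" removes one byte.
function base64DecodedBytes(data: string): number {
  if (data.length % 4 !== 0) {
    throw new Error("Expected padded base64 (length must be a multiple of 4)");
  }
  const padding = data.endsWith("==") ? 2 : data.endsWith("=") ? 1 : 0;
  return (data.length / 4) * 3 - padding;
}

// Total decoded payload across all images in a response.
function totalImageBytes(images: ExtractedImage[]): number {
  return images.reduce((sum, image) => sum + base64DecodedBytes(image.data), 0);
}
```

A caller could use `totalImageBytes` to decide whether to request images for fewer pages at a time.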
### Path Configuration

Absolute paths (v1.3.0+) - Direct file access:

```json
{ "path": "C:\\Users\\John\\file.pdf" }
{ "path": "/home/user/file.pdf" }
```

Relative paths - Workspace files:

```json
{ "path": "docs/report.pdf" }
{ "path": "./2024/Q1.pdf" }
```

Configure working directory:

```json
{
  "mcpServers": {
    "pdf-reader-mcp": {
      "command": "npx",
      "args": ["@sylphx/pdf-reader-mcp"],
      "cwd": "/path/to/documents"
    }
  }
}
```
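One way to picture how absolute and relative paths interact with `cwd` is the sketch below. It uses only Node's built-in `path` module; the function name `resolvePdfPath` and the rule "leave absolute paths untouched, resolve everything else against the working directory" are assumptions for illustration, not a description of the server's internals.

```typescript
import * as path from "node:path";

// Hypothetical helper: decide which file a `path` value refers to.
function resolvePdfPath(input: string, cwd: string = process.cwd()): string {
  // Check both conventions so "C:\\Users\\..." and "/home/..." are
  // recognised as absolute regardless of the host platform.
  const isAbsolute =
    path.posix.isAbsolute(input) || path.win32.isAbsolute(input);

  // Relative paths are resolved against the configured working directory.
  return isAbsolute ? input : path.resolve(cwd, input);
}
```

Under this assumption, `docs/report.pdf` with `cwd` set to `/path/to/documents` refers to `/path/to/documents/docs/report.pdf` on a POSIX system.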
### Large PDF Strategies

Strategy 1: Page ranges

```json
{ "sources": [{ "path": "big.pdf", "pages": "1-20" }] }
```

Strategy 2: Progressive loading

```json
// Step 1: Get page count
{ "sources": [{ "path": "big.pdf" }], "include_full_text": false }

// Step 2: Extract sections
{ "sources": [{ "path": "big.pdf", "pages": "50-75" }] }
```

Strategy 3: Parallel batching

```json
{
  "sources": [
    { "path": "big.pdf", "pages": "1-50" },
    { "path": "big.pdf", "pages": "51-100" }
  ]
}
```

---
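The parallel-batching strategy above can be generated rather than written by hand once the page count is known. The helpers below, `pageBatches` and `batchedRequest`, are hypothetical names for this sketch; the request shape is copied from the Strategy 3 example, with `include_full_text` added as an assumption about what the caller wants.

```typescript
// Split pages 1..totalPages into range strings of at most batchSize pages.
function pageBatches(totalPages: number, batchSize: number): string[] {
  if (
    !Number.isInteger(totalPages) || !Number.isInteger(batchSize) ||
    totalPages < 1 || batchSize < 1
  ) {
    throw new RangeError("totalPages and batchSize must be positive integers");
  }

  const ranges: string[] = [];
  for (let start = 1; start <= totalPages; start += batchSize) {
    const end = Math.min(start + batchSize - 1, totalPages);
    // A batch of one page is written as "N" rather than "N-N".
    ranges.push(start === end ? `${start}` : `${start}-${end}`);
  }
  return ranges;
}

// Build a request with one source entry per batch of the same file.
function batchedRequest(path: string, totalPages: number, batchSize: number) {
  return {
    sources: pageBatches(totalPages, batchSize).map((pages) => ({ path, pages })),
    include_full_text: true,
  };
}
```

For a 100-page file and a batch size of 50, `pageBatches` produces the two ranges used in the Strategy 3 example.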
## Troubleshooting

### "Absolute paths are not allowed" error

Solution: Upgrade to v1.3.0+

```bash
npm update @sylphx/pdf-reader-mcp
```

Restart your MCP client completely.
---
### "File not found" error

Causes:
- File doesn't exist at path
- Wrong working directory
- Permission issues

Solutions:

Use absolute path:

```json
{ "path": "C:\\Full\\Path\\file.pdf" }
```

Or configure cwd:

```json
{
  "pdf-reader-mcp": {
    "command": "npx",
    "args": ["@sylphx/pdf-reader-mcp"],
    "cwd": "/path/to/docs"
  }
}
```

---
### Installation issues

Solution:

```bash
npm cache clean --force
rm -rf node_modules package-lock.json
npm install @sylphx/pdf-reader-mcp@latest
```

Restart your MCP client completely.
---
## HTTP Transport (Remote Access)

By default, PDF Reader MCP uses stdio transport for local use. You can also run it as an HTTP server for remote access from multiple machines.

### Quick Start

```bash
# Run as HTTP server on port 8080
MCP_TRANSPORT=http npx @sylphx/pdf-reader-mcp
```

### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| MCP_TRANSPORT | stdio | Transport type: stdio or http |
| MCP_HTTP_PORT | 8080 | HTTP server port |
| MCP_HTTP_HOST | 0.0.0.0 | HTTP server hostname |
| MCP_API_KEY | - | Optional API key for authentication |

### Docker
```dockerfile
FROM oven/bun:1
WORKDIR /app
RUN bun add @sylphx/pdf-reader-mcp
ENV MCP_TRANSPORT=http
ENV MCP_HTTP_PORT=8080
EXPOSE 8080
CMD ["bun", "node_modules/@sylphx/pdf-reader-mcp/dist/index.js"]
```

### Client Configuration
```json
{
  "servers": {
    "pdf-reader": {
      "type": "http",
      "url": "https://your-server.com/mcp",
      "headers": {
        "X-API-Key": "your-api-key"
      }
    }
  }
}
```

### Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| /mcp | POST | JSON-RPC endpoint |
| /mcp/health | GET | Health check |

---
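For readers who want to see what a call to the JSON-RPC endpoint might look like, here is a minimal sketch that only builds the request. `tools/call` with `name` and `arguments` is the standard MCP method for invoking a tool; the tool name `read_pdf` is an assumption (this section does not state it), `buildReadPdfRequest` is a hypothetical helper, and a real MCP HTTP session may also require an initialization handshake and extra headers that are not shown here.

```typescript
// Hypothetical request description that can be handed to fetch().
interface McpHttpRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildReadPdfRequest(
  baseUrl: string,
  args: Record<string, unknown>,
  apiKey?: string,
  id = 1,
): McpHttpRequest {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  // The X-API-Key header matches the client configuration shown above.
  if (apiKey) headers["X-API-Key"] = apiKey;

  return {
    // Strip trailing slashes so the endpoint is always "<base>/mcp".
    url: `${baseUrl.replace(/\/+$/, "")}/mcp`,
    init: {
      method: "POST",
      headers,
      body: JSON.stringify({
        jsonrpc: "2.0",
        id,
        method: "tools/call",
        // "read_pdf" is an assumed tool name; check the server's tool list.
        params: { name: "read_pdf", arguments: args },
      }),
    },
  };
}
```

The result can be passed straight to `fetch(request.url, request.init)`; keeping the builder free of network calls makes it easy to inspect or test.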
## Architecture

### Tech Stack

| Component | Technology |
|:----------|:-----------|
| Runtime | Node.js 22+ ESM |
| PDF Engine | PDF.js (Mozilla) |
| Validation | Zod + JSON Schema |
| Protocol | MCP SDK |
| Language | TypeScript (strict) |
| Testing | Vitest (103 tests) |
| Quality | Biome (50x faster) |
| CI/CD | GitHub Actions |

### Design Principles
- Security First - Flexible paths with secure defaults
- Simple Interface - One tool, all operations
- Performance - Parallel processing, efficient memory
- Reliability - Per-page isolation, detailed errors
- Quality - 94%+ coverage, strict TypeScript
- Type Safety - No any types, strict mode
- Backward Compatible - Smooth upgrades always

---
## Development

### Setup & Scripts
Prerequisites:
- Node.js >= 22.0.0
- pnpm (recommended) or npm
Setup:

```bash
git clone https://github.com/SylphxAI/pdf-reader-mcp.git
cd pdf-reader-mcp
pnpm install && pnpm build
```

Scripts:

```bash
pnpm run build      # Build TypeScript
pnpm run test       # Run 103 tests
pnpm run test:cov   # Coverage (94%+)
pnpm run check      # Lint + format
pnpm run check:fix  # Auto-fix
pnpm run benchmark  # Performance tests
```

Quality:
- ✅ 103 tests
- ✅ 94%+ coverage
- ✅ 98%+ function coverage
- ✅ Zero lint errors
- ✅ Strict TypeScript
### Contributing

Quick Start:
1. Fork repository
2. Create branch: git checkout -b feature/awesome
3. Make changes, then run the tests: pnpm test
4. Format: pnpm run check:fix
5. Commit: Use Conventional Commits
6. Open PR

Commit Format:

```
feat(images): add WebP support
fix(paths): handle UNC paths
docs(readme): update examples
```

See CONTRIBUTING.md
---
- Full Docs - Complete guides
- Getting Started - Quick start
- API Reference - Detailed API
- Design - Architecture
- Performance - Benchmarks
- Comparison - vs. alternatives
---
✅ Completed
- [x] Image extraction (v1.1.0)
- [x] 5-10x parallel speedup (v1.1.0)
- [x] Y-coordinate ordering (v1.2.0)
- [x] Absolute paths (v1.3.0)
- [x] 94%+ test coverage (v1.3.0)
Next
- [ ] OCR for scanned PDFs
- [ ] Annotation extraction
- [ ] Form field extraction
- [ ] Table detection
- [ ] 100+ MB streaming
- [ ] Advanced caching
- [ ] PDF generation
Vote at Discussions
---
Featured on:
- Smithery - MCP directory
- Glama - AI marketplace
- MseeP.ai - Security validated
Trusted worldwide • Enterprise adoption • Battle-tested
---


- Bug Reports
- Discussions
- Documentation
- Email

Show Your Support:
Star • Watch • Report bugs • Suggest features • Contribute
---
103 Tests • 94%+ Coverage • Production Ready
---
MIT © Sylphx
---
Built with:
- PDF.js - Mozilla PDF engine
- Bun - Fast JavaScript runtime
Special thanks to the open source community ❤️
This project uses the following @sylphx packages:
- @sylphx/mcp-server-sdk - MCP server framework
- @sylphx/vex - Schema validation
- @sylphx/biome-config - Biome configuration
- @sylphx/tsconfig - TypeScript configuration
- @sylphx/bump - Version management
- @sylphx/doctor - Project health checker
---

---