Python-powered document processing MCP for Excel, Word, PDF

Python-powered document processing MCP server with MCP Apps: process Excel, Word, PDF, and PowerPoint documents with Python, and view them in an interactive MCP App.
- Excel: Read/write .xlsx files with sheet support and pagination
- Word: Read/write .docx files with paragraph and table support
- PDF: Read .pdf files with text extraction and pagination
- PowerPoint: Read .pptx files with slide content extraction
- Text Files: Read/write .txt, .csv, .md, .json, .yaml, .yml with pagination support
- Run Python: Execute Python code for flexible file operations and data processing
- MCP App: Beautiful React + Tailwind CSS app for viewing all document types
- Flexible Reading Modes: Raw full read or paginated for large files
- Powered by Pyodide: Runs in a secure WebAssembly sandbox via code-runner-mcp
Add the server to your MCP client configuration (e.g., Claude Desktop or Cline):
Via npx (recommended):
```json
{
  "mcpServers": {
    "docsmith": {
      "command": "npx",
      "args": ["-y", "docsmith-mcp"],
      "env": {
        "DOC_PAGE_SIZE": "100"
      }
    }
  }
}
```
Via global installation:
```bash
npm install -g docsmith-mcp
```

```json
{
  "mcpServers": {
    "docsmith": {
      "command": "docsmith-mcp",
      "env": {
        "DOC_PAGE_SIZE": "100"
      }
    }
  }
}
```
Via local path:
```json
{
  "mcpServers": {
    "docsmith": {
      "command": "node",
      "args": ["/path/to/docsmith-mcp/dist/index.js"]
    }
  }
}
```
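If you prefer to script the setup, here is a minimal sketch that merges the npx entry shown above into an existing client config file. The helper names are hypothetical, and the config file location depends on your MCP client and OS, so pass in the path yourself.

```python
import json
from pathlib import Path

# Entry mirroring the npx configuration shown above.
DOCSMITH_ENTRY = {
    "command": "npx",
    "args": ["-y", "docsmith-mcp"],
    "env": {"DOC_PAGE_SIZE": "100"},
}


def add_docsmith(config: dict) -> dict:
    """Return a copy of config with the docsmith server added (or replaced)."""
    merged = dict(config)
    # Copy the servers dict so the caller's config is not mutated.
    servers = dict(merged.get("mcpServers", {}))
    servers["docsmith"] = DOCSMITH_ENTRY
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path) -> None:
    """Merge the docsmith entry into a JSON config file, creating it if missing."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_docsmith(existing), indent=2) + "\n")
```

For example, `update_config_file(Path("/path/to/client/config.json"))` keeps any other servers already listed under `mcpServers`.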
Then use the read_document tool:
```json
{
  "file_path": "/path/to/document.xlsx",
  "mode": "paginated",
  "page": 1,
  "page_size": 50
}
```
The MCP App will automatically open to display the document content beautifully.
| Format | Extensions | Read | Write | Notes |
|--------|-----------|------|-------|-------|
| Excel | .xlsx | ✅ | ✅ | Multi-sheet support, pagination |
| Word | .docx | ✅ | ✅ | Paragraphs and tables |
| PDF | .pdf | ✅ | ❌ | Text extraction with pagination |
| PowerPoint | .pptx | ✅ | ❌ | Slide content extraction |
| CSV | .csv | ✅ | ✅ | - |
| Text | .txt, .md | ✅ | ✅ | Pagination support |
| JSON | .json | ✅ | ✅ | - |
| YAML | .yaml, .yml | ✅ | ✅ | - |
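To make the table above concrete, here is a small lookup sketch that maps a file extension to its format and read/write support. It assumes detection is based on the extension alone; docsmith's actual detection logic may differ, and `SUPPORT` and `lookup` are hypothetical names.

```python
from pathlib import Path

# Mirrors the support table above: extension -> (format, readable, writable).
SUPPORT = {
    ".xlsx": ("Excel", True, True),
    ".docx": ("Word", True, True),
    ".pdf": ("PDF", True, False),
    ".pptx": ("PowerPoint", True, False),
    ".csv": ("CSV", True, True),
    ".txt": ("Text", True, True),
    ".md": ("Text", True, True),
    ".json": ("JSON", True, True),
    ".yaml": ("YAML", True, True),
    ".yml": ("YAML", True, True),
}


def lookup(file_path: str) -> tuple:
    """Return (format, readable, writable), judged only by the file extension."""
    ext = Path(file_path).suffix.lower()
    if ext not in SUPPORT:
        raise ValueError(f"Unsupported extension: {ext or '(none)'}")
    return SUPPORT[ext]
```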
read_document: Read document content with automatic format detection.
Parameters:
- file_path (string, required): Path to the document
- mode (string, optional): "paginated" or "raw" (default: "paginated")
- page (number, optional): Page number for paginated mode (default: 1)
- page_size (number, optional): Items per page (default: 100)
- sheet_name (string, optional): Sheet name for Excel files
Example:
```json
{
  "file_path": "/path/to/document.xlsx",
  "mode": "paginated",
  "page": 1,
  "page_size": 50,
  "sheet_name": "Sheet1"
}
```
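The page and page_size parameters suggest standard offset pagination. The sketch below assumes 1-indexed pages that slice a list of items (rows, paragraphs, or PDF pages) into chunks of page_size. The returned field names (items, total_pages, has_more) are hypothetical and are not docsmith's documented response schema.

```python
import math


def paginate(items: list, page: int = 1, page_size: int = 100) -> dict:
    """Return one 1-indexed page of items plus simple navigation info."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    # An empty input still counts as a single (empty) page.
    total_pages = max(1, math.ceil(len(items) / page_size))
    start = (page - 1) * page_size
    return {
        "items": items[start:start + page_size],
        "page": page,
        "total_pages": total_pages,
        "has_more": page < total_pages,
    }
```

With 120 rows and page_size 50, page 3 holds the last 20 rows and has_more is false.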
Write document content.
Parameters:
- file_path (string, required): Output path
- format (string, required): "excel", "word", "csv", "txt", "json", "yaml"
- data (array/object, required): Document content
Example:
```json
{
  "file_path": "/path/to/output.xlsx",
  "format": "excel",
  "data": [
    ["Product", "Q1", "Q2"],
    ["Laptop", 100, 150],
    ["Mouse", 500, 600]
  ]
}
```
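In the example above, data is a list of rows with the header first. If your data starts as a list of records, a sketch like the following builds that shape; records_to_rows is a hypothetical helper, and it assumes every record shares the keys of the first one.

```python
import json


def records_to_rows(records: list[dict]) -> list[list]:
    """Turn a list of dicts into a header row followed by value rows."""
    header = list(records[0].keys()) if records else []
    # Missing keys become None rather than raising.
    return [header] + [[rec.get(col) for col in header] for rec in records]


records = [
    {"Product": "Laptop", "Q1": 100, "Q2": 150},
    {"Product": "Mouse", "Q1": 500, "Q2": 600},
]
request = {
    "file_path": "/path/to/output.xlsx",
    "format": "excel",
    "data": records_to_rows(records),
}
print(json.dumps(request, indent=2))
```

The printed request matches the example arguments shown above.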
Get document metadata without reading full content.
Parameters:
- file_path (string, required): Path to the document
Example:
```json
{
  "file_path": "/path/to/document.pdf"
}
```
Execute Python code for flexible file operations, data processing, and custom tasks. Works with any file format, and additional Python packages can be requested through the packages parameter.
Parameters:
- code (string, required): Python code to execute
- packages (object, optional): Package mappings (import_name -> pypi_name) for required dependencies
- file_paths (array, optional): File paths that the code needs to access
Examples:
Read and process any file:
```json
{
  "code": "import json\nwith open('/path/to/file.json') as f:\n    data = json.load(f)\n    result = len(data)\n    print(json.dumps({'count': result}))",
  "file_paths": ["/path/to/file.json"]
}
```
Batch rename files with regex:
```json
{
  "code": "import json, os, re\nfolder = '/path/to/files'\nfor name in os.listdir(folder):\n    new_name = re.sub(r'old_', 'new_', name)\n    os.rename(os.path.join(folder, name), os.path.join(folder, new_name))\nprint(json.dumps({'success': True}))",
  "file_paths": ["/path/to/files"]
}
```
Process data with pandas:
```json
{
  "code": "import json\nimport pandas as pd\ndf = pd.read_csv('/path/to/data.csv')\nsummary = df.describe().to_dict()\nprint(json.dumps(summary))",
  "packages": {"pandas": "pandas"},
  "file_paths": ["/path/to/data.csv"]
}
```
Extract archive files:
```json
{
  "code": "import json, os, zipfile\nwith zipfile.ZipFile('/path/to/archive.zip', 'r') as z:\n    z.extractall('/path/to/output')\nfiles = os.listdir('/path/to/output')\nprint(json.dumps({'extracted_files': files}))",
  "file_paths": ["/path/to/archive.zip", "/path/to/output"]
}
```
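Hand-escaping newlines and quotes in the code strings above is error-prone. A minimal sketch: write the code as an ordinary multi-line string and let json.dumps produce the escaped arguments. build_run_python_request is a hypothetical helper; it only serializes the three documented parameters and does not call the server.

```python
import json
import textwrap


def build_run_python_request(code: str, packages=None, file_paths=None) -> str:
    """Serialize run_python arguments as JSON, escaping the code automatically."""
    # Remove the common indentation so the code can be written inline below.
    args = {"code": textwrap.dedent(code).strip()}
    if packages:
        args["packages"] = packages
    if file_paths:
        args["file_paths"] = file_paths
    return json.dumps(args, indent=2)


payload = build_run_python_request(
    """
    import json
    with open('/path/to/file.json') as f:
        data = json.load(f)
    print(json.dumps({'count': len(data)}))
    """,
    file_paths=["/path/to/file.json"],
)
print(payload)
```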
The built-in MCP App provides a beautiful, interactive interface for viewing documents:
- Excel: Interactive tables with sticky headers
- PDF: Page-by-page text viewing
- Word: Paragraph and table rendering
- PowerPoint: Slide navigation
Built with React 19, Tailwind CSS v4, and Lucide icons.
Environment variables for customizing behavior:
| Variable | Description | Default |
|----------|-------------|---------|
| DOC_RAW_FULL_READ | Enable full raw read mode | false |
| DOC_PAGE_SIZE | Default items per page | 100 |
| DOC_MAX_FILE_SIZE | Maximum file size in MB | 50 |
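For scripts that run alongside the server, here is a sketch of reading the same variables with the defaults from the table above. It assumes "1", "true", or "yes" (case-insensitive) enable DOC_RAW_FULL_READ and that 1 MB means 1024 × 1024 bytes; docsmith's own parsing rules may differ, and load_settings is a hypothetical name.

```python
import os


def load_settings(env=None) -> dict:
    """Read docsmith-style settings from environment variables, with defaults."""
    env = os.environ if env is None else env
    # Assumption: common truthy strings enable raw full reads.
    raw_full = env.get("DOC_RAW_FULL_READ", "false").strip().lower() in {"1", "true", "yes"}
    page_size = int(env.get("DOC_PAGE_SIZE", "100"))
    max_mb = float(env.get("DOC_MAX_FILE_SIZE", "50"))
    return {
        "raw_full_read": raw_full,
        "page_size": page_size,
        "max_file_size_bytes": int(max_mb * 1024 * 1024),
    }
```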
See CONTRIBUTING.md for development setup and contribution guidelines.
License: MIT