MCP server for processing large JSON arrays in manageable batches, with progress tracking.

`npm install json-batch-processor-mcp`

A Model Context Protocol (MCP) server that enables AI IDEs to process large JSON arrays in manageable batches. It works around context window limits by breaking large datasets into smaller chunks that can be processed incrementally.

- JSON File Binding: Bind JSON files for batch processing with automatic structure analysis
- Flexible Task Generation: Create batch processing tasks with customizable batch sizes and field paths
- Batch Reading: Read specific batches of data from large JSON arrays
- Progress Tracking: Persistent progress tracking that survives system restarts
- Result Merging: Automatically merge processed batches back into a complete JSON file
- MCP Integration: Seamless integration with AI IDEs through the Model Context Protocol
- Node.js v18 or higher
- npm or yarn package manager

Install globally from npm:

```bash
npm install -g json-batch-processor-mcp
```

Or install from source:

```bash
git clone <repository-url>
cd json-batch-processor-mcp
npm install
npm run build
npm link
```

Add the following configuration to your MCP settings file:
- Workspace level: `.kiro/settings/mcp.json` (project-specific)
- User level: `~/.kiro/settings/mcp.json` (global, all projects)
#### Option 1: Development/Local Installation
If you cloned the repository or installed from source:
```json
{
  "mcpServers": {
    "json-batch-processor": {
      "command": "node",
      "args": ["/absolute/path/to/json-batch-processor-mcp/dist/index.js"],
      "disabled": false,
      "autoApprove": []
    }
  }
}
```

Replace /absolute/path/to/json-batch-processor-mcp with the actual path to your installation.
#### Option 2: Global npm Installation
If you installed globally via npm install -g:
```json
{
  "mcpServers": {
    "json-batch-processor": {
      "command": "json-batch-processor",
      "args": [],
      "disabled": false,
      "autoApprove": []
    }
  }
}
```

#### Configuration Options
- `command`: The executable command or path to the server
- `args`: Command-line arguments (empty for global install)
- `disabled`: Set to `true` to disable the server without removing the configuration
- `autoApprove`: List of tool names to auto-approve (e.g., `["bind_json_file", "read_batch"]`)
#### Restart MCP Server
After updating the configuration:
1. Open the MCP Server view in your IDE
2. Click "Reconnect" next to the json-batch-processor server
3. Or restart your IDE
The server stores all data in:
```
~/.kiro/mcp-data/json-batch-processor/
```

Each binding creates a subdirectory with:

- `binding.json` - Binding metadata
- `tasks.json` - Task list
- `progress.json` - Progress tracking
- `results/` - Processed batch results

### bind_json_file

Bind a JSON file for batch processing.

Parameters:

- `filePath` (string, required): Absolute path to the JSON file

Returns:

```json
{
  "bindingId": "uuid-string",
  "structure": {
    "type": "object",
    "arrayPaths": ["$.data", "$.items"],
    "totalElements": 1000
  }
}
```

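The `arrayPaths` field lists the arrays found during structure analysis. A minimal sketch of how such a scan could work, assuming a plain recursive walk that does not descend into array elements. The name `findArrayPaths` is hypothetical and not part of the server's API:

```javascript
// Hypothetical helper: collect JSONPath-style paths of arrays in a parsed document.
// Assumption: arrays nested inside array elements are not scanned.
function findArrayPaths(value, path = "$") {
  if (Array.isArray(value)) {
    return [{ path, length: value.length }];
  }
  if (value !== null && typeof value === "object") {
    // Recurse into each property, extending the path with its key.
    return Object.entries(value).flatMap(([key, child]) =>
      findArrayPaths(child, `${path}.${key}`)
    );
  }
  return []; // primitives contain no arrays
}

const sample = { data: [1, 2, 3], meta: { tags: ["a", "b"] }, name: "demo" };
console.log(findArrayPaths(sample).map((entry) => entry.path));
// [ '$.data', '$.meta.tags' ]
```
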
### generate_task_list

Generate a list of batch processing tasks.

Parameters:

- `bindingId` (string, required): The binding ID from `bind_json_file`
- `batchSize` (number, required): Number of elements per batch
- `fieldPath` (string, optional): JSONPath to the array (e.g., `"$.data.items"`)

Returns:

```json
{
  "bindingId": "uuid-string",
  "totalTasks": 20,
  "batchSize": 50,
  "fieldPath": "$.data",
  "tasks": [
    {
      "id": "uuid-batch-0",
      "batchIndex": 0,
      "startIndex": 0,
      "endIndex": 49,
      "status": "pending"
    }
  ]
}
```

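The index arithmetic behind the task list can be sketched as follows. This is an illustration only: `planBatches` is a hypothetical name, and it assumes `endIndex` is inclusive, as the sample response above suggests (a batch size of 50 gives indices 0 to 49).

```javascript
// Hypothetical helper: split `totalElements` items into tasks of `batchSize`.
// Assumption: endIndex is inclusive.
function planBatches(totalElements, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const tasks = [];
  for (let start = 0, index = 0; start < totalElements; start += batchSize, index++) {
    tasks.push({
      batchIndex: index,
      startIndex: start,
      // The last batch may be shorter than batchSize.
      endIndex: Math.min(start + batchSize, totalElements) - 1,
      status: "pending",
    });
  }
  return tasks;
}

console.log(planBatches(120, 50).map((t) => [t.startIndex, t.endIndex]));
// [ [ 0, 49 ], [ 50, 99 ], [ 100, 119 ] ]
```
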
### read_batch

Read data for a specific batch.

Parameters:

- `bindingId` (string, required): The binding ID
- `taskId` (string, required): The task ID from the task list

Returns:

```json
{
  "taskId": "uuid-batch-0",
  "batchIndex": 0,
  "data": [...],
  "startIndex": 0,
  "endIndex": 49,
  "totalElements": 1000
}
```

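Conceptually, reading a batch means resolving the field path to an array and slicing out the task's index range. A rough sketch under stated assumptions: `resolveArray` and `sliceBatch` are hypothetical names, and only plain dotted paths such as `$.data.items` are handled (no wildcards or filters).

```javascript
// Hypothetical helper: resolve a simple dotted path such as "$.data.items".
// Assumption: only plain property access is supported.
function resolveArray(doc, fieldPath) {
  const keys = fieldPath.split(".").slice(1); // drop the leading "$"
  const target = keys.reduce(
    (node, key) => (node == null ? undefined : node[key]),
    doc
  );
  if (!Array.isArray(target)) {
    throw new Error(`Path does not point to an array: ${fieldPath}`);
  }
  return target;
}

// Return the elements for one task; endIndex is treated as inclusive.
function sliceBatch(doc, fieldPath, task) {
  return resolveArray(doc, fieldPath).slice(task.startIndex, task.endIndex + 1);
}

const source = { data: { items: ["a", "b", "c", "d", "e"] } };
console.log(sliceBatch(source, "$.data.items", { startIndex: 1, endIndex: 3 }));
// [ 'b', 'c', 'd' ]
```
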
### update_task_status

Update the status of a task and optionally save processed results.

Parameters:

- `bindingId` (string, required): The binding ID
- `taskId` (string, required): The task ID
- `status` (string, required): Either `"pending"` or `"completed"`
- `result` (array, optional): The processed batch data

Returns:

```json
{
  "success": true,
  "taskId": "uuid-batch-0",
  "status": "completed"
}
```

### get_progress

Get the current processing progress.

Parameters:

- `bindingId` (string, required): The binding ID

Returns:

```json
{
  "bindingId": "uuid-string",
  "totalTasks": 20,
  "completedTasks": 5,
  "pendingTasks": 15,
  "percentage": 25,
  "tasks": [
    {
      "taskId": "uuid-batch-0",
      "status": "completed",
      "completedAt": "2025-11-08T10:05:00Z",
      "hasResult": true
    }
  ]
}
```

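The summary fields can be derived from the task list alone. A small sketch, where `summarizeProgress` is a hypothetical name and rounding the percentage to the nearest integer is an assumption (the source does not state the rounding rule):

```javascript
// Hypothetical helper: derive the summary counters from a list of tasks.
// Assumption: percentage is rounded to the nearest whole number.
function summarizeProgress(tasks) {
  const totalTasks = tasks.length;
  const completedTasks = tasks.filter((t) => t.status === "completed").length;
  return {
    totalTasks,
    completedTasks,
    pendingTasks: totalTasks - completedTasks,
    // Guard against division by zero when no tasks exist yet.
    percentage: totalTasks === 0 ? 0 : Math.round((completedTasks / totalTasks) * 100),
  };
}

const demoTasks = [
  { status: "completed" },
  { status: "pending" },
  { status: "pending" },
  { status: "pending" },
];
console.log(summarizeProgress(demoTasks).percentage); // 25
```
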
### merge_results

Merge all processed batches into a final JSON file.

Parameters:

- `bindingId` (string, required): The binding ID
- `outputPath` (string, required): Path for the output JSON file

Returns:

```json
{
  "success": true,
  "outputPath": "/path/to/output.json",
  "totalElements": 1000,
  "message": "Successfully merged 20 batches"
}
```

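Merging amounts to checking that every task is complete and concatenating the batch results in batch order. A hedged sketch: `mergeBatches` is a hypothetical name, and it assumes each completed task carries its processed `result` array in memory, whereas the server reads results from disk.

```javascript
// Hypothetical helper: concatenate batch results in batchIndex order.
// Assumption: each completed task holds its processed `result` array.
function mergeBatches(tasks) {
  const pending = tasks.filter((t) => t.status !== "completed");
  if (pending.length > 0) {
    const err = new Error(`${pending.length} task(s) are not completed`);
    err.code = "IncompleteTasksError";
    throw err;
  }
  return [...tasks] // copy so the caller's array is not reordered
    .sort((a, b) => a.batchIndex - b.batchIndex)
    .flatMap((t) => t.result ?? []);
}

const finished = [
  { batchIndex: 1, status: "completed", result: ["c", "d"] },
  { batchIndex: 0, status: "completed", result: ["a", "b"] },
];
console.log(mergeBatches(finished)); // [ 'a', 'b', 'c', 'd' ]
```
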
Here's a minimal example to get started:
```javascript
// `callTool` stands in for however your MCP client invokes a server tool.

// 1. Bind a JSON file
const binding = await callTool("bind_json_file", {
  filePath: "/path/to/data.json"
});

// 2. Generate tasks (50 items per batch)
const tasks = await callTool("generate_task_list", {
  bindingId: binding.bindingId,
  batchSize: 50,
  fieldPath: "$.items" // Optional: specify array path
});

// 3. Process each batch
for (const task of tasks.tasks) {
  // Read batch
  const batch = await callTool("read_batch", {
    bindingId: binding.bindingId,
    taskId: task.id
  });

  // Process data (your custom logic)
  const processed = batch.data.map(item => ({
    ...item,
    processed: true
  }));

  // Save results
  await callTool("update_task_status", {
    bindingId: binding.bindingId,
    taskId: task.id,
    status: "completed",
    result: processed
  });
}

// 4. Merge all results
const result = await callTool("merge_results", {
  bindingId: binding.bindingId,
  outputPath: "/path/to/output.json"
});
```

For a complete walkthrough with detailed examples, see EXAMPLE.md.
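
Because progress is persisted, an interrupted run can continue from the tasks that `get_progress` still reports as pending. A minimal sketch of such a resume loop, assuming the response shapes documented above. `resumePending` is a hypothetical name, and `callTool` and `processBatch` are placeholders supplied by the caller:

```javascript
// Process only the tasks that are not yet completed, so a crashed run can resume.
// `callTool` and `processBatch` are caller-supplied placeholders.
async function resumePending(callTool, bindingId, processBatch) {
  const progress = await callTool("get_progress", { bindingId });
  const pending = progress.tasks.filter((t) => t.status !== "completed");

  for (const task of pending) {
    const batch = await callTool("read_batch", { bindingId, taskId: task.taskId });
    const result = await processBatch(batch.data);
    await callTool("update_task_status", {
      bindingId,
      taskId: task.taskId,
      status: "completed",
      result,
    });
  }
  return pending.length; // number of tasks processed in this run
}
```

Calling `resumePending` repeatedly is harmless in this sketch: once every task is completed, the loop body never runs and the function returns 0.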
The server returns structured error responses:
```json
{
  "error": {
    "code": "FileNotFoundError",
    "message": "JSON file not found at path: /path/to/file.json",
    "details": {}
  }
}
```

Common error codes:
- `FileNotFoundError`: JSON file doesn't exist
- `InvalidJSONError`: Invalid JSON format
- `BindingNotFoundError`: Binding ID not found
- `TaskNotFoundError`: Task ID not found
- `InvalidBatchIndexError`: Batch index out of range
- `IncompleteTasksError`: Not all tasks completed
- `InvalidFieldPathError`: Invalid JSONPath or path doesn't point to an array
- `StorageError`: File system operation failed
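
On the client side, one way to work with these responses is to convert them into thrown errors so callers can branch on the code. A sketch only: `unwrap` is a hypothetical helper, and it assumes the tool response has already been parsed into an object with the structured `error` shape:

```javascript
// Hypothetical helper: turn a structured error response into a thrown Error.
// Assumption: `response` is the already-parsed tool response object.
function unwrap(response) {
  if (response && response.error) {
    const { code, message, details } = response.error;
    const err = new Error(message);
    err.code = code; // e.g. "FileNotFoundError"
    err.details = details;
    throw err;
  }
  return response;
}

try {
  unwrap({ error: { code: "TaskNotFoundError", message: "Task ID not found", details: {} } });
} catch (err) {
  console.log(err.code); // TaskNotFoundError
}
```
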

## Performance

- Binding operation: < 1 second (10 MB file)
- Task generation: < 500 ms (1000 tasks)
- Batch reading: < 100 ms (50-element batch)
- Progress update: < 50 ms
- Result merging: < 5 seconds (1000 batches)

## Limitations

- Maximum file size: 100 MB (configurable)
- Maximum storage per binding: 500 MB (configurable)
- Supports JSON format only (CSV, XML support planned)
- Requires Node.js runtime (not browser-compatible)

## Use Cases

- Data Enrichment: Add information from external APIs to large datasets
- Data Transformation: Convert data structures in manageable chunks
- Data Validation: Validate large datasets batch by batch
- Data Migration: Transform and migrate data between formats
- AI Processing: Process large datasets within AI context window limits
- Data Analysis: Analyze large datasets incrementally

The server consists of six core components:
1. Binding Manager: Validates and binds JSON files, scans structure
2. Task Manager: Generates and manages batch processing tasks
3. Batch Reader: Extracts specific batches from JSON arrays
4. Progress Tracker: Tracks and persists task completion status
5. Result Merger: Merges processed batches back into complete JSON
6. MCP Server: Exposes functionality through MCP protocol tools

Data is stored in `~/.kiro/mcp-data/json-batch-processor/{bindingId}/`:

- `binding.json` - File binding metadata
- `tasks.json` - Complete task list
- `progress.json` - Current progress state
- `results/` - Individual batch results

Build the project:

```bash
npm run build
```

Run in development mode:

```bash
npm run dev
```

Clean build:

```bash
npm run build:clean
```

Project structure:

```
json-batch-processor-mcp/
├── src/
│   ├── index.ts              # MCP server entry point
│   ├── managers/             # Core business logic
│   │   ├── BindingManager.ts
│   │   ├── TaskManager.ts
│   │   ├── BatchReader.ts
│   │   ├── ProgressTracker.ts
│   │   └── ResultMerger.ts
│   ├── types/                # TypeScript type definitions
│   │   └── index.ts
│   └── utils/                # Utility functions
│       ├── storage.ts
│       └── errors.ts
├── examples/                 # Sample JSON files
├── dist/                     # Compiled JavaScript
└── package.json
```

## License

MIT

## Contributing

Contributions are welcome! Please open an issue or submit a pull request.

Fixed a critical bug where binding information was not persisted to disk, causing read_batch and subsequent operations to fail. The server now correctly saves binding.json files, ensuring reliable operation across server restarts.
What changed:
- Binding information (including JSON content) is now saved to disk
- read_batch operations work reliably even after server restart
- All batch processing workflows are now resumable
Migration: If you have existing bindings from before this fix, please re-bind your JSON files using bind_json_file.
Problem: MCP server doesn't appear in your IDE
Solutions:
1. Verify the configuration path in your mcp.json is correct
2. Check that the server is not disabled ("disabled": false)
3. Restart your IDE or reconnect the MCP server
4. Check the MCP server logs for errors
Problem: FileNotFoundError when binding
Solutions:
1. Use absolute paths, not relative paths
2. Verify the file exists: `ls -la /path/to/file.json`
3. Check file permissions (must be readable)
4. Ensure no typos in the file path
Problem: InvalidJSONError when binding
Solutions:
1. Validate your JSON: `cat file.json | jq .`
2. Check for trailing commas (not allowed in JSON)
3. Ensure proper quote escaping
4. Verify UTF-8 encoding
Problem: BindingNotFoundError on subsequent operations
Solutions:
1. Verify you're using the correct binding ID from the bind response
2. Check if the binding data still exists in ~/.kiro/mcp-data/json-batch-processor/
3. Re-bind the file if necessary
Problem: IncompleteTasksError when merging
Solutions:
1. Call get_progress to see which tasks are pending
2. Complete all pending tasks before merging
3. Check for failed tasks and retry them
Q: Can I process multiple JSON files simultaneously?
A: Yes! Each binding has a unique ID, so you can bind multiple files and process them in parallel.
Q: What happens if my process crashes?
A: All progress is saved to disk. Simply call get_progress to see where you left off and continue.
Q: Can I change the batch size after generating tasks?
A: No, you need to unbind and create a new binding with a different batch size.
Q: Does this work with nested arrays?
A: Yes! Use JSONPath syntax to specify the exact array path (e.g., $.data.users[*].orders).
Q: Can I process arrays in parallel?
A: Yes, tasks are independent. You can process multiple batches simultaneously if your logic allows.
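
When processing batches in parallel, it is usually worth capping how many are in flight at once. A small sketch of a concurrency-limited map, independent of the server itself. `mapWithLimit` is a hypothetical helper, and `worker` would wrap your own read, process, and update calls for one task:

```javascript
// Run `worker` over `items` with at most `limit` calls in flight at once.
// Results are returned in the same order as `items`.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  async function run() {
    while (next < items.length) {
      const i = next++; // claim an item before awaiting
      results[i] = await worker(items[i], i);
    }
  }

  // Start up to `limit` runners that pull items until none remain.
  const runners = Array.from({ length: Math.min(limit, items.length) }, run);
  await Promise.all(runners);
  return results;
}
```
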
Q: What if my JSON has multiple arrays?
A: Specify the exact array using the fieldPath parameter. If omitted, the first array is used.
Q: Can I modify the original file?
A: No, the original file is never modified. Results are always written to a new output file.
Q: How do I clean up old bindings?
A: Delete the binding directory: `rm -rf ~/.kiro/mcp-data/json-batch-processor/{bindingId}`

For issues and questions, please open an issue on the GitHub repository.