# jsonl-explorer-mcp

MCP server for analyzing JSONL files with streaming, statistics, search, validation, and live tailing.

```bash
npm install jsonl-explorer-mcp
```

A Model Context Protocol (MCP) server for analyzing JSONL (JSON Lines) files. Designed for local development workflows with files ranging from 1 MB to 1 GB.
Working with large JSONL files in development can be challenging:
- Log files grow too large to open in editors
- Data exports need exploration before processing
- Event streams require real-time monitoring
- Schema drift happens silently across records
JSONL Explorer solves these problems by providing streaming analysis tools that work efficiently with large files while integrating seamlessly with AI assistants via MCP.
## Features

| Feature | Description |
|---------|-------------|
| Streaming Architecture | Process files of any size without loading them into memory |
| Schema Inference | Automatically detect and track schema across records |
| Statistical Analysis | Field-level stats including distributions, percentiles, cardinality |
| Flexible Querying | Simple comparisons, regex, JSONPath, and compound queries |
| JSON Schema Validation | Validate syntax and structure against schemas |
| Live File Tailing | Monitor actively-written files with cursor-based tracking |
| File Comparison | Diff two JSONL files by key field |
## Installation

```bash
npm install -g jsonl-explorer-mcp
```

Or run directly with npx:

```bash
npx jsonl-explorer-mcp
```
#### Claude Desktop
Add to `~/.config/claude/claude_desktop_config.json` (Linux/macOS) or `%APPDATA%\Claude\claude_desktop_config.json` (Windows):

```json
{
  "mcpServers": {
    "jsonl-explorer": {
      "command": "npx",
      "args": ["jsonl-explorer-mcp"]
    }
  }
}
```
#### Claude Code
Add to your project's `.mcp.json`:

```json
{
  "mcpServers": {
    "jsonl-explorer": {
      "command": "npx",
      "args": ["jsonl-explorer-mcp"]
    }
  }
}
```
## Transport Modes

**Stdio Mode (default)** - For MCP clients that communicate via stdin/stdout:

```bash
jsonl-explorer-mcp
```

**HTTP Mode** - For web-based integrations:

```bash
jsonl-explorer-mcp --http --port=3000
```
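Beyond desktop clients, any MCP client can drive the server programmatically. Below is a minimal sketch using the official TypeScript SDK (`@modelcontextprotocol/sdk`); the tool name `inspect_jsonl` is a hypothetical placeholder, so list the tools first to discover the real names:

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Spawn the server as a child process and communicate over stdio.
const transport = new StdioClientTransport({
  command: "npx",
  args: ["jsonl-explorer-mcp"],
});
const client = new Client({ name: "example-client", version: "1.0.0" });
await client.connect(transport);

// Discover the tool names the server actually exposes.
const { tools } = await client.listTools();
console.log(tools.map((t) => t.name));

// "inspect_jsonl" is a placeholder; substitute a name from the list above.
const result = await client.callTool({
  name: "inspect_jsonl",
  arguments: { file: "/data/events.jsonl", sampleSize: 100 },
});
console.log(result.content);

await client.close();
```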
## Tools

Get a comprehensive overview of a JSONL file, including size, record count, inferred schema, and field statistics.
Parameters:
| Name | Type | Default | Description |
|------|------|---------|-------------|
| `file` | string | required | Absolute path to the JSONL file |
| `sampleSize` | number | 100 | Records to sample for schema inference |
Example Response:
```json
{
  "file": "/data/events.jsonl",
  "size": "156.2 MB",
  "lineCount": 1248392,
  "validRecords": 1248392,
  "malformedLines": 0,
  "schema": {
    "type": "object",
    "fields": [
      { "name": "id", "types": ["string"], "nullable": false },
      { "name": "timestamp", "types": ["string"], "nullable": false },
      { "name": "event_type", "types": ["string"], "nullable": false },
      { "name": "payload", "types": ["object"], "nullable": true }
    ]
  }
}
```
---
Retrieve sample records using various sampling strategies.
Parameters:
| Name | Type | Default | Description |
|------|------|---------|-------------|
| `file` | string | required | Absolute path to the JSONL file |
| `count` | number | 10 | Number of records to sample |
| `mode` | string | "first" | Sampling mode: `first`, `last`, `random`, `range` |
| `rangeStart` | number | - | Start line for range mode (1-indexed) |
| `rangeEnd` | number | - | End line for range mode (1-indexed) |
Sampling Modes:
- `first` - First N records (fast, streaming)
- `last` - Last N records (requires a full file scan)
- `random` - Random sample using reservoir sampling (see the sketch below)
- `range` - Specific line range
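The `random` mode's reservoir sampling keeps memory bounded even for huge files. Here is a minimal sketch of the classic Algorithm R over a line stream, an illustration of the technique rather than this package's exact code:

```typescript
import { createReadStream } from "node:fs";
import { createInterface } from "node:readline";

// Algorithm R: after n lines, every line has probability k/n of being
// in the reservoir, using O(k) memory regardless of file size.
async function reservoirSample(file: string, k: number): Promise<string[]> {
  const reservoir: string[] = [];
  let seen = 0;
  const lines = createInterface({
    input: createReadStream(file),
    crlfDelay: Infinity,
  });
  for await (const line of lines) {
    if (line.trim() === "") continue; // skip blank lines
    seen++;
    if (reservoir.length < k) {
      reservoir.push(line);
    } else {
      const j = Math.floor(Math.random() * seen); // uniform in [0, seen)
      if (j < k) reservoir[j] = line; // replace with probability k/seen
    }
  }
  return reservoir;
}
```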
---
Infer the schema of records by sampling.
Parameters:
| Name | Type | Default | Description |
|------|------|---------|-------------|
| `file` | string | required | Absolute path to the JSONL file |
| `sampleSize` | number | 1000 | Records to sample |
| `outputFormat` | string | "inferred" | Format: `inferred`, `json-schema`, `formatted` |
Output Formats:
- `inferred` - Internal schema representation with type frequencies
- `json-schema` - Standard JSON Schema (draft-07)
- `formatted` - Human-readable summary
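Conceptually, inference is one pass over sampled records, tallying the JSON types seen at each field. A simplified sketch for top-level fields (the real inferrer also tracks nested objects and type frequencies):

```typescript
type FieldInfo = { types: Set<string>; nullable: boolean; seen: number };

// Tally observed JSON types per top-level field across sampled records.
function inferFields(records: Record<string, unknown>[]): Map<string, FieldInfo> {
  const fields = new Map<string, FieldInfo>();
  for (const record of records) {
    for (const [name, value] of Object.entries(record)) {
      const info =
        fields.get(name) ?? { types: new Set<string>(), nullable: false, seen: 0 };
      if (value === null) info.nullable = true;
      else info.types.add(Array.isArray(value) ? "array" : typeof value);
      info.seen++;
      fields.set(name, info);
    }
  }
  // A field absent from some records is treated as nullable/optional.
  for (const info of fields.values()) {
    if (info.seen < records.length) info.nullable = true;
  }
  return fields;
}
```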
---
Collect aggregate statistics for fields.
Parameters:
| Name | Type | Default | Description |
|------|------|---------|-------------|
| `file` | string | required | Absolute path to the JSONL file |
| `fields` | string[] | all | Specific fields to analyze |
| `maxRecords` | number | all | Maximum records to analyze |
Statistics Provided:
- Numeric fields: min, max, mean, median, stdDev, percentiles (p50, p90, p95, p99)
- String fields: minLength, maxLength, avgLength, cardinality, value distribution
- Boolean fields: true/false counts and percentages
- All fields: null count, unique count
---
Search for records where a field matches a regex pattern.
Parameters:
| Name | Type | Default | Description |
|------|------|---------|-------------|
| `file` | string | required | Absolute path to the JSONL file |
| `field` | string | required | Field path (supports dot notation) |
| `pattern` | string | required | Regex pattern to match |
| `caseSensitive` | boolean | false | Case-sensitive matching |
| `maxResults` | number | 100 | Maximum results to return |
| `returnFields` | string[] | all | Fields to include in results |
Example:
```json
{
  "file": "/data/logs.jsonl",
  "field": "message",
  "pattern": "error|failed|exception",
  "caseSensitive": false,
  "maxResults": 50
}
```
---
Filter records using powerful query expressions.
Parameters:
| Name | Type | Default | Description |
|------|------|---------|-------------|
| `file` | string | required | Absolute path to the JSONL file |
| `query` | string | required | Query expression |
| `outputFormat` | string | "records" | Output: `records`, `count`, `lines` |
| `limit` | number | 1000 | Maximum results |
Query Syntax:
| Type | Example | Description |
|------|---------|-------------|
| Equality | `status == "active"` | Exact match |
| Comparison | `age > 30` | Numeric comparison (>, >=, <, <=, !=) |
| Regex | `email =~ "@gmail\\.com$"` | Pattern matching |
| Null check | `deleted_at == null` | Check for null values |
| JSONPath | `$[?(@.price < 100)]` | Full JSONPath expressions |
| Compound | `status == "active" AND age > 30` | Combine with AND/OR |
Examples:
```javascript
// Find active premium users
"subscription == \"premium\" AND active == true"

// Find orders over $100
"total > 100"

// Find emails from a specific domain
"email =~ \"@company\\.com$\""

// Complex JSONPath
"$[?(@.items[*].quantity > 10)]"
```
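For intuition, the simple (non-JSONPath) syntax boils down to `field op value` clauses joined by AND. Here is a toy evaluator in that spirit, not the server's actual query engine:

```typescript
// Evaluate queries like: status == "active" AND age > 30
function matches(record: Record<string, unknown>, query: string): boolean {
  return query.split(/\s+AND\s+/).every((clause) => {
    const m = clause.match(/^([\w.]+)\s*(==|!=|>=|<=|>|<|=~)\s*(.+)$/);
    if (!m) throw new Error(`Unparseable clause: ${clause}`);
    const [, path, op, raw] = m;
    // Dot notation: "payload.user.id" walks nested objects.
    const actual = path
      .split(".")
      .reduce<unknown>((obj, key) => (obj as Record<string, unknown> | undefined)?.[key], record);
    // JSON.parse handles quoted strings, numbers, and booleans alike.
    const expected = raw === "null" ? null : JSON.parse(raw);
    switch (op) {
      case "==": return actual === expected;
      case "!=": return actual !== expected;
      case ">":  return (actual as number) > (expected as number);
      case ">=": return (actual as number) >= (expected as number);
      case "<":  return (actual as number) < (expected as number);
      case "<=": return (actual as number) <= (expected as number);
      case "=~": return new RegExp(expected as string).test(String(actual));
      default:   return false;
    }
  });
}
```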
---
Validate file syntax and optionally against a JSON Schema.
Parameters:
| Name | Type | Default | Description |
|------|------|---------|-------------|
| `file` | string | required | Absolute path to the JSONL file |
| `schema` | object/string | - | JSON Schema (inline or file path) |
| `stopOnFirstError` | boolean | false | Stop on first error |
| `maxErrors` | number | 100 | Maximum errors to report |
Response:
```json
{
  "valid": false,
  "totalRecords": 10000,
  "validRecords": 9987,
  "invalidRecords": 13,
  "errors": [
    {
      "line": 1523,
      "error": "must have required property 'user_id'",
      "path": "/user_id"
    }
  ]
}
```
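The docs don't say which validator the server uses internally, but the error shape matches JSON Schema validators such as Ajv. A hedged sketch of streaming per-line validation using Ajv:

```typescript
import Ajv from "ajv";
import { createReadStream } from "node:fs";
import { createInterface } from "node:readline";

// Compile the schema once, then validate each line as it streams by.
async function validateJsonl(file: string, schema: object, maxErrors = 100) {
  const validate = new Ajv().compile(schema);
  const errors: { line: number; error: string; path: string }[] = [];
  let lineNo = 0;
  const lines = createInterface({ input: createReadStream(file), crlfDelay: Infinity });
  for await (const line of lines) {
    lineNo++;
    if (line.trim() === "") continue;
    try {
      if (!validate(JSON.parse(line))) {
        const e = validate.errors![0];
        errors.push({ line: lineNo, error: e.message ?? "invalid", path: e.instancePath });
      }
    } catch {
      errors.push({ line: lineNo, error: "malformed JSON", path: "" }); // syntax error
    }
    if (errors.length >= maxErrors) break; // honor the error cap
  }
  return { valid: errors.length === 0, errors };
}
```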
---
Monitor actively-written files for new records using cursor-based tracking.
Parameters:
| Name | Type | Default | Description |
|------|------|---------|-------------|
| `file` | string | required | Absolute path to the JSONL file |
| `cursor` | number | 0 | Byte position to start from |
| `maxRecords` | number | 100 | Maximum records to return |
| `timeout` | number | 0 | Wait time for new content (ms) |
Usage Pattern:
```javascript
// Initial call - start from beginning
{ "file": "/var/log/app.jsonl", "cursor": 0 }
// Response: { records: [...], newCursor: 15234, hasMore: false }

// Subsequent calls - continue from cursor
{ "file": "/var/log/app.jsonl", "cursor": 15234, "timeout": 5000 }
// Waits up to 5s for new content
```
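Under the hood, a cursor is just a byte offset: each call reads whatever complete lines were appended after it and returns the new offset. A simplified sketch of that mechanic (the real tool also implements the timeout wait):

```typescript
import { open } from "node:fs/promises";

// Read complete JSONL records appended after `cursor` (a byte offset),
// returning parsed records plus the cursor to pass on the next call.
async function readFromCursor(file: string, cursor: number) {
  const handle = await open(file, "r");
  try {
    const { size } = await handle.stat();
    if (size <= cursor) return { records: [], newCursor: cursor };

    const buf = Buffer.alloc(size - cursor);
    await handle.read(buf, 0, buf.length, cursor);
    const text = buf.toString("utf8");

    // Only consume up to the last newline; a partially written
    // trailing line is left for the next call.
    const end = text.lastIndexOf("\n");
    if (end === -1) return { records: [], newCursor: cursor };

    const consumed = text.slice(0, end + 1);
    const records = consumed
      .split("\n")
      .filter((l) => l.trim() !== "")
      .map((l) => JSON.parse(l));
    return { records, newCursor: cursor + Buffer.byteLength(consumed, "utf8") };
  } finally {
    await handle.close();
  }
}
```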
---
Compare two JSONL files and report differences.
Parameters:
| Name | Type | Default | Description |
|------|------|---------|-------------|
| `file1` | string | required | Path to first file |
| `file2` | string | required | Path to second file |
| `keyField` | string | - | Field to use as unique key for matching |
| `compareFields` | string[] | all | Specific fields to compare |
| `maxDiffs` | number | 100 | Maximum differences to report |
Diff Types:
- `added` - Record exists only in file2
- `removed` - Record exists only in file1
- `modified` - Record exists in both but differs
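A minimal sketch of the key-based comparison: index file1's records by the key field, then walk file2's records against that index. This is an illustration only, and it uses order-sensitive JSON string equality for brevity:

```typescript
type Rec = Record<string, unknown>;

// Classify records as added / removed / modified using a unique key field.
function diffByKey(left: Rec[], right: Rec[], keyField: string) {
  const leftByKey = new Map(
    left.map((r): [string, Rec] => [String(r[keyField]), r]),
  );
  const diffs: { type: "added" | "removed" | "modified"; key: string }[] = [];
  for (const r of right) {
    const key = String(r[keyField]);
    const match = leftByKey.get(key);
    if (match === undefined) {
      diffs.push({ type: "added", key }); // only in file2
    } else if (JSON.stringify(match) !== JSON.stringify(r)) {
      diffs.push({ type: "modified", key }); // in both, but differs
    }
    leftByKey.delete(key);
  }
  // Whatever remains unmatched existed only in file1.
  for (const key of leftByKey.keys()) diffs.push({ type: "removed", key });
  return diffs;
}
```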
## Example Prompts

```
"Inspect the application log file at /var/log/app.jsonl and show me
the schema and any error messages from the last hour"
```

```
"Validate /data/export.jsonl against this schema and show me
statistics on the user_id field to check for duplicates"
```

```
"Tail the events file and alert me when you see any records
with event_type containing 'error'"
```

```
"Diff these two data exports using 'id' as the key field
and show me what changed"
```
## Architecture

See ARCHITECTURE.md for detailed technical documentation, including:

- Streaming parser design
- Schema inference algorithm
- Statistics collection with Welford's algorithm (sketched below)
- Query engine implementation
- Memory efficiency strategies
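As a taste of the streaming approach, Welford's algorithm updates mean and variance one record at a time in O(1) memory, which is how statistics like stdDev can be collected without buffering the file:

```typescript
// Welford's online algorithm: numerically stable running mean/variance.
class RunningStats {
  private n = 0;
  private mean = 0;
  private m2 = 0; // sum of squared deviations from the running mean

  push(x: number): void {
    this.n++;
    const delta = x - this.mean;
    this.mean += delta / this.n;
    this.m2 += delta * (x - this.mean);
  }

  get count(): number { return this.n; }
  get average(): number { return this.mean; }
  get variance(): number { return this.n > 1 ? this.m2 / (this.n - 1) : 0; }
  get stdDev(): number { return Math.sqrt(this.variance); }
}
```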
## Development

Prerequisites:

- Node.js >= 18
- npm >= 9

```bash
# Clone the repository
git clone https://github.com/YOUR_USERNAME/jsonl-explorer-mcp.git
cd jsonl-explorer-mcp

# Install dependencies
npm install
```
### Commands

| Command | Description |
|---------|-------------|
| `npm run build` | Compile TypeScript to JavaScript |
| `npm run dev` | Run with auto-reload (development) |
| `npm run start` | Run compiled server (stdio mode) |
| `npm run start:http` | Run compiled server (HTTP mode) |
| `npm test` | Run test suite in watch mode |
| `npm run test:run` | Run tests once |
| `npm run typecheck` | Type-check without emitting |
### Project Structure

```
src/
├── index.ts                  # Entry point, transport setup
├── server.ts                 # MCP server configuration
├── core/                     # Core processing modules
│   ├── streaming-parser.ts   # Line-by-line JSONL processing
│   ├── schema-inferrer.ts    # Schema detection
│   ├── statistics.ts         # Stats collection
│   ├── query-engine.ts       # Query parsing/execution
│   └── file-tailer.ts        # Cursor-based tailing
├── tools/                    # MCP tool implementations
│   ├── inspect.ts
│   ├── sample.ts
│   ├── schema.ts
│   ├── stats.ts
│   ├── search.ts
│   ├── filter.ts
│   ├── validate.ts
│   ├── tail.ts
│   └── diff.ts
└── utils/                    # Shared utilities
    ├── format.ts
    ├── file-info.ts
    └── types.ts
```

## Performance
Designed for efficiency with large files:
| File Size | Records | Inspect Time | Memory |
|-----------|---------|--------------|--------|
| 10 MB | 50,000 | ~0.5s | ~20 MB |
| 100 MB | 500,000 | ~3s | ~25 MB |
| 1 GB | 5,000,000 | ~25s | ~30 MB |
Memory usage stays constant regardless of file size due to streaming architecture.
## Contributing

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## License

MIT - see LICENSE for details.
## Related

- Model Context Protocol - The protocol this server implements
- MCP Servers - Official MCP server implementations