# jsonl-explorer-mcp

MCP server for analyzing JSONL files with streaming, statistics, search, validation, and live tailing.

```bash
npm install jsonl-explorer-mcp
```

A Model Context Protocol (MCP) server for analyzing JSONL (JSON Lines) files. Designed for local development workflows with files ranging from 1 MB to 1 GB.
Working with large JSONL files in development can be challenging:
- Log files grow too large to open in editors
- Data exports need exploration before processing
- Event streams require real-time monitoring
- Schema drift happens silently across records
JSONL Explorer solves these problems by providing streaming analysis tools that work efficiently with large files while integrating seamlessly with AI assistants via MCP.
## Features

| Feature | Description |
|---------|-------------|
| Streaming Architecture | Process files of any size without loading them into memory |
| Schema Inference | Automatically detect and track schema across records |
| Statistical Analysis | Field-level stats including distributions, percentiles, cardinality |
| Flexible Querying | Simple comparisons, regex, JSONPath, and compound queries |
| JSON Schema Validation | Validate syntax and structure against schemas |
| Live File Tailing | Monitor actively-written files with cursor-based tracking |
| File Comparison | Diff two JSONL files by key field |
## Installation

```bash
npm install -g jsonl-explorer-mcp
```

Or run directly with npx:

```bash
npx jsonl-explorer-mcp
```
#### Claude Desktop
Add to `~/.config/claude/claude_desktop_config.json` (Linux/macOS) or `%APPDATA%\Claude\claude_desktop_config.json` (Windows):

```json
{
  "mcpServers": {
    "jsonl-explorer": {
      "command": "npx",
      "args": ["jsonl-explorer-mcp"]
    }
  }
}
```
#### Claude Code
Add to your project's `.mcp.json`:

```json
{
  "mcpServers": {
    "jsonl-explorer": {
      "command": "npx",
      "args": ["jsonl-explorer-mcp"]
    }
  }
}
```
## Transport Modes

**Stdio Mode (default)** - For MCP clients that communicate via stdin/stdout:

```bash
jsonl-explorer-mcp
```

**HTTP Mode** - For web-based integrations:

```bash
jsonl-explorer-mcp --http --port=3000
```
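Beyond desktop clients, any MCP client can drive the server programmatically. Below is a minimal sketch using the official TypeScript SDK (`@modelcontextprotocol/sdk`); the tool name `inspect_jsonl` is a hypothetical placeholder, so list the tools first to discover the real names:

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Spawn the server as a child process and communicate over stdio.
const transport = new StdioClientTransport({
  command: "npx",
  args: ["jsonl-explorer-mcp"],
});
const client = new Client({ name: "example-client", version: "1.0.0" });
await client.connect(transport);

// Discover the tool names the server actually exposes.
const { tools } = await client.listTools();
console.log(tools.map((t) => t.name));

// "inspect_jsonl" is a placeholder; substitute a name from the list above.
const result = await client.callTool({
  name: "inspect_jsonl",
  arguments: { file: "/data/events.jsonl", sampleSize: 100 },
});
console.log(result.content);

await client.close();
```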
## Tools

Get a comprehensive overview of a JSONL file, including size, record count, inferred schema, and field statistics.
Parameters:
| Name | Type | Default | Description |
|------|------|---------|-------------|
| `file` | string | required | Absolute path to the JSONL file |
| `sampleSize` | number | 100 | Records to sample for schema inference |
Example Response:
```json
{
  "file": "/data/events.jsonl",
  "size": "156.2 MB",
  "lineCount": 1248392,
  "validRecords": 1248392,
  "malformedLines": 0,
  "schema": {
    "type": "object",
    "fields": [
      { "name": "id", "types": ["string"], "nullable": false },
      { "name": "timestamp", "types": ["string"], "nullable": false },
      { "name": "event_type", "types": ["string"], "nullable": false },
      { "name": "payload", "types": ["object"], "nullable": true }
    ]
  }
}
```
---
Retrieve sample records using various sampling strategies.
Parameters:
| Name | Type | Default | Description |
|------|------|---------|-------------|
| `file` | string | required | Absolute path to the JSONL file |
| `count` | number | 10 | Number of records to sample |
| `mode` | string | "first" | Sampling mode: `first`, `last`, `random`, `range` |
| `rangeStart` | number | - | Start line for range mode (1-indexed) |
| `rangeEnd` | number | - | End line for range mode (1-indexed) |
Sampling Modes:
- `first` - First N records (fast, streaming)
- `last` - Last N records (requires a full file scan)
- `random` - Random sample using reservoir sampling (see the sketch below)
- `range` - Specific line range
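The `random` mode's reservoir sampling keeps memory bounded even for huge files. Here is a minimal sketch of the classic Algorithm R over a line stream, an illustration of the technique rather than this package's exact code:

```typescript
import { createReadStream } from "node:fs";
import { createInterface } from "node:readline";

// Algorithm R: after n lines, every line has probability k/n of being
// in the reservoir, using O(k) memory regardless of file size.
async function reservoirSample(file: string, k: number): Promise<string[]> {
  const reservoir: string[] = [];
  let seen = 0;
  const lines = createInterface({
    input: createReadStream(file),
    crlfDelay: Infinity,
  });
  for await (const line of lines) {
    if (line.trim() === "") continue; // skip blank lines
    seen++;
    if (reservoir.length < k) {
      reservoir.push(line);
    } else {
      const j = Math.floor(Math.random() * seen); // uniform in [0, seen)
      if (j < k) reservoir[j] = line; // replace with probability k/seen
    }
  }
  return reservoir;
}
```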
---
Infer the schema of records by sampling.
Parameters:
| Name | Type | Default | Description |
|------|------|---------|-------------|
| `file` | string | required | Absolute path to the JSONL file |
| `sampleSize` | number | 1000 | Records to sample |
| `outputFormat` | string | "inferred" | Format: `inferred`, `json-schema`, `formatted` |
Output Formats:
- `inferred` - Internal schema representation with type frequencies
- `json-schema` - Standard JSON Schema (draft-07)
- `formatted` - Human-readable summary
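Conceptually, inference is one pass over sampled records, tallying the JSON types seen at each field. A simplified sketch for top-level fields (the real inferrer also tracks nested objects and type frequencies):

```typescript
type FieldInfo = { types: Set<string>; nullable: boolean; seen: number };

// Tally observed JSON types per top-level field across sampled records.
function inferFields(records: Record<string, unknown>[]): Map<string, FieldInfo> {
  const fields = new Map<string, FieldInfo>();
  for (const record of records) {
    for (const [name, value] of Object.entries(record)) {
      const info =
        fields.get(name) ?? { types: new Set<string>(), nullable: false, seen: 0 };
      if (value === null) info.nullable = true;
      else info.types.add(Array.isArray(value) ? "array" : typeof value);
      info.seen++;
      fields.set(name, info);
    }
  }
  // A field absent from some records is treated as nullable/optional.
  for (const info of fields.values()) {
    if (info.seen < records.length) info.nullable = true;
  }
  return fields;
}
```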
---
Collect aggregate statistics for fields.
Parameters:
| Name | Type | Default | Description |
|------|------|---------|-------------|
| `file` | string | required | Absolute path to the JSONL file |
| `fields` | string[] | all | Specific fields to analyze |
| `maxRecords` | number | all | Maximum records to analyze |
Statistics Provided:
- Numeric fields: min, max, mean, median, stdDev, percentiles (p50, p90, p95, p99)
- String fields: minLength, maxLength, avgLength, cardinality, value distribution
- Boolean fields: true/false counts and percentages
- All fields: null count, unique count
---
Search for records where a field matches a regex pattern.
Parameters:
| Name | Type | Default | Description |
|------|------|---------|-------------|
| `file` | string | required | Absolute path to the JSONL file |
| `field` | string | required | Field path (supports dot notation) |
| `pattern` | string | required | Regex pattern to match |
| `caseSensitive` | boolean | false | Case-sensitive matching |
| `maxResults` | number | 100 | Maximum results to return |
| `returnFields` | string[] | all | Fields to include in results |
Example:
```json
{
  "file": "/data/logs.jsonl",
  "field": "message",
  "pattern": "error|failed|exception",
  "caseSensitive": false,
  "maxResults": 50
}
```
---
Filter records using powerful query expressions.
Parameters:
| Name | Type | Default | Description |
|------|------|---------|-------------|
| `file` | string | required | Absolute path to the JSONL file |
| `query` | string | required | Query expression |
| `outputFormat` | string | "records" | Output: `records`, `count`, `lines` |
| `limit` | number | 1000 | Maximum results |
Query Syntax:
| Type | Example | Description |
|------|---------|-------------|
| Equality | `status == "active"` | Exact match |
| Comparison | `age > 30` | Numeric comparison (>, >=, <, <=, !=) |
| Regex | `email =~ "@gmail\\.com$"` | Pattern matching |
| Null check | `deleted_at == null` | Check for null values |
| JSONPath | `$[?(@.price < 100)]` | Full JSONPath expressions |
| Compound | `status == "active" AND age > 30` | Combine with AND/OR |
Examples:
```javascript
// Find active premium users
"subscription == \"premium\" AND active == true"

// Find orders over $100
"total > 100"

// Find emails from a specific domain
"email =~ \"@company\\.com$\""

// Complex JSONPath
"$[?(@.items[*].quantity > 10)]"
```
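For intuition, the simple (non-JSONPath) syntax boils down to `field op value` clauses joined by AND. Here is a toy evaluator in that spirit, not the server's actual query engine:

```typescript
// Evaluate queries like: status == "active" AND age > 30
function matches(record: Record<string, unknown>, query: string): boolean {
  return query.split(/\s+AND\s+/).every((clause) => {
    const m = clause.match(/^([\w.]+)\s*(==|!=|>=|<=|>|<|=~)\s*(.+)$/);
    if (!m) throw new Error(`Unparseable clause: ${clause}`);
    const [, path, op, raw] = m;
    // Dot notation: "payload.user.id" walks nested objects.
    const actual = path
      .split(".")
      .reduce<unknown>((obj, key) => (obj as Record<string, unknown> | undefined)?.[key], record);
    // JSON.parse handles quoted strings, numbers, and booleans alike.
    const expected = raw === "null" ? null : JSON.parse(raw);
    switch (op) {
      case "==": return actual === expected;
      case "!=": return actual !== expected;
      case ">":  return (actual as number) > (expected as number);
      case ">=": return (actual as number) >= (expected as number);
      case "<":  return (actual as number) < (expected as number);
      case "<=": return (actual as number) <= (expected as number);
      case "=~": return new RegExp(expected as string).test(String(actual));
      default:   return false;
    }
  });
}
```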
---
Validate file syntax and optionally against a JSON Schema.
Parameters:
| Name | Type | Default | Description |
|------|------|---------|-------------|
| `file` | string | required | Absolute path to the JSONL file |
| `schema` | object/string | - | JSON Schema (inline or file path) |
| `stopOnFirstError` | boolean | false | Stop on first error |
| `maxErrors` | number | 100 | Maximum errors to report |
Response:
```json
{
  "valid": false,
  "totalRecords": 10000,
  "validRecords": 9987,
  "invalidRecords": 13,
  "errors": [
    {
      "line": 1523,
      "error": "must have required property 'user_id'",
      "path": "/user_id"
    }
  ]
}
```
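The docs don't say which validator the server uses internally, but the error shape matches JSON Schema validators such as Ajv. A hedged sketch of streaming per-line validation using Ajv:

```typescript
import Ajv from "ajv";
import { createReadStream } from "node:fs";
import { createInterface } from "node:readline";

// Compile the schema once, then validate each line as it streams by.
async function validateJsonl(file: string, schema: object, maxErrors = 100) {
  const validate = new Ajv().compile(schema);
  const errors: { line: number; error: string; path: string }[] = [];
  let lineNo = 0;
  const lines = createInterface({ input: createReadStream(file), crlfDelay: Infinity });
  for await (const line of lines) {
    lineNo++;
    if (line.trim() === "") continue;
    try {
      if (!validate(JSON.parse(line))) {
        const e = validate.errors![0];
        errors.push({ line: lineNo, error: e.message ?? "invalid", path: e.instancePath });
      }
    } catch {
      errors.push({ line: lineNo, error: "malformed JSON", path: "" }); // syntax error
    }
    if (errors.length >= maxErrors) break; // honor the error cap
  }
  return { valid: errors.length === 0, errors };
}
```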
---
Monitor actively-written files for new records using cursor-based tracking.
Parameters:
| Name | Type | Default | Description |
|------|------|---------|-------------|
| `file` | string | required | Absolute path to the JSONL file |
| `cursor` | number | 0 | Byte position to start from |
| `maxRecords` | number | 100 | Maximum records to return |
| `timeout` | number | 0 | Wait time for new content (ms) |
Usage Pattern:
```javascript
// Initial call - start from beginning
{ "file": "/var/log/app.jsonl", "cursor": 0 }
// Response: { records: [...], newCursor: 15234, hasMore: false }

// Subsequent calls - continue from cursor
{ "file": "/var/log/app.jsonl", "cursor": 15234, "timeout": 5000 }
// Waits up to 5s for new content
```
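Under the hood, a cursor is just a byte offset: each call reads whatever complete lines were appended after it and returns the new offset. A simplified sketch of that mechanic (the real tool also implements the timeout wait):

```typescript
import { open } from "node:fs/promises";

// Read complete JSONL records appended after `cursor` (a byte offset),
// returning parsed records plus the cursor to pass on the next call.
async function readFromCursor(file: string, cursor: number) {
  const handle = await open(file, "r");
  try {
    const { size } = await handle.stat();
    if (size <= cursor) return { records: [], newCursor: cursor };

    const buf = Buffer.alloc(size - cursor);
    await handle.read(buf, 0, buf.length, cursor);
    const text = buf.toString("utf8");

    // Only consume up to the last newline; a partially written
    // trailing line is left for the next call.
    const end = text.lastIndexOf("\n");
    if (end === -1) return { records: [], newCursor: cursor };

    const consumed = text.slice(0, end + 1);
    const records = consumed
      .split("\n")
      .filter((l) => l.trim() !== "")
      .map((l) => JSON.parse(l));
    return { records, newCursor: cursor + Buffer.byteLength(consumed, "utf8") };
  } finally {
    await handle.close();
  }
}
```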
---
Compare two JSONL files and report differences.
Parameters:
| Name | Type | Default | Description |
|------|------|---------|-------------|
| `file1` | string | required | Path to first file |
| `file2` | string | required | Path to second file |
| `keyField` | string | - | Field to use as unique key for matching |
| `compareFields` | string[] | all | Specific fields to compare |
| `maxDiffs` | number | 100 | Maximum differences to report |
Diff Types:
- `added` - Record exists only in file2
- `removed` - Record exists only in file1
- `modified` - Record exists in both but differs
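A minimal sketch of the key-based comparison: index file1's records by the key field, then walk file2's records against that index. This is an illustration only, and it uses order-sensitive JSON string equality for brevity:

```typescript
type Rec = Record<string, unknown>;

// Classify records as added / removed / modified using a unique key field.
function diffByKey(left: Rec[], right: Rec[], keyField: string) {
  const leftByKey = new Map(
    left.map((r): [string, Rec] => [String(r[keyField]), r]),
  );
  const diffs: { type: "added" | "removed" | "modified"; key: string }[] = [];
  for (const r of right) {
    const key = String(r[keyField]);
    const match = leftByKey.get(key);
    if (match === undefined) {
      diffs.push({ type: "added", key }); // only in file2
    } else if (JSON.stringify(match) !== JSON.stringify(r)) {
      diffs.push({ type: "modified", key }); // in both, but differs
    }
    leftByKey.delete(key);
  }
  // Whatever remains unmatched existed only in file1.
  for (const key of leftByKey.keys()) diffs.push({ type: "removed", key });
  return diffs;
}
```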
## Example Prompts

```
"Inspect the application log file at /var/log/app.jsonl and show me
the schema and any error messages from the last hour"
```

```
"Validate /data/export.jsonl against this schema and show me
statistics on the user_id field to check for duplicates"
```

```
"Tail the events file and alert me when you see any records
with event_type containing 'error'"
```

```
"Diff these two data exports using 'id' as the key field
and show me what changed"
```
## Architecture

See ARCHITECTURE.md for detailed technical documentation, including:

- Streaming parser design
- Schema inference algorithm
- Statistics collection with Welford's algorithm (sketched below)
- Query engine implementation
- Memory efficiency strategies
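As a taste of the streaming approach, Welford's algorithm updates mean and variance one record at a time in O(1) memory, which is how statistics like stdDev can be collected without buffering the file:

```typescript
// Welford's online algorithm: numerically stable running mean/variance.
class RunningStats {
  private n = 0;
  private mean = 0;
  private m2 = 0; // sum of squared deviations from the running mean

  push(x: number): void {
    this.n++;
    const delta = x - this.mean;
    this.mean += delta / this.n;
    this.m2 += delta * (x - this.mean);
  }

  get count(): number { return this.n; }
  get average(): number { return this.mean; }
  get variance(): number { return this.n > 1 ? this.m2 / (this.n - 1) : 0; }
  get stdDev(): number { return Math.sqrt(this.variance); }
}
```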
## Development

Prerequisites:

- Node.js >= 18
- npm >= 9

```bash
# Clone the repository
git clone https://github.com/YOUR_USERNAME/jsonl-explorer-mcp.git
cd jsonl-explorer-mcp

# Install dependencies
npm install
```
### Commands

| Command | Description |
|---------|-------------|
| `npm run build` | Compile TypeScript to JavaScript |
| `npm run dev` | Run with auto-reload (development) |
| `npm run start` | Run compiled server (stdio mode) |
| `npm run start:http` | Run compiled server (HTTP mode) |
| `npm test` | Run test suite in watch mode |
| `npm run test:run` | Run tests once |
| `npm run typecheck` | Type-check without emitting |
### Project Structure

```
src/
├── index.ts                  # Entry point, transport setup
├── server.ts                 # MCP server configuration
├── core/                     # Core processing modules
│   ├── streaming-parser.ts   # Line-by-line JSONL processing
│   ├── schema-inferrer.ts    # Schema detection
│   ├── statistics.ts         # Stats collection
│   ├── query-engine.ts       # Query parsing/execution
│   └── file-tailer.ts        # Cursor-based tailing
├── tools/                    # MCP tool implementations
│   ├── inspect.ts
│   ├── sample.ts
│   ├── schema.ts
│   ├── stats.ts
│   ├── search.ts
│   ├── filter.ts
│   ├── validate.ts
│   ├── tail.ts
│   └── diff.ts
└── utils/                    # Shared utilities
    ├── format.ts
    ├── file-info.ts
    └── types.ts
```

## Performance
Designed for efficiency with large files:
| File Size | Records | Inspect Time | Memory |
|-----------|---------|--------------|--------|
| 10 MB | 50,000 | ~0.5s | ~20 MB |
| 100 MB | 500,000 | ~3s | ~25 MB |
| 1 GB | 5,000,000 | ~25s | ~30 MB |
Memory usage stays constant regardless of file size due to streaming architecture.
## Contributing

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## License

MIT - see LICENSE for details.
## Related

- Model Context Protocol - The protocol this server implements
- MCP Servers - Official MCP server implementations