MCP server for observability tooling: query traces, metrics, and logs from local JSONL files for Claude Code sessions. Optionally integrates with SigNoz Cloud for enhanced observability.

```bash
npm install observability-toolkit
```

Register the server with Claude Code:

```bash
claude mcp add observability-toolkit -- npx -y observability-toolkit
```

Or for local development:

```bash
claude mcp add observability-toolkit -- node ~/.claude/mcp-servers/observability-toolkit/dist/server.js
```
| Tool | Description |
|------|-------------|
| obs_query_traces | Query traces with filtering, regex, numeric operators |
| obs_query_metrics | Query metrics with aggregations (sum, avg, p50, p95, p99, rate) |
| obs_query_logs | Query logs with boolean search, field extraction |
| obs_query_llm_events | Query LLM events with token usage and duration metrics |
| obs_query_evaluations | Query evaluation events with aggregations and groupBy |
| obs_query_verifications | Query human verification events for EU AI Act compliance |
| obs_health_check | Check telemetry system health with cache statistics |
| obs_context_stats | Get context window utilization stats |
| obs_get_trace_url | Get SigNoz trace viewer URL (requires SigNoz) |
| obs_setup_claudeignore | Add entries to .claudeignore |
| obs_export_langfuse | Export evaluations to Langfuse via OTLP HTTP |
| Variable | Description | Default |
|----------|-------------|---------|
| TELEMETRY_DIR | Local telemetry directory | ~/.claude/telemetry |
| SIGNOZ_URL | SigNoz instance URL | - |
| SIGNOZ_API_KEY | SigNoz API key | - |
| CACHE_TTL_MS | Query cache TTL in milliseconds | 60000 |
| RETENTION_DAYS | Days to retain telemetry files | 7 |
| LANGFUSE_ENDPOINT | Langfuse OTLP endpoint URL | - |
| LANGFUSE_PUBLIC_KEY | Langfuse public key | - |
| LANGFUSE_SECRET_KEY | Langfuse secret key | - |
```javascript
// Basic query
obs_query_traces({ limit: 10 })
// Filter by trace ID
obs_query_traces({ traceId: "abc123..." })
// Filter by service and duration
obs_query_traces({ serviceName: "claude-code", minDurationMs: 100 })
// Regex pattern matching
obs_query_traces({ spanNameRegex: "^http\\..*" })
// Numeric attribute filtering
obs_query_traces({
numericFilter: [
{ attribute: "http.status_code", operator: "gte", value: 400 }
]
})
// Existence checks
obs_query_traces({
attributeExists: ["error.message"],
attributeNotExists: ["http.response.body"]
})
// OTel GenAI agent/tool filters
obs_query_traces({ agentName: "Explore", toolName: "Read" })
obs_query_traces({ operationName: "execute_tool", toolCallId: "toolu_123" })
```
```javascript
// Basic severity filter
obs_query_logs({ severity: "ERROR", limit: 20 })
// Boolean search (AND)
obs_query_logs({
searchTerms: ["timeout", "connection"],
searchOperator: "AND"
})
// Boolean search (OR)
obs_query_logs({
searchTerms: ["error", "warning", "critical"],
searchOperator: "OR"
})
// Field extraction from JSON logs
obs_query_logs({
extractFields: ["user.id", "request.method", "response.status"]
})
// Exclude patterns
obs_query_logs({
search: "error",
excludeSearch: "health-check"
})
```
```javascript
// Basic query
obs_query_metrics({ metricName: "session.context.size" })
// Aggregations
obs_query_metrics({ metricName: "http.duration", aggregation: "avg" })
obs_query_metrics({ metricName: "http.duration", aggregation: "p95" })
obs_query_metrics({ metricName: "requests.count", aggregation: "rate" })
// Time bucket grouping
obs_query_metrics({
metricName: "token.usage",
aggregation: "sum",
timeBucket: "1h",
groupBy: ["model"]
})
// Percentiles
obs_query_metrics({ metricName: "latency", aggregation: "p99" })
```
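The pXX aggregations above can be sketched with a nearest-rank percentile computation. This is an illustrative sketch only; the toolkit's actual interpolation method may differ.

```javascript
// Nearest-rank percentile: sort, then pick the ceil(p% * n)-th value.
// Illustrative only -- not the toolkit's internal implementation.
function percentile(values, p) {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const idx = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, idx)];
}

const latencies = Array.from({ length: 100 }, (_, i) => i + 1); // 1..100
const p95 = percentile(latencies, 95);
```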
```javascript
// Basic query
obs_query_llm_events({ limit: 20 })
// Filter by model and provider
obs_query_llm_events({ model: "claude-3-opus", provider: "anthropic" })
// OTel GenAI operation types
obs_query_llm_events({ operationName: "chat" })
obs_query_llm_events({ operationName: "invoke_agent" })
// Filter by conversation
obs_query_llm_events({ conversationId: "conv-abc123" })
// Combine filters
obs_query_llm_events({
operationName: "chat",
provider: "anthropic",
conversationId: "conv-abc123"
})
```
Query events from any LLM provider using OTel GenAI standard identifiers:
```javascript
// Anthropic Claude
obs_query_llm_events({ provider: "anthropic", model: "claude-3-opus" })
// OpenAI
obs_query_llm_events({ provider: "openai", model: "gpt-4o" })
// Google Gemini
obs_query_llm_events({ provider: "gcp.gemini", model: "gemini-1.5-pro" })
// Mistral AI
obs_query_llm_events({ provider: "mistral_ai", model: "mistral-large" })
// Cohere
obs_query_llm_events({ provider: "cohere", model: "command-r-plus" })
// AWS Bedrock (multi-model)
obs_query_llm_events({ provider: "aws.bedrock" })
// Azure OpenAI
obs_query_llm_events({ provider: "azure.ai.openai" })
// Local models (Ollama)
obs_query_llm_events({ provider: "ollama", model: "llama3:8b" })
// Groq
obs_query_llm_events({ provider: "groq", model: "llama-3.3-70b" })
```
Provider Fallback Chain: The toolkit uses OTel GenAI v1.39-compliant attribute lookup:
1. gen_ai.provider.name (primary)
2. gen_ai.system (legacy OTel)
3. provider (custom/fallback)
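The fallback chain amounts to trying each attribute key in order and returning the first one present. A minimal sketch, with illustrative names rather than the toolkit's actual internals:

```javascript
// Try each provider attribute key in priority order; return the first hit.
// Key order matches the fallback chain above; the function name is illustrative.
function resolveProvider(attributes) {
  const keys = ["gen_ai.provider.name", "gen_ai.system", "provider"];
  for (const key of keys) {
    if (attributes[key] !== undefined) return attributes[key];
  }
  return null;
}

// A legacy event that only sets gen_ai.system still resolves.
const legacyProvider = resolveProvider({ "gen_ai.system": "openai" });
```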
```javascript
obs_health_check({ verbose: true })
// Returns:
{
"status": "ok",
"backends": { ... },
"cache": {
"traces": { "hits": 10, "misses": 5, "hitRate": 0.67, "size": 15, "evictions": 0 },
"logs": { "hits": 8, "misses": 12, "hitRate": 0.4, "size": 20, "evictions": 2 },
"metrics": { "hits": 0, "misses": 0, "hitRate": 0, "size": 0, "evictions": 0 },
"llmEvents": { "hits": 0, "misses": 0, "hitRate": 0, "size": 0, "evictions": 0 }
}
}
```
| Feature | Description |
|---------|-------------|
| Percentile Aggregations | p50, p95, p99 for metrics |
| Time Bucket Grouping | 1m, 5m, 1h, 1d buckets for trend analysis |
| Rate Calculations | Per-second rate of change |
| Numeric Operators | gt, gte, lt, lte, eq for attribute filtering |
| Regex Patterns | Advanced span name filtering |
| Boolean Search | AND/OR operators for log queries |
| Field Extraction | Extract JSON paths from structured logs |
| Negation Filters | Exclude matching spans/logs |
| Existence Checks | Filter by attribute presence |
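Time-bucket grouping from the table above can be sketched as flooring each timestamp to its bucket start, then aggregating per bucket. This is an assumption about the mechanics, not the toolkit's code:

```javascript
// Floor a millisecond timestamp to the start of its bucket.
// Bucket labels mirror the table above (1m, 5m, 1h, 1d).
function bucketStart(timestampMs, bucket) {
  const sizes = { "1m": 60_000, "5m": 300_000, "1h": 3_600_000, "1d": 86_400_000 };
  return Math.floor(timestampMs / sizes[bucket]) * sizes[bucket];
}

// Sum metric points grouped by bucket start time.
function sumByBucket(points, bucket) {
  const out = new Map();
  for (const { timestampMs, value } of points) {
    const start = bucketStart(timestampMs, bucket);
    out.set(start, (out.get(start) ?? 0) + value);
  }
  return out;
}

const byHour = sumByBucket(
  [
    { timestampMs: 0, value: 1 },
    { timestampMs: 30 * 60_000, value: 2 },  // same hour as the first point
    { timestampMs: 90 * 60_000, value: 4 },  // second hour
  ],
  "1h"
);
```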
| Feature | Description |
|---------|-------------|
| severityNumber | Standard OTel severity levels |
| statusCode | UNSET, OK, ERROR for spans |
| Histogram Buckets | Full histogram distribution support |
| InstrumentationScope | Library/module metadata |
| Span Links | Cross-trace relationships |
| Exemplars | Metric-to-trace correlation |
| Aggregation Temporality | DELTA, CUMULATIVE support |
| Feature | Description |
|---------|-------------|
| gen_ai.operation.name | Filter by chat, embeddings, invoke_agent, execute_tool |
| gen_ai.provider.name | Provider fallback: gen_ai.provider.name → gen_ai.system → provider |
| gen_ai.conversation.id | Filter LLM events by conversation ID |
| gen_ai.agent.id/name | Filter traces by agent attributes |
| gen_ai.tool.name/call.id | Filter traces by tool attributes |
| gen_ai.response.model | Actual model that responded |
| gen_ai.response.finish_reasons | Why generation stopped |
| gen_ai.request.temperature | Sampling temperature |
| gen_ai.request.max_tokens | Maximum output tokens |
| Percentiles | p50, p95, p99, rate aggregations |
| Provider ID | Description | Example Models |
|-------------|-------------|----------------|
| anthropic | Anthropic Claude | claude-3-opus, claude-3-sonnet, claude-3-haiku |
| openai | OpenAI GPT | gpt-4o, gpt-4-turbo, gpt-3.5-turbo, o1, o1-mini |
| gcp.gemini | Google AI Studio | gemini-1.5-pro, gemini-1.5-flash, gemini-2.0-flash |
| gcp.vertex_ai | Google Vertex AI | gemini-pro, claude-3-opus (via Vertex) |
| aws.bedrock | AWS Bedrock | claude-3-sonnet, titan-text, llama-3 |
| azure.ai.openai | Azure OpenAI | gpt-4-deployment, gpt-35-turbo |
| mistral_ai | Mistral AI | mistral-large, mistral-small, codestral |
| cohere | Cohere | command-r-plus, command-r, embed-english |
| groq | Groq | llama-3.3-70b, mixtral-8x7b |
| ollama | Ollama (local) | llama3, mistral, codellama |
| together_ai | Together AI | llama-3-70b, mixtral-8x7b |
| fireworks_ai | Fireworks AI | llama-v3-70b, mixtral-8x7b |
| huggingface | HuggingFace | Various open models |
| replicate | Replicate | Various hosted models |
| perplexity | Perplexity | sonar-pro, sonar |
Note: Custom provider identifiers are also supported for internal or unlisted LLM services.
| Feature | Description |
|---------|-------------|
| Query Caching | LRU cache with configurable TTL |
| File Indexing | .idx sidecars for fast lookups |
| Gzip Support | Transparent decompression of .jsonl.gz files |
| BatchWriter | Buffered writes to reduce I/O |
| Streaming | Early termination for large files |
| Parallel Queries | Concurrent multi-directory queries |
| Cursor Pagination | Efficient large result set handling |
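Cursor pagination from the table above can be sketched over a time-sorted result set, using the last-seen timestamp as the cursor. The cursor shape is an assumption for illustration; the toolkit's real cursor format may differ.

```javascript
// Return one page of records after `cursor` (a timestamp, or null for the
// first page), plus the cursor for the next page. Illustrative sketch only.
function pageAfter(records, cursor, pageSize) {
  const start = cursor === null ? 0 : records.findIndex((r) => r.ts > cursor);
  const page = start === -1 ? [] : records.slice(start, start + pageSize);
  const nextCursor = page.length === pageSize ? page[page.length - 1].ts : null;
  return { page, nextCursor };
}

const records = [1, 2, 3, 4, 5].map((ts) => ({ ts }));
const first = pageAfter(records, null, 2);           // ts 1, 2
const second = pageAfter(records, first.nextCursor, 2); // ts 3, 4
const last = pageAfter(records, second.nextCursor, 2);  // ts 5, no next page
```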
| Feature | Description |
|---------|-------------|
| Cache Metrics | Hit/miss/eviction tracking |
| Query Timing | Slow query warnings (>500ms) |
| Circuit Breaker Logging | State transition visibility |
| Health Check Stats | Cache statistics in health output |
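The slow-query warning above (>500ms) amounts to timing each query and logging when the threshold is exceeded. A minimal sketch; the wrapper name and shape are assumptions:

```javascript
// Wrap a query function so that calls slower than thresholdMs emit a warning.
// The 500 ms default matches the table above; everything else is illustrative.
function timed(fn, label, warn = console.warn, thresholdMs = 500) {
  return (...args) => {
    const start = Date.now();
    const result = fn(...args);
    const elapsed = Date.now() - start;
    if (elapsed > thresholdMs) warn(`slow query ${label}: ${elapsed}ms`);
    return result;
  };
}

const double = timed((x) => x * 2, "demo");
```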
| Feature | Description |
|---------|-------------|
| Query Escaping | ClickHouse-specific escaping, 22-pattern blocklist |
| Memory Limits | MAX_RESULTS_IN_MEMORY=10000, streaming aggregation |
| Input Validation | limit≤1000, date range≤365 days, regex limits |
| Type Safety | NaN/Infinity rejection, explicit type assertions |
See docs/security.md for details.
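The input-validation row above (limit ≤ 1000, NaN/Infinity rejection) can be sketched as a guard that rejects non-finite or out-of-range values before a query runs. Function name and error messages are illustrative:

```javascript
// Reject non-finite or out-of-range limits before running a query.
// Bounds mirror the table above; the rest is an illustrative sketch.
function validateLimit(limit) {
  if (typeof limit !== "number" || !Number.isFinite(limit)) {
    throw new Error("limit must be a finite number");
  }
  if (limit < 1 || limit > 1000) {
    throw new Error("limit must be between 1 and 1000");
  }
  return Math.floor(limit);
}
```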
Scans multiple telemetry directories:
- Global: ~/.claude/telemetry/ (always checked)
- Project-local: .claude/telemetry/, telemetry/, .telemetry/

File patterns (supports gzip compression):
- traces-YYYY-MM-DD.jsonl / .jsonl.gz
- logs-YYYY-MM-DD.jsonl / .jsonl.gz
- metrics-YYYY-MM-DD.jsonl / .jsonl.gz
- llm-events-YYYY-MM-DD.jsonl / .jsonl.gz
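Matching those dated filenames can be sketched with a single regular expression over the four signal kinds. This matcher is illustrative; the toolkit's real file discovery may differ.

```javascript
// Match telemetry filenames like "traces-2026-01-28.jsonl(.gz)".
// Pattern mirrors the list above; helper names are illustrative.
const FILE_RE = /^(traces|logs|metrics|llm-events)-(\d{4}-\d{2}-\d{2})\.jsonl(\.gz)?$/;

function parseTelemetryFilename(name) {
  const m = FILE_RE.exec(name);
  if (!m) return null;
  return { kind: m[1], date: m[2], gzipped: Boolean(m[3]) };
}
```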
When configured, queries SigNoz Cloud API with:
- Circuit breaker protection
- Cursor-based pagination
- Response time tracking
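The circuit-breaker protection above follows the usual pattern: after repeated failures the circuit opens and calls are rejected until a reset window elapses. A minimal synchronous sketch; thresholds, class name, and states are assumptions, not the toolkit's API:

```javascript
// Minimal circuit breaker: open after `failureThreshold` consecutive
// failures, allow retry (half-open) after `resetAfterMs`. Illustrative only.
class CircuitBreaker {
  constructor(failureThreshold = 5, resetAfterMs = 60_000) {
    this.failureThreshold = failureThreshold;
    this.resetAfterMs = resetAfterMs;
    this.failures = 0;
    this.openedAt = null;
  }
  get state() {
    if (this.openedAt === null) return "closed";
    return Date.now() - this.openedAt >= this.resetAfterMs ? "half-open" : "open";
  }
  call(fn) {
    if (this.state === "open") throw new Error("circuit open");
    try {
      const result = fn();
      this.failures = 0;
      this.openedAt = null;
      return result;
    } catch (err) {
      if (++this.failures >= this.failureThreshold) this.openedAt = Date.now();
      throw err;
    }
  }
}
```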
Export data in OpenTelemetry format:
```javascript
// Export traces
const otlpTraces = await backend.exportTracesOTLP({ startDate: "2026-01-28" });
// Export logs
const otlpLogs = await backend.exportLogsOTLP({ severity: "ERROR" });
// Export metrics
const otlpMetrics = await backend.exportMetricsOTLP({ metricName: "http.duration" });
```
Export evaluations to Langfuse for unified tracing and evaluation analysis:
```javascript
// Export all evaluations from last 7 days
obs_export_langfuse({})
// Export with filters
obs_export_langfuse({
evaluationName: "quality",
scoreMin: 0.8,
limit: 500,
batchSize: 100
})
// Dry run to preview export
obs_export_langfuse({
startDate: "2026-01-28",
dryRun: true
})
// Override credentials (for testing)
obs_export_langfuse({
endpoint: "https://cloud.langfuse.com",
publicKey: "pk-lf-...",
secretKey: "sk-lf-..."
})
```
Features:
- Batched OTLP HTTP export with retry logic
- Memory protection (400MB warn, 600MB abort)
- Progress logging for large exports
- Credential sanitization in error messages
- DNS rebinding protection
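The batched export above splits evaluation records into fixed-size chunks (the `batchSize` parameter) before sending each over OTLP HTTP. A sketch of the chunking step only; the helper name is illustrative:

```javascript
// Split a list of records into batches of at most `batchSize` items.
// Mirrors the batchSize parameter above; the function itself is illustrative.
function toBatches(items, batchSize) {
  const batches = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

const batches = toBatches([1, 2, 3, 4, 5], 2); // two full batches plus a remainder
```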
Single-pass LLM evaluation for output quality:
```typescript
import { gEval, qagEvaluate, JudgeCircuitBreaker } from './lib/llm-as-judge.js';
// G-Eval pattern with chain-of-thought
const result = await gEval(testCase, criteria, llmFn);
// QAG faithfulness evaluation
const faithfulness = await qagEvaluate(testCase, llmFn);
// Production circuit breaker
const breaker = new JudgeCircuitBreaker(5, 60000);
const result = await breaker.evaluate(() => gEval(...));
```
Multi-step agent evaluation with trajectory analysis:
```typescript
import {
verifyToolCalls,
aggregateStepScores,
analyzeTrajectory,
collectiveConsensus,
ProceduralJudge,
ReactiveJudge,
} from './lib/agent-as-judge.js';
// Verify tool call correctness
const verifications = verifyToolCalls(actions, expectedTools);
// Analyze agent trajectory efficiency
const metrics = analyzeTrajectory({ actions, expectedSteps: 5 });
// Multi-agent consensus evaluation
const consensus = await collectiveConsensus(judges, { id: 'eval-1' }, {
rounds: 3,
convergenceThreshold: 0.05,
});
// Procedural multi-stage evaluation
const proceduralJudge = new ProceduralJudge([
{ name: 'syntax', evaluate: syntaxChecker },
{ name: 'semantic', evaluate: semanticAnalyzer },
]);
const result = await proceduralJudge.evaluate(evaluand);
// Reactive specialist-based evaluation
const reactiveJudge = new ReactiveJudge(router, specialists, deepDiveSpecialists);
const result = await reactiveJudge.evaluate(evaluand);
```
```javascript
// Filter by agent ID/name
obs_query_evaluations({
agentId: 'agent-123',
agentName: 'TaskRunner',
evaluationName: 'tool_correctness',
})
// Response includes agent-specific fields
{
stepScores: [{ step: 0, score: 0.9, explanation: '...' }],
toolVerifications: [{ toolName: 'search', toolCorrect: true, score: 1.0 }],
trajectoryLength: 5,
}
```
```bash
cd ~/.claude/mcp-servers/observability-toolkit
npm install
npm run build
npm test # 3254 tests
npm run start
```
- docs/changelog/ - Version history and changelogs
- docs/reliability/security.md - Security controls and hardening
- docs/quality/llm-as-judge.md - LLM-as-Judge architecture
- docs/quality/agent-as-judge.md - Agent-as-Judge architecture
- docs/backlog/ - Feature backlog and roadmap
- docs/changelog/SESSION_HISTORY.md - Development session logs
- docs/Summary.md - Full documentation index