Root Cause Analysis MCP server scaffolding.
npm install mcp-rcaRoot Cause Analysis MCP server that helps SRE teams structure observations, hypotheses, and test plans while collaborating with an LLM.
- Prompt-based guidance: MCP prompts guide the LLM through each RCA phase
- rca_start_investigation - Begin investigation with structured initial steps
- rca_next_step - Get context-aware recommendations based on case state
- rca_hypothesis_propose - Generate testable root cause hypotheses
- rca_verification_planning - Create effective test plans
- rca_conclusion_guide - Document conclusions with root causes and follow-ups
- LLM-oriented tools: Guidance tools provide best practices and phase-specific checklists
- guidance_best_practices - RCA principles and anti-patterns
- guidance_phase - Phase-specific steps and red flags
- guidance_prompt_scaffold - Structured output formats for tasks
- guidance_followups - Post-conclusion follow-up actions
- guidance_prompts_catalog - Discover available prompts with default templates
- guidance_tools_catalog - Comprehensive tool catalog with workflow guidance
- Hypothesis generation returns persisted objects with IDs
- hypothesis_propose persists generated hypotheses and returns each item with id, caseId, createdAt, and updatedAt.
- When the generator supplies a verification plan in its output, an initial test_plan_create is called automatically and minimal info is attached to the hypothesis (method/expected/metric?).
- Git/deploy metadata on Case / Observation / TestPlan
- Optional fields: gitBranch, gitCommit, deployEnv.
- Set on create and update tools; passing null on update clears the field.
``bash`
npm install mcp-rca
To launch the server directly as a CLI:
`bash`
npx mcp-rca
The server communicates over stdio and can be attached to any MCP-compatible client. CLI flags include --help (-h) for usage and --version (-v) to print the current release.
1. Clone the repository and install dependencies:
`bash`
git clone https://github.com/mako10k/mcp-rca.git
cd mcp-rca
npm install
`
2. Launch the developer server with hot reloading:
bash`
npm run dev
dist/
3. Produce a production bundle (emits and copies prompt assets):`
bash`
npm run build
``
src/
framework/ # Local stub for MCP server lifecycle
server.ts # MCP server entrypoint
schema/ # TypeScript data models
tools/ # Tool handlers surfaced to MCP clients
llm/ # Prompt assets and LLM utilities
data/
.gitkeep # Runtime storage directory (cases.json generated at runtime)
scripts/
copy-assets.mjs # Copies static prompt assets into dist/ post-build
Refer to AGENT.md for the full specification, roadmap, and design guidelines.
MCP prompts guide your investigation through each phase:
Use prompt: rca_start_investigation
→ Creates a structured plan for case creation and initial observations
`$3
`
Use prompt: rca_next_step with caseId
→ Analyzes current state and suggests next actions
`$3
`
Use prompt: rca_hypothesis_propose with caseId
→ Guides hypothesis generation with best practices
→ Then call tool: hypothesis_propose to create and persist hypotheses
`$3
`
Use prompt: rca_verification_planning with caseId, hypothesisId, hypothesisText
→ Provides test plan templates and prioritization guidance
→ Then call tool: test_plan_create to create verification plans
`$3
`
Use prompt: rca_conclusion_guide with caseId
→ Guides documentation of root causes, fixes, and follow-ups
→ Then call tool: conclusion_finalize to close the case
`$3
Call guidance tools at any time for additional support:
-
guidance_best_practices - Core RCA principles
- guidance_phase - Phase-specific checklists (observation/hypothesis/testing/conclusion)
- guidance_prompt_scaffold - Output format templates for specific tasks
- guidance_followups - Prevention and follow-up suggestionsMCP Tool Highlights
$3
Input (summary):
`json
{
"caseId": "case_...",
"text": "Short incident summary",
"rationale": "Optional background",
"context": { "service": "api", "region": "us-east-1" },
"logs": "... optional log snippets ..."
}
`Output (each hypothesis is persisted and includes identifiers; an initial test plan may be present if provided by the generator):
`json
{
"hypotheses": [
{
"id": "hyp_...",
"caseId": "case_...",
"text": "Cache node eviction storm caused by oversized payloads",
"rationale": "Spike correlates with payload growth and cache TTL",
"createdAt": "2025-10-21T00:00:00.000Z",
"updatedAt": "2025-10-21T00:00:00.000Z",
"testPlan": {
"id": "tp_...",
"hypothesisId": "hyp_...",
"method": "Reproduce with oversized payloads and inspect eviction rate",
"expected": "Evictions rise sharply with payload size > X",
"metric": "cache.evictions"
}
}
]
}
`$3
The following tools accept optional metadata fields; on update,
null clears the field.- Case
-
case_create: gitBranch, gitCommit, deployEnv
- case_update: gitBranch?, gitCommit?, deployEnv? (nullable clears)
- Observation
- observation_add: gitBranch?, gitCommit?, deployEnv?
- observation_update: gitBranch?, gitCommit?, deployEnv? (nullable clears)
- Test Plan
- test_plan_create: gitBranch?, gitCommit?, deployEnv?
- test_plan_update: gitBranch?, gitCommit?, deployEnv? (nullable clears)Example update payload that clears
gitCommit on an observation:`json
{
"caseId": "case_...",
"observationId": "obs_...",
"gitCommit": null
}
`Responses include the persisted metadata when set; fields are omitted when unset.
$3
Use
observations_list to query observations without pulling the full case payload:`json
{
"caseId": "case_...",
"query": "DriveNotFoundException",
"fields": ["what", "context"],
"pageSize": 10,
"gitBranch": "release",
"order": "desc"
}
`The response returns
observations, nextCursor, total, pageSize, and hasMore. Pass cursor with the next call to page through the set. Combine with the case_get summary mode (include: []) to minimize token usage. See docs/CASE_GET_PAGINATION.md for cursor details.API Response Structure
All mutation tools follow a consistent response structure for predictability and ease of use:
$3
`typescript
{
caseId: string; // Always at top level
[resourceName]: Resource; // The created/updated/removed resource
case: Case; // Full case object after the mutation
}
`Benefits:
- ✅ Consistent: Same pattern across all mutation tools
- ✅ Context Access:
caseId always at top level
- ✅ Immediate State: Full case object available without additional queries
- ✅ Token Optimization: Combine with case_get's include parameter for efficient workflowsExamples:
-
observation_add → { caseId, observation, case }
- hypothesis_propose → { caseId, hypotheses, case }
- test_plan_create → { caseId, testPlan, case }
- conclusion_finalize → { caseId, conclusion, case }docs/RESPONSE_STRUCTURE_STANDARDIZATION.md for complete details.Performance & Best Practices
$3
Many mutation tools (e.g.,
observation_add, hypothesis_update) return the complete case object in their responses, which can consume thousands of tokens per operation.Recommended pattern:
`javascript
// Perform mutations without relying on the case field
await observation_add({ caseId, what: "..." });
await observation_add({ caseId, what: "..." });// Fetch case data selectively when needed
const caseData = await case_get({
caseId,
include: ['observations'], // Only fetch what you need
});
`See docs/API_RESPONSE_OPTIMIZATION.md for detailed optimization strategies and token savings examples.
For paging details (limits, cursors, and
include semantics) see docs/CASE_GET_PAGINATION.md.License
This project is released under the MIT License. See the
LICENSE file for details.Publishing
The package is configured for the public npm registry. After bumping the version, run:
`bash
npm publish --access public
`prepublishOnly` rebuilds TypeScript sources and copies required assets before the tarball is generated.