Delphi MCP Server - Multi-model AI consensus for complex questions. Query Claude, GPT-5, Gemini, DeepSeek simultaneously and synthesize diverse perspectives.
`npm install delphi-mcp`
Complex questions deserve multiple perspectives.
Delphi queries diverse AI models, synthesizes their views, and surfaces genuine consensus.
Requirements • Quick Start • How It Works • Example • Full Docs
---
Complex questions rarely have simple answers. A single AI model gives you one perspective shaped by its training data and architecture. For nuanced topics—technical trade-offs, research questions, multi-faceted decisions—one viewpoint isn't enough.
The insight: Different AI models reason differently. When multiple models independently arrive at the same conclusion, you can trust it. When they disagree, you've found genuine complexity worth exploring.
---
- Node.js 18+
- OpenRouter API key — Get one at openrouter.ai/keys ($5-10 credit is plenty to start)
- Claude Desktop or any MCP-compatible client
---
```bash
npm install -g delphi-mcp
```
Add to Claude Desktop config:
macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
Windows: `%APPDATA%\Claude\claude_desktop_config.json`
```json
{
  "mcpServers": {
    "delphi": {
      "command": "delphi-mcp",
      "env": { "OPENROUTER_API_KEY": "sk-or-v1-your-key" }
    }
  }
}
```
Restart Claude Desktop. Done.
---
1. Independent responses — Each model answers without seeing others
2. Revision rounds — Models see the synthesis and can revise or challenge
3. Convergence detection — Stops when 85% agreement is reached
4. Hallucination flagging — Claims from only one model get flagged
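Conceptually, the loop looks something like the sketch below. This is a minimal illustration, not the package's actual code: the panel IDs and the `ask`, `synthesize`, and `agreementScore` helpers are stand-ins for the real OpenRouter calls and scoring logic.

```typescript
// Minimal sketch of the Delphi consensus loop (illustrative only).
// Panel IDs and helpers are placeholders, not delphi-mcp's real API.

type PanelResponse = { model: string; answer: string };

const PANEL = ["claude", "gpt-5", "gemini", "deepseek"]; // placeholder IDs

// Stand-in for an OpenRouter chat completion call.
async function ask(model: string, prompt: string): Promise<string> {
  return `answer from ${model}`;
}

// Stand-in for the synthesis step that merges panel answers.
// (Hallucination flagging would mark claims supported by only one model.)
async function synthesize(question: string, responses: PanelResponse[]): Promise<string> {
  return responses.map((r) => `${r.model}: ${r.answer}`).join("\n");
}

// Stand-in for scoring how much of the synthesis the panel agrees on (0..1).
function agreementScore(responses: PanelResponse[], synthesis: string): number {
  return 0;
}

async function runDelphi(question: string, maxRounds = 4): Promise<string> {
  // Round 1: each model answers independently, without seeing the others.
  let responses: PanelResponse[] = await Promise.all(
    PANEL.map(async (model) => ({ model, answer: await ask(model, question) }))
  );

  for (let round = 2; round <= maxRounds; round++) {
    const synthesis = await synthesize(question, responses);

    // Convergence detection: stop once agreement reaches the 85% threshold.
    if (agreementScore(responses, synthesis) >= 0.85) return synthesis;

    // Revision round: each model sees the synthesis and may revise or challenge it.
    responses = await Promise.all(
      responses.map(async ({ model }) => ({
        model,
        answer: await ask(
          model,
          `${question}\n\nCurrent synthesis:\n${synthesis}\n\nRevise or challenge your position.`
        ),
      }))
    );
  }
  return synthesize(question, responses);
}
```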
---
> Question: Should we use microservices or a monolith for a new e-commerce platform?
Round 1 — Initial Positions:
| Model | Position |
|-------|----------|
| Claude | Monolith first, extract services later |
| GPT-5 | Microservices for scalability from day one |
| Gemini | Depends on team size and experience |
| DeepSeek | Modular monolith as middle ground |
Round 2 — After seeing each other's reasoning:
- GPT-5 revised: "Agreed that premature microservices add complexity. Team size matters."
- Claude maintained position but acknowledged: "Microservices make sense if team is 50+ engineers"
- All models converged on team size as the key factor
Round 3 — Final Synthesis:
| Claim | Strength | Agreement |
|-------|----------|-----------|
| Start with monolith for teams < 20 engineers | unanimous | 5/5 |
| Modular boundaries enable future extraction | unanimous | 5/5 |
| Microservices add 3-5x operational overhead | strong | 4/5 |
| Extract services only when team/traffic demands | strong | 4/5 |
| Kubernetes required for microservices | disputed | 2/5 |
Key Disagreement Surfaced:
> "Kubernetes required for microservices" — Claude and DeepSeek disagreed, noting alternatives like ECS, Nomad, or even simple VM deployments. This flags an area where the "conventional wisdom" may be overconfident.
Control Drift: 45% — A single model would have given a more opinionated answer without surfacing the team-size nuance or the Kubernetes debate.
---
| Preset | Tier | Rounds | Grounding | Cost | Use Case |
|--------|------|--------|-----------|------|----------|
| quick | fast | 2 | off | ~$0.04 | Quick checks |
| balanced | standard | 4 | off | ~$0.20 | General queries |
| research | premium | 6 | on | ~$0.50 | Deep analysis |
| factcheck | standard | 3 | on | ~$0.25 | Verify claims |
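As an illustration, a preset would be chosen per query. The field name below is an assumption based on this table rather than a confirmed part of the schema; check the full docs for the exact parameter.

```typescript
// Illustrative only: query arguments selecting the "research" preset.
// The "preset" field name is assumed, not confirmed by this README.
const researchQuery = {
  question: "What are the long-term operational trade-offs of event sourcing?",
  preset: "research", // premium tier: 6 rounds, grounding on, ~$0.50
};
```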
---
Use Delphi for:
- Complex technical decisions with trade-offs
- Research questions with multiple valid perspectives
- High-dimensional problems (many factors to weigh)
- Topics where experts genuinely disagree
- Validating important conclusions before acting
Skip Delphi for:
- Simple factual lookups → single model is fine
- Creative writing → diversity unhelpful
- Real-time chat → too slow
- Well-defined problems with clear answers
Decision rule: If the question has genuine complexity and the answer matters, use Delphi.
---
- Multi-Model Consensus — Claude, GPT-5, Gemini, DeepSeek working together
- Dynamic Convergence — Iterates until 85% agreement or surfaces disagreement
- Claim Strength — See which points are unanimous vs genuinely disputed
- Revision Rounds — Models can challenge and refine each other's reasoning
- Expert Personas — Frame panelists as domain experts for deeper analysis
- Diverse Panel Mode — Assign complementary expert roles within a domain
- Web Grounding — Optionally verify claims against live sources
- Budget Controls — Token and cost limits for predictable spend
- Multiple Formats — Markdown, JSON, HTML, plain text
---
Like a real Delphi study, you can frame panelists as domain experts:
```json
{
  "question": "What are the security implications of storing JWTs in localStorage?",
  "expertise": "security"
}
```
Available domains:
| Domain | Expert Type |
|--------|-------------|
| security | Security Engineer (15+ years, penetration testing, secure development) |
| finance | Financial Analyst (investment banking, risk management) |
| medical | Medical Researcher (clinical medicine, evidence-based medicine) |
| legal | Legal Expert (corporate law, IP, regulatory compliance) |
| engineering | Software Engineer (system design, architecture patterns) |
| data-science | Data Scientist (ML, statistical analysis) |
| economics | Economist (micro/macro economics, policy analysis) |
| architecture | Systems Architect (distributed systems, cloud platforms) |
| devops | DevOps Engineer (CI/CD, infrastructure automation) |
| product | Product Manager (strategy, user research, go-to-market) |
Add `"diversePersonas": true` to give each panelist a different complementary role within the domain, just like assembling a real expert panel:
```json
{
  "question": "Should we migrate to microservices?",
  "expertise": "architecture",
  "diversePersonas": true
}
```
For architecture, this creates a panel of:
- Cloud architect (AWS/GCP/Azure best practices)
- Platform architect (internal developer platforms)
- Data architect (data modeling, warehousing)
- Integration architect (APIs, messaging)
- Security architect (zero-trust, identity management)
- Solutions architect (customer requirements)
For the most authentic Delphi experience, let the administrator automatically determine what experts are needed based on your question:
```json
{
  "question": "Should we implement rate limiting at the API gateway or application layer?",
  "autoExpertise": true
}
```
The administrator analyzes your question and dynamically generates an optimal expert panel:
| Expert | Focus | Perspective |
|--------|-------|-------------|
| API Gateway Architect | Rate limiting patterns, edge vs origin | Infrastructure scalability |
| Security Engineer | DDoS protection, abuse prevention | Defensive, assumes adversarial users |
| Backend Developer | Application-level implementation | Developer experience, maintainability |
| SRE/Platform Engineer | Observability, failure modes | Operational reliability |
Why auto-expertise?
- Mimics how real Delphi studies select experts based on the question
- No need to guess which domain fits best
- Gets complementary perspectives without manual configuration
- Shows rationale for why each expert was chosen
---
| Tool | Description |
|------|-------------|
| `delphi_query` | Multi-model consensus query |
| `delphi_factcheck` | Fact-check a specific claim |
| `delphi_list_models` | List available models |
| `delphi_estimate_cost` | Estimate cost before running |
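Outside Claude Desktop, any MCP-compatible client can invoke these tools over stdio. The sketch below uses the official TypeScript MCP SDK; the argument fields are assumptions based on the examples earlier in this README.

```typescript
// Sketch: calling delphi_query from a generic MCP client over stdio.
// Assumes @modelcontextprotocol/sdk is installed; argument fields are illustrative.
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

const transport = new StdioClientTransport({
  command: "delphi-mcp",
  env: { OPENROUTER_API_KEY: process.env.OPENROUTER_API_KEY ?? "" },
});

const client = new Client({ name: "delphi-example", version: "0.1.0" }, { capabilities: {} });
await client.connect(transport);

const result = await client.callTool({
  name: "delphi_query",
  arguments: {
    question: "Should we implement rate limiting at the API gateway or application layer?",
  },
});

console.log(result.content);
```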
---
The full technical documentation covers:
- All configuration options
- Test results & insights
- Architecture internals
- Cost analysis
- Safety features
---
MIT — Built by Thor Matthiasson