MCP server for structured reasoning with cognitive trap detection
`npm install verifiable-thinking-mcp`

Your LLM is confidently wrong 40% of the time on reasoning questions. This fixes that.
15 trap patterns detected in <1ms. No LLM calls. Just pattern matching.
Quick Start • Features • Trap Detection • API
---
```
┌────────────────────────────────────────────────────────────┐
│  "A bat and ball cost $1.10. The bat costs $1 more..."     │
│                              ↓                             │
│  TRAP DETECTED: additive_system                            │
│  > Don't subtract $1 from $1.10. Set up: x + (x+1) = 1.10  │
│                              ↓                             │
│  Answer: $0.05 (not $0.10)                                 │
└────────────────────────────────────────────────────────────┘
```

## Quick Start

```bash
npx -y verifiable-thinking-mcp
```

Add to Claude Desktop (`claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "verifiable-thinking": {
      "command": "npx",
      "args": ["-y", "verifiable-thinking-mcp"]
    }
  }
}
```
## Features

| Feature | Description |
|---|---|
| 🎯 Trap Detection | 15 patterns (bat-ball, Monty Hall, base rate) caught before reasoning starts |
| ⚔️ Auto-Challenge | Forces counterarguments when confidence >95%—no more overconfident wrong answers |
| 🔍 Contradiction Detection | Catches "Let x=5" then "Now x=10" across steps |
| 🌿 Hypothesis Branching | Explore alternatives, auto-detects when branches confirm/refute |
| 🔢 Local Math | Evaluates expressions without LLM round-trips |
| 🗜️ Smart Compression | 56.8% token savings with query-aware CPC compression |
| ⚡ Real Token Counting | Tiktoken integration—3,922× cache speedup, zero estimation error |
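
Two of the features above, contradiction detection and auto-challenge, surface directly in the tool response rather than requiring separate calls. A minimal sketch of how that looks; the `contradictions` response field shown in the comments is an assumption, while `trap_analysis` and `challenge_suggestion` appear in the usage examples further down:

```typescript
// Step 2 silently changes a value that step 1 fixed; the server compares steps and flags it.
scratchpad({ operation: "step", thought: "Let x = 5, so 2x = 10", confidence: 0.8 })
scratchpad({ operation: "step", thought: "Now x = 10, so 2x = 20", confidence: 0.97 })
// → Illustrative response (field names assumed, not a documented schema):
//   contradictions: [{ step: 2, conflicts_with: 1, detail: "x was 5, now x is 10" }]
//   challenge_suggestion: issued automatically because confidence > 0.95
```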
Every operation counts. Verifiable Thinking uses real token counting (tiktoken) and intelligent compression to cut costs by 50-60% without sacrificing reasoning quality.
```typescript
// Traditional reasoning: ~1,350 tokens for 10-step chain
// Verifiable Thinking:   ~580 tokens (56.8% savings)

// Real token counting (not estimation)
countTokens("What is 2+2?")  // → 7 tokens (not 3)
// Cache speedup: 3,922× faster on repeated strings

// Compress before processing (not just storage)
scratchpad({
  operation: "step",
  thought: "Long analysis...",  // 135 tokens → 72 tokens
  compress: true
})

// Budget controls
scratchpad({
  warn_at_tokens: 2000,     // Soft warning
  hard_limit_tokens: 5000   // Hard stop
})
```
At scale: 1,000 reasoning chains/day = $4,193/year saved (at GPT-4o pricing).
See docs/token-optimization.md for architecture details and benchmarks.
```typescript
// Start with a question—trap detection runs automatically
scratchpad({
  operation: "step",
  question: "A bat and ball cost $1.10...",
  thought: "Let ball = x, bat = x + 1.00",
  confidence: 0.9
})
// → Returns trap_analysis warning

// High confidence? Auto-challenge kicks in
scratchpad({ operation: "step", thought: "...", confidence: 0.96 })
// → Returns challenge_suggestion: "What if your assumption is wrong?"

// Complete with spot-check
scratchpad({ operation: "complete", final_answer: "$0.05" })
```
## Trap Detection

| Pattern | What It Catches |
|---------|-----------------|
| additive_system | Bat-ball, widget-gadget (subtract instead of solve) |
| nonlinear_growth | Lily pad doubling (linear interpolation) |
| monty_hall | Door switching (50/50 fallacy) |
| base_rate | Medical tests (ignoring prevalence) |
| independence | Coin flips (gambler's fallacy) |
All 15 patterns:

| Pattern | Trap |
|---------|------|
| additive_system | Subtract instead of solve |
| nonlinear_growth | Linear interpolation |
| rate_pattern | Incorrect scaling |
| harmonic_mean | Arithmetic mean for rates |
| independence | Gambler's fallacy |
| pigeonhole | Underestimate worst case |
| base_rate | Ignore prevalence |
| factorial_counting | Simple division |
| clock_overlap | Assume 12 overlaps |
| conditional_probability | Ignore conditioning |
| conjunction_fallacy | More detail = more likely |
| monty_hall | 50/50 after reveal |
| anchoring | Irrelevant number influence |
| sunk_cost | Past investment bias |
| framing_effect | Gain/loss framing |
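
The same pattern matching can also be run on demand with the `spot_check` operation listed under API below. A minimal sketch; the `question` parameter mirrors the step example above, and the response shape in the comments is an assumption, not a documented schema:

```typescript
// Manually check a question for known traps before any reasoning steps
scratchpad({
  operation: "spot_check",
  question: "A test is 99% accurate and 1 in 10,000 people have the disease..."
})
// → Illustrative response (shape assumed):
//   trap_analysis: {
//     pattern: "base_rate",
//     warning: "Don't quote 99% directly; weight the accuracy by the 1-in-10,000 prevalence."
//   }
```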
## API

`scratchpad` — the main tool with 11 operations:

| Operation | What It Does |
|-----------|--------------|
| step | Add reasoning step (trap priming on first) |
| complete | Finalize with auto spot-check |
| revise | Fix earlier step |
| branch | Explore alternative path |
| challenge | Force adversarial self-check |
| navigate | View history/branches |
All operations:

| Operation | Purpose |
|-----------|---------|
| step | Add reasoning step |
| complete | Finalize chain |
| revise | Fix earlier step |
| branch | Alternative path |
| challenge | Adversarial self-check |
| navigate | View history |
| spot_check | Manual trap check |
| hint | Progressive simplification |
| mistakes | Algebraic error detection |
| augment | Compute math expressions |
| override | Force-commit failed step |
Other tools: `list_sessions`, `get_session`, `clear_session`, `compress`
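
A short sketch of two of the less obvious operations, `branch` (explore an alternative path) and `augment` (compute math locally). Only the operation names come from the table above; parameter names such as `branch_from` and `expression` are assumptions:

```typescript
// Fork an alternative line of reasoning from an earlier step (parameters assumed)
scratchpad({
  operation: "branch",
  branch_from: 2,
  thought: "Alternative: assume the bat costs exactly $1.00"
})

// Evaluate arithmetic locally instead of asking the LLM to do it (parameters assumed)
scratchpad({
  operation: "augment",
  expression: "0.05 + (0.05 + 1.00)"
})
// → 1.10, confirming the $0.05 answer without an LLM round-trip
```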
| Capability | Sequential Thinking | Verifiable Thinking |
|---|---|---|
| Trap detection | ❌ | 15 patterns |
| Auto-challenge | ❌ | >95% confidence |
| Contradiction detection | ❌ | ✅ |
| Confidence tracking | ❌ | Per-step + chain |
| Local compute | ❌ | ✅ |
| Token budgets | ❌ | Soft + hard limits |
| Real token counting | ❌ | Tiktoken (3,922× cache speedup) |
| Compression | ❌ | 56.8% token savings |
Sequential Thinking is ~100 lines of code. Verifiable Thinking is 22,000+ lines with 1,831 tests.
See docs/competitive-analysis.md for full breakdown.
```bash
git clone https://github.com/CoderDayton/verifiable-thinking-mcp.git
cd verifiable-thinking-mcp && bun install
bun run dev # Interactive MCP Inspector
bun test # 1,831 tests
```

License: MIT
---