MCP server for structured reasoning with cognitive trap detection
`npm install verifiable-thinking-mcp`

Your LLM is confidently wrong 40% of the time on reasoning questions. This fixes that.
15 trap patterns detected in <1ms. No LLM calls. Just pattern matching.
Quick Start • Features • Trap Detection • API
---
```
┌────────────────────────────────────────────────────────────┐
│  "A bat and ball cost $1.10. The bat costs $1 more..."     │
│                              ↓                             │
│  TRAP DETECTED: additive_system                            │
│  > Don't subtract $1 from $1.10. Set up: x + (x+1) = 1.10  │
│                              ↓                             │
│  Answer: $0.05 (not $0.10)                                 │
└────────────────────────────────────────────────────────────┘
```

## Quick Start

```bash
npx -y verifiable-thinking-mcp
```

Add to Claude Desktop (`claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "verifiable-thinking": {
      "command": "npx",
      "args": ["-y", "verifiable-thinking-mcp"]
    }
  }
}
```
## Features

| Feature | Description |
|---|---|
| 🎯 Trap Detection | 15 patterns (bat-ball, Monty Hall, base rate) caught before reasoning starts |
| ⚔️ Auto-Challenge | Forces counterarguments when confidence >95%—no more overconfident wrong answers |
| 🔍 Contradiction Detection | Catches "Let x=5" then "Now x=10" across steps |
| 🌿 Hypothesis Branching | Explore alternatives, auto-detects when branches confirm/refute |
| 🔢 Local Math | Evaluates expressions without LLM round-trips |
| 🗜️ Smart Compression | 56.8% token savings with query-aware CPC compression |
| ⚡ Real Token Counting | Tiktoken integration—3,922× cache speedup, zero estimation error |
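
Two of the features above, contradiction detection and auto-challenge, surface directly in the tool response rather than requiring separate calls. A minimal sketch of how that looks; the `contradictions` response field shown in the comments is an assumption, while `trap_analysis` and `challenge_suggestion` appear in the usage examples further down:

```typescript
// Step 2 silently changes a value that step 1 fixed; the server compares steps and flags it.
scratchpad({ operation: "step", thought: "Let x = 5, so 2x = 10", confidence: 0.8 })
scratchpad({ operation: "step", thought: "Now x = 10, so 2x = 20", confidence: 0.97 })
// → Illustrative response (field names assumed, not a documented schema):
//   contradictions: [{ step: 2, conflicts_with: 1, detail: "x was 5, now x is 10" }]
//   challenge_suggestion: issued automatically because confidence > 0.95
```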
Every operation counts. Verifiable Thinking uses real token counting (tiktoken) and intelligent compression to cut costs by 50-60% without sacrificing reasoning quality.
```typescript
// Traditional reasoning: ~1,350 tokens for 10-step chain
// Verifiable Thinking:   ~580 tokens (56.8% savings)

// Real token counting (not estimation)
countTokens("What is 2+2?")  // → 7 tokens (not 3)
// Cache speedup: 3,922× faster on repeated strings

// Compress before processing (not just storage)
scratchpad({
  operation: "step",
  thought: "Long analysis...",  // 135 tokens → 72 tokens
  compress: true
})

// Budget controls
scratchpad({
  warn_at_tokens: 2000,     // Soft warning
  hard_limit_tokens: 5000   // Hard stop
})
```
At scale: 1,000 reasoning chains/day = $4,193/year saved (at GPT-4o pricing).
See docs/token-optimization.md for architecture details and benchmarks.
```typescript
// Start with a question—trap detection runs automatically
scratchpad({
  operation: "step",
  question: "A bat and ball cost $1.10...",
  thought: "Let ball = x, bat = x + 1.00",
  confidence: 0.9
})
// → Returns trap_analysis warning

// High confidence? Auto-challenge kicks in
scratchpad({ operation: "step", thought: "...", confidence: 0.96 })
// → Returns challenge_suggestion: "What if your assumption is wrong?"

// Complete with spot-check
scratchpad({ operation: "complete", final_answer: "$0.05" })
```
## Trap Detection

| Pattern | What It Catches |
|---------|-----------------|
| additive_system | Bat-ball, widget-gadget (subtract instead of solve) |
| nonlinear_growth | Lily pad doubling (linear interpolation) |
| monty_hall | Door switching (50/50 fallacy) |
| base_rate | Medical tests (ignoring prevalence) |
| independence | Coin flips (gambler's fallacy) |
All 15 patterns:

| Pattern | Trap |
|---------|------|
| additive_system | Subtract instead of solve |
| nonlinear_growth | Linear interpolation |
| rate_pattern | Incorrect scaling |
| harmonic_mean | Arithmetic mean for rates |
| independence | Gambler's fallacy |
| pigeonhole | Underestimate worst case |
| base_rate | Ignore prevalence |
| factorial_counting | Simple division |
| clock_overlap | Assume 12 overlaps |
| conditional_probability | Ignore conditioning |
| conjunction_fallacy | More detail = more likely |
| monty_hall | 50/50 after reveal |
| anchoring | Irrelevant number influence |
| sunk_cost | Past investment bias |
| framing_effect | Gain/loss framing |
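
The same pattern matching can also be run on demand with the `spot_check` operation listed under API below. A minimal sketch; the `question` parameter mirrors the step example above, and the response shape in the comments is an assumption, not a documented schema:

```typescript
// Manually check a question for known traps before any reasoning steps
scratchpad({
  operation: "spot_check",
  question: "A test is 99% accurate and 1 in 10,000 people have the disease..."
})
// → Illustrative response (shape assumed):
//   trap_analysis: {
//     pattern: "base_rate",
//     warning: "Don't quote 99% directly; weight the accuracy by the 1-in-10,000 prevalence."
//   }
```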
## API

`scratchpad` — the main tool with 11 operations:

| Operation | What It Does |
|-----------|--------------|
| step | Add reasoning step (trap priming on first) |
| complete | Finalize with auto spot-check |
| revise | Fix earlier step |
| branch | Explore alternative path |
| challenge | Force adversarial self-check |
| navigate | View history/branches |
All operations:

| Operation | Purpose |
|-----------|---------|
| step | Add reasoning step |
| complete | Finalize chain |
| revise | Fix earlier step |
| branch | Alternative path |
| challenge | Adversarial self-check |
| navigate | View history |
| spot_check | Manual trap check |
| hint | Progressive simplification |
| mistakes | Algebraic error detection |
| augment | Compute math expressions |
| override | Force-commit failed step |
Other tools: `list_sessions`, `get_session`, `clear_session`, `compress`
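
A short sketch of two of the less obvious operations, `branch` (explore an alternative path) and `augment` (compute math locally). Only the operation names come from the table above; parameter names such as `branch_from` and `expression` are assumptions:

```typescript
// Fork an alternative line of reasoning from an earlier step (parameters assumed)
scratchpad({
  operation: "branch",
  branch_from: 2,
  thought: "Alternative: assume the bat costs exactly $1.00"
})

// Evaluate arithmetic locally instead of asking the LLM to do it (parameters assumed)
scratchpad({
  operation: "augment",
  expression: "0.05 + (0.05 + 1.00)"
})
// → 1.10, confirming the $0.05 answer without an LLM round-trip
```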
| Capability | Sequential Thinking | Verifiable Thinking |
|---|---|---|
| Trap detection | ❌ | 15 patterns |
| Auto-challenge | ❌ | >95% confidence |
| Contradiction detection | ❌ | ✅ |
| Confidence tracking | ❌ | Per-step + chain |
| Local compute | ❌ | ✅ |
| Token budgets | ❌ | Soft + hard limits |
| Real token counting | ❌ | Tiktoken (3,922× cache speedup) |
| Compression | ❌ | 56.8% token savings |
Sequential Thinking is ~100 lines of code. Verifiable Thinking is 22,000+ lines with 1,831 tests.
See docs/competitive-analysis.md for full breakdown.
```bash
git clone https://github.com/CoderDayton/verifiable-thinking-mcp.git
cd verifiable-thinking-mcp && bun install
bun run dev # Interactive MCP Inspector
bun test # 1,831 tests
```

License: MIT
---