Intelligent context window optimization for Claude Code - store content externally via caching and compression, freeing up your context window for what matters
```bash
npm install token-optimizer-mcp
```

> Intelligent token optimization through caching and compression for Claude Code and Claude Desktop
Token Optimizer MCP is a Model Context Protocol (MCP) server that reduces context window usage through intelligent caching and compression. By storing compressed content externally in SQLite, the server removes tokens from your context window while keeping them accessible. The server provides tools to compress text, cache results, and analyze token usage - helping you maximize your available context window.
- Context Window Optimization: Store content externally to free up context window space
- High Compression: Brotli compression (2-4x typical, up to 82x for repetitive content)
- Persistent Caching: SQLite-based cache that persists across sessions
- Accurate Token Counting: Uses tiktoken for precise token measurements
- Smart Analysis: Analyze text to determine optimal caching strategy
- Zero External Dependencies: Completely offline operation
- Production Ready: Built with TypeScript for reliability
- Multi-Tool Support: Auto-detects and configures ALL installed AI tools
The automated installer detects and configures token-optimizer-mcp for:
- ✅ Claude Code - CLI with global hooks integration
- ✅ Claude Desktop - Native desktop application
- ✅ Cursor IDE - AI-first code editor
- ✅ Cline - VS Code extension (formerly Claude Dev)
- ✅ GitHub Copilot - VS Code with MCP support
- ✅ Windsurf IDE - AI-powered development environment
No manual configuration needed - the installer automatically detects which tools you have installed and configures them all!
1. Cache Engine - SQLite-based persistent storage with automatic cleanup
2. Token Counter - Accurate token counting using tiktoken (GPT-4 tokenizer)
3. Compression Engine - Brotli compression for optimal token efficiency
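
As a rough sketch of how these three components fit together, assuming the stack listed below (better-sqlite3, tiktoken, and Node's built-in Brotli bindings); the table schema and key names here are illustrative, not the server's actual internals:

```typescript
import Database from "better-sqlite3";
import { brotliCompressSync, constants } from "zlib";
import { encoding_for_model } from "tiktoken";

// Token Counter: measure content with the GPT-4 tokenizer
const enc = encoding_for_model("gpt-4");
const text = "Some large block of content...";
const originalTokens = enc.encode(text).length;

// Compression Engine: Brotli at maximum quality (11)
const compressed = brotliCompressSync(Buffer.from(text, "utf8"), {
  params: { [constants.BROTLI_PARAM_QUALITY]: 11 },
});

// Cache Engine: persist the compressed bytes in SQLite (illustrative schema)
const db = new Database("cache.db");
db.exec(
  "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, data BLOB, created INTEGER)"
);
db.prepare("INSERT OR REPLACE INTO cache (key, data, created) VALUES (?, ?, ?)")
  .run("my-cache-key", compressed, Date.now());

enc.free(); // tiktoken encoders hold WASM memory; release when done
```

In the real server, these pieces sit behind the MCP tools listed next.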
1. optimize_text - Compress and cache text to reduce token usage
2. get_cached - Retrieve previously cached and optimized text
3. count_tokens - Count tokens in text using tiktoken
4. compress_text - Compress text using Brotli compression
5. decompress_text - Decompress base64-encoded Brotli-compressed text
6. get_cache_stats - Get cache statistics including hit rate and compression ratio
7. clear_cache - Clear all cached data
8. analyze_optimization - Analyze text and get optimization recommendations
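
These are ordinary MCP tool calls. A minimal client-side sketch, assuming a stdio transport via the @modelcontextprotocol/sdk client (the npx launch command and argument values are illustrative):

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Launch the server over stdio (hypothetical launch command)
const transport = new StdioClientTransport({
  command: "npx",
  args: ["-y", "@ooples/token-optimizer-mcp"],
});

const client = new Client({ name: "example-client", version: "1.0.0" });
await client.connect(transport);

// Cache a large blob of text; only the key stays in context
const optimized = await client.callTool({
  name: "optimize_text",
  arguments: { text: "Your large text content here...", key: "my-cache-key" },
});

// Later, pull the original text back out of the cache
const cached = await client.callTool({
  name: "get_cached",
  arguments: { key: "my-cache-key" },
});
```

With the hook-based setup below, these calls happen automatically, so most users never invoke the tools by hand.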
- Runtime: Node.js 20+
- Language: TypeScript
- Database: SQLite (better-sqlite3)
- Token Counting: tiktoken
- Compression: Brotli (built-in)
- Caching: LRU Cache
- Protocol: MCP SDK (@modelcontextprotocol/sdk)
Install the MCP server and global hooks in one command:
#### Windows
```powershell
# Download and run the automated installer
irm https://raw.githubusercontent.com/ooples/token-optimizer-mcp/main/install-hooks.ps1 | iex
```
#### macOS / Linux
```bash
# Download and run the automated installer
curl -fsSL https://raw.githubusercontent.com/ooples/token-optimizer-mcp/main/install-hooks.sh | bash
```
This will:
1. ✅ Install token-optimizer-mcp globally via npm
2. ✅ Download and configure global Claude Code hooks
3. ✅ Auto-detect and configure ALL installed AI tools:
- Claude Desktop
- Cursor IDE
- Cline (VS Code extension)
- GitHub Copilot (VS Code)
- Windsurf IDE
4. ✅ Set up automatic token optimization on every tool call
5. ✅ Configure workspace trust and execution permissions
6. ✅ Verify the installation
Token Reduction: 60-90% average across all operations on all supported AI tools!
For detailed platform-specific installation instructions, see HOOKS-INSTALLATION.md.
#### Quick Overview
All Platforms:
```bash
# 1. Install the MCP server
npm install -g @ooples/token-optimizer-mcp
```
Windows:
```powershell
cd "$env:APPDATA\npm\node_modules\@ooples\token-optimizer-mcp"
.\install-hooks.ps1
```

macOS / Linux:
```bash
NPM_PREFIX=$(npm config get prefix)
cd "$NPM_PREFIX/lib/node_modules/@ooples/token-optimizer-mcp"
bash install-hooks.sh
```

See HOOKS-INSTALLATION.md for complete instructions including:
- Platform-specific configuration
- Workspace trust setup
- Troubleshooting guides
- Verification steps
## How It Works

### Automatic Optimization Pipeline
When global hooks are installed, token-optimizer-mcp runs automatically on every tool call:
```
┌─────────────────────────────────────────────────────────────┐
│ Phase 1: PreToolUse - Tool Replacement │
│ ├─ Read → smart_read (60-90% token reduction) │
│ ├─ Grep → smart_grep (60-90% token reduction) │
│ └─ Glob → smart_glob (60-90% token reduction) │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Phase 2: Input Validation - Cache Lookups │
│ └─ get_cached checks if operation was already done │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Phase 3: PostToolUse - Output Optimization │
│ ├─ optimize_text for large outputs │
│ └─ compress_text for repeated content │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Phase 4: Session Tracking │
│ └─ Log all operations to operations-{sessionId}.csv │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Phase 5: UserPromptSubmit - Prompt Optimization │
│ └─ Optimize user prompts before sending to API │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Phase 6: PreCompact - Pre-Compaction Optimization │
│ └─ Optimize before Claude Code compacts the conversation │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Phase 7: Metrics & Reporting │
│ └─ Track token reduction metrics and generate reports │
└─────────────────────────────────────────────────────────────┘
```

Result: Every Read, Grep, Glob, Edit, and Bash operation is automatically optimized for token efficiency!
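
To make the PostToolUse phase concrete, here is a hedged sketch of the decision it embodies; the threshold, key scheme, and function are illustrative, not the shipped hook implementation:

```typescript
import { encoding_for_model } from "tiktoken";

const enc = encoding_for_model("gpt-4");
const MIN_TOKENS_TO_OPTIMIZE = 500; // illustrative threshold, not the shipped default

// Decide what a PostToolUse hook does with a tool's output: pass small
// outputs through, cache large ones and keep only a short reference.
function handleToolOutput(toolName: string, output: string): string {
  const tokens = enc.encode(output).length;
  if (tokens < MIN_TOKENS_TO_OPTIMIZE) {
    return output; // small enough to stay in context as-is
  }
  const key = `${toolName}-${Date.now()}`; // hypothetical key scheme
  // ...call the optimize_text MCP tool with { text: output, key }...
  return `[cached ${tokens} tokens under "${key}"; retrieve with get_cached]`;
}
```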
### Real-World Token Savings
Based on 38,000+ operations:
| Tool | Avg Before | Avg After | Reduction |
|------|-----------|----------|-----------|
| Read | 5,000 tokens | 1,850 tokens | 63% |
| Grep | 2,000 tokens | 740 tokens | 63% |
| Glob | 1,500 tokens | 555 tokens | 63% |
| Edit | 3,500 tokens | 1,295 tokens | 63% |
Per-Session Savings: 300K-700K tokens (worth $0.90-$2.10 at $3/M tokens)
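
The dollar figures follow directly from the token counts; a one-liner makes the arithmetic explicit (the $3-per-million rate is the assumption quoted above):

```typescript
// Dollar value of saved tokens at a given per-million-token rate
const costSaved = (tokensSaved: number, ratePerMillion = 3): number =>
  (tokensSaved / 1_000_000) * ratePerMillion;

console.log(costSaved(300_000)); // 0.9 -> $0.90, low end of a session
console.log(costSaved(700_000)); // 2.1 -> $2.10, high end of a session
```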
## Usage Examples
### Optimize and Cache Text
```typescript
// Use the optimize_text tool to cache content externally
optimize_text({
  text: "Your large text content here...",
  key: "my-cache-key",
  quality: 11 // 0-11, higher = better compression
})

// Result - compressed data is stored in SQLite, NOT returned in context
{
  "success": true,
  "key": "my-cache-key",
  "originalTokens": 1500,
  "compressedTokens": 450, // Tokens IF it were in context (not relevant)
  "tokensSaved": 1050,     // Context window savings (what matters)
  "percentSaved": 70.5,    // Based on compression + external storage
  "originalSize": 6000,
  "compressedSize": 1800,  // Stored in SQLite
  "cached": true
}
// Your context window now contains only the cache key (~50 tokens)
// instead of the original 1500 tokens - 96.7% reduction in context usage!
```

### Retrieve Cached Text
```typescript
// Use the get_cached tool
get_cached({ key: "my-cache-key" })

// Result:
{
  "success": true,
  "key": "my-cache-key",
  "text": "Your original text content...",
  "fromCache": true
}
```

### Count Tokens
```typescript
// Use the count_tokens tool
count_tokens({ text: "Your text here" })

// Result:
{
  "tokens": 150,
  "characters": 500
}
```

### Analyze Optimization Potential
```typescript
// Use the analyze_optimization tool
analyze_optimization({ text: "Your text here" })

// Result:
{
  "tokens": {
    "current": 1500,
    "afterCompression": 450,
    "saved": 1050,
    "percentSaved": 70
  },
  "size": {
    "current": 6000,
    "compressed": 1800,
    "ratio": 3.33,
    "percentSaved": 70
  },
  "recommendations": {
    "shouldCompress": true,
    "reason": "Compression will provide significant token savings"
  }
}
```

### Get Cache Statistics
```typescript
// Use the get_cache_stats tool
get_cache_stats({})

// Result:
{
  "totalEntries": 15,
  "totalSize": 45000,
  "hits": 42,
  "misses": 8,
  "hitRate": 84.0,
  "avgCompressionRatio": 3.2,
  "totalTokensSaved": 12500
}
```

## How Token Optimization Works
### Compression Ratio vs. Context Window Savings
It's important to understand the difference between compression ratio and context window savings:
#### Compression Ratio
- Measures how much the original data is reduced in size (e.g., 10KB → 2KB = 5x compression)
- Brotli achieves 2-4x typical compression, up to 82x for highly repetitive content
- Does NOT directly translate to token savings in the compressed form
#### Context Window Savings (The Real Benefit)
When you cache content using this MCP server:
1. Original text is compressed with Brotli (up to 82x compression)
2. Compressed data is stored externally in SQLite database
3. 100% of original tokens are removed from your context window
4. Only a small cache key remains in context (~50 tokens for key + metadata)
Example: A 10,000 token API response is cached:
- Before: 10,000 tokens in your context window
- After: ~50 tokens (cache key + metadata)
- Savings: 9,950 tokens removed from context (99.5% reduction)
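
The same arithmetic as a tiny sketch, with the ~50-token key overhead from the example above as an assumed constant:

```typescript
// Context savings from caching: the only cost left in context is the key.
function contextSavings(originalTokens: number, keyOverheadTokens = 50) {
  const saved = originalTokens - keyOverheadTokens;
  return { saved, percent: (saved / originalTokens) * 100 };
}

const { saved, percent } = contextSavings(10_000); // the 10,000-token response
console.log(saved, percent.toFixed(1)); // 9950 tokens saved, 99.5%
```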
### The Base64 Overhead Problem
When you compress text without caching (using compress_text), the compressed data must be encoded as Base64 to be transmitted as text:
- Base64 encoding adds ~33% overhead to the compressed size
- This often results in MORE tokens than the original (unless the compression ratio exceeds 4x)
- Solution: Use optimize_text, which caches the compressed data externally
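
You can observe the overhead directly; a sketch using Node's zlib and tiktoken (the sample text is arbitrary):

```typescript
import { brotliCompressSync } from "zlib";
import { encoding_for_model } from "tiktoken";

const enc = encoding_for_model("gpt-4");
const text = "A realistic chunk of prose that does not compress dramatically well.";

const originalTokens = enc.encode(text).length;

// Compress, then Base64-encode so the bytes can travel as plain text
const base64 = brotliCompressSync(Buffer.from(text, "utf8")).toString("base64");
const base64Tokens = enc.encode(base64).length;

// Base64 is dense, unnatural text that tokenizes poorly; unless the
// compression ratio exceeds roughly 4x, base64Tokens can exceed originalTokens.
console.log({ originalTokens, base64Tokens });
enc.free();
```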
### When to Use Caching
High Value Use Cases:
- Caching large API responses that are referenced multiple times
- Storing repetitive configuration or data files
- Caching large code files that need to be referenced repeatedly
- Archiving conversation history while keeping it accessible
Lower Value Use Cases:
- Small text snippets (<500 characters) - overhead exceeds savings
- One-time use content - no benefit from caching
- Content with a low compression ratio - external storage still helps, but the gain is smaller
### End-to-End Flow
```
Original Text (10,000 tokens)
↓
Brotli Compress (82x ratio)
↓
Store in SQLite (~122 bytes)
↓
Return Cache Key (~50 tokens)
↓
RESULT: 9,950 tokens removed from context window
```

The key insight: the value is in external storage, not compression alone. Even with modest compression ratios, moving data out of your context window provides massive savings.
## Development
```bash
# Build the project
npm run build

# Run in development mode (watch)
npm run dev

# Run tests (when implemented)
npm test

# Run benchmarks (when implemented)
npm run benchmark
```

## Implementation Details

1. Compression: Uses Brotli compression (quality 11) to reduce text size
2. Token Counting: Uses tiktoken with GPT-4 tokenizer for accurate counts
3. Caching: Stores compressed text in SQLite database for persistence
4. Cache Management: Automatic cleanup of old entries to prevent unbounded growth
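
Point 4's cleanup is simple to picture; a sketch with better-sqlite3, using the 7-day window mentioned under limitations (table and column names are illustrative):

```typescript
import Database from "better-sqlite3";

const db = new Database("cache.db");

// Automatic cleanup: evict entries older than 7 days to bound disk usage
const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;
const info = db
  .prepare("DELETE FROM cache WHERE created < ?")
  .run(Date.now() - SEVEN_DAYS_MS);
console.log(`evicted ${info.changes} stale cache entries`);
```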
## Performance

- Compression Ratio: Typically 2-4x size reduction (up to 82x for highly repetitive content)
- Context Window Savings: Up to 100% for cached content (removed from context window)
- Cache Hit Rate: >80% in typical usage
- Overhead: <10ms for cache operations
- Compression Speed: ~1ms per KB of text
## Limitations

- Small Text: Best for text >500 characters (cache overhead dominates on small snippets)
- Base64 Overhead: Compressed-only output (without caching) may use MORE tokens due to Base64 encoding
- Cache Storage: Cache size limited to prevent disk usage issues (automatic cleanup after 7 days)
- Compression Tradeoff: Quality setting affects speed vs ratio (default quality 11 is optimal)
- Token Counting: Uses GPT-4 tokenizer (approximation for Claude, but close enough for optimization decisions)
- One-Time Content: No benefit for content that won't be referenced again (caching provides the value)
## License

ISC
Built for optimal Claude Code token efficiency.