A TypeScript/Node.js CLI tool for local code indexing and search using SQLite
npm install @squirrelsoft/code-indexA fast, offline TypeScript/Node.js CLI tool for local code indexing and search using SQLite.
- π Fast Indexing - Process 1,000+ files per second
- π Instant Search - Full-text and regex search with <100ms response time
- π§ Hybrid Search - Combines lexical (BM25) + semantic (vector) search with configurable fusion
- π³ AST-Based Symbol Index - Tree-sitter powered parsing with full symbol extraction
- π Call Graph Tracking - Navigate function calls, callers, and callees
- πΎ Offline First - All data stored locally in SQLite + JSON
- π Incremental Updates - Refresh only changed files
- π File Watcher - Real-time index updates with debounced change detection
- πͺ Git Hooks - Automatic indexing after merge, checkout, and rebase
- π₯ Self-Diagnostic - Built-in health checks with auto-fix capabilities
- π€ MCP Server - Model Context Protocol server for AI assistant integration
- π¦ Zero Dependencies - Minimal runtime dependencies
- Node.js 20 or higher
- npm or yarn
``bash`
npm install -g @squirrelsoft/code-index
`bash`
npm install --save-dev @squirrelsoft/code-index
1. Initialize code-index in your project:
`bash`
code-index init
This creates:
- .codeindex/ - Database, AST files, and logs directory.claude/
- - Configuration directory for Claude integration.mcp.json
- - Model Context Protocol configuration.gitignore
- Updates to exclude code-index artifacts
2. Index your codebase:
`bash`
code-index index
3. Search for code:
`bashText search
code-index search "function handleUser"
Commands
$3
Initialize code-index in your project.
Options:
-
--force - Reinitialize and overwrite existing configuration
- --json - Output results in JSON formatExample:
`bash
code-index init --force
`$3
Build or rebuild the search index for your codebase.
Options:
-
-v, --verbose - Show detailed progress information
- -j, --json - Output results in JSON formatExample:
`bash
code-index index --verbose
`$3
Search the indexed codebase for patterns.
Options:
-
-r, --regex - Treat query as regular expression
- -c, --case-sensitive - Perform case-sensitive search
- -l, --limit - Limit number of results (default: 20)
- -j, --json - Output results in JSON format
- --hybrid - Enable hybrid search (combines lexical + semantic ranking)
- --alpha - Lexical weight for hybrid search (0.0-1.0, default: 0.5)
- --beta - Vector weight for hybrid search (0.0-1.0, default: 0.4)
- --gamma - Tie-breaker weight for hybrid search (0.0-1.0, default: 0.1)
- --config - Custom ranking config file path
- --lexical-only - Use only lexical (BM25) search
- --vector-only - Use only vector (semantic) search
- --no-diversification - Disable path diversification in results
- --explain - Show detailed score breakdown for each resultExamples:
`bash
Simple text search
code-index search "TODO"Regex pattern search
code-index search --regex "class.*Controller"Limit results
code-index search "import" --limit 10JSON output for scripting
code-index search "error" --json | jq '.results[].path'Hybrid search (combines lexical + semantic)
code-index search --hybrid "user authentication"Adjust fusion weights (prioritize exact matches)
code-index search --hybrid "API endpoint" --alpha 0.7 --beta 0.3Show detailed score explanations
code-index search --hybrid "error handling" --explainUse custom ranking configuration
code-index search --hybrid "database query" --config ./my-ranking-config.json
`$3
Update the index for modified files only (incremental update).
Options:
-
-v, --verbose - Show detailed progress information
- -j, --json - Output results in JSON formatExample:
`bash
code-index refresh --verbose
`$3
Diagnose system health and suggest fixes.
Options:
-
-v, --verbose - Show detailed diagnostic information
- -f, --fix - Attempt to automatically fix issues
- -j, --json - Output results in JSON formatExample:
`bash
Check system health
code-index doctorAuto-fix issues
code-index doctor --fix
`$3
Watch file system for changes and automatically update the index in real-time.
Options:
-
--delay - Debounce delay in milliseconds (default: 500)
- --batch-size - Number of files to process per batch (default: 100)
- --ignore - Additional patterns to ignore
- --max-depth - Limit directory recursion depth
- --extensions - Comma-separated list of file extensions to watch
- -v, --verbose - Show detailed progress information
- -j, --json - Output results in JSON formatExamples:
`bash
Start watching with default settings
code-index watchWatch with custom debounce delay
code-index watch --delay 1000Watch only specific file types
code-index watch --extensions js,ts,pyWatch with depth limit (good for large projects)
code-index watch --max-depth 5 --ignore "test/*"
`$3
Manage Git hooks for automatic indexing after Git operations.
Subcommands:
-
install - Install Git hooks
- uninstall - Remove Git hooks
- status - Show hook installation statusOptions:
-
--hooks - Comma-separated list of hooks to install (post-merge, post-checkout, post-rewrite)
- --force - Force reinstall hooks
- -j, --json - Output results in JSON formatExamples:
`bash
Install all hooks (recommended)
code-index hooks installInstall specific hooks
code-index hooks install --hooks post-merge,post-checkoutCheck hook status
code-index hooks statusRemove all hooks
code-index hooks uninstall
`$3
Run comprehensive diagnostics and suggest fixes for common issues.
Options:
-
--fix - Attempt to automatically fix detected issues
- --report - Generate a full diagnostic report
- -v, --verbose - Show detailed diagnostic information
- -j, --json - Output results in JSON formatExamples:
`bash
Check system health
code-index diagnoseAuto-fix detected issues
code-index diagnose --fixGenerate diagnostic report
code-index diagnose --report
`$3
View collected search performance metrics.
Options:
-
--json - Output metrics in JSON format
- --log-dir - Path to logs directory (default: .codeindex/logs)Examples:
`bash
View performance statistics
code-index metricsExport metrics as JSON
code-index metrics --json > performance-report.json
`$3
Start the MCP (Model Context Protocol) server for code intelligence. The server listens on stdio for JSON-RPC 2.0 requests and provides 8 tool functions for AI assistants to navigate and understand codebases.
Available Tools:
-
search - Hybrid semantic + lexical search across codebase
- find_def - Find symbol definitions with exact name matching (fast path via symbol index)
- find_refs - Find all references to a symbol (imports, exports, calls)
- callers - Find all functions/methods that call a given function
- callees - Find all functions/methods called by a given function
- open_at - Open file at specific line with context
- refresh - Incremental index update (automatically reloads symbol index)
- symbols - List all symbols in a file or entire codebaseSymbol Index Features:
- Exact symbol name matching (O(1) lookup via hash map)
- Prefix, substring, and fuzzy matching (k-gram indexed)
- Full AST information including signatures, call graphs, and line ranges
- Automatically populated on server start from persisted AST files
- Re-initialized after refresh operations for immediate availability
Options:
-
-p, --project - Project root directory (defaults to current directory)Environment Variables:
-
CODE_INDEX_AUTH_TOKEN - Optional authentication token (when set, clients must provide matching token)Examples:
`bash
Start MCP server (requires indexed codebase)
code-index serveStart with authentication
CODE_INDEX_AUTH_TOKEN=secret code-index serveStart for specific project
code-index serve --project /path/to/project
`Integration:
Create an
.mcp.json file in your project to configure MCP clients:`json
{
"mcpServers": {
"code-index": {
"command": "code-index",
"args": ["serve"],
"env": {
"CODE_INDEX_AUTH_TOKEN": ""
}
}
}
}
`For Claude Code integration, the server will be automatically detected and available in the tool picker.
Notes:
- Server uses stdio transport (stdin/stdout for JSON-RPC messages)
- All responses include file anchors (
file:line:col) with precise symbol locations
- Code previews extracted from source files or AST spans
- Symbol definitions include full metadata: signatures, call graphs, line ranges, and kind (function, class, interface, etc.)
- Symbol index loads on first request (~100ms for 1000 files)
- Supports concurrent requests (handles 50+ simultaneous queries)
- Gracefully handles SIGTERM/SIGINT for clean shutdown with in-flight request completion$3
Remove all code-index artifacts from your project.
Options:
-
-y, --yes - Skip confirmation prompt
- -v, --verbose - Show detailed removal information
- -j, --json - Output results in JSON formatExample:
`bash
code-index uninstall --yes
`Configuration
$3
Code-index automatically respects your
.gitignore patterns. Files and directories listed in .gitignore will not be indexed.$3
By default, files larger than 10MB are skipped during indexing to maintain performance.
$3
Code-index automatically detects and tags files with their programming language based on extension:
- JavaScript/TypeScript (
.js, .jsx, .ts, .tsx) - AST parsing with Tree-sitter
- Python (.py) - AST parsing with Tree-sitter
- Java (.java)
- C/C++ (.c, .cpp, .h, .hpp)
- Go (.go)
- Rust (.rs)
- And 40+ more languagesAST Parsing:
Languages with AST parsing get full symbol extraction including:
- Function/method definitions with signatures
- Class/interface/type definitions
- Import/export statements
- Call graph relationships (caller/callee tracking)
- Precise line/column location spans
Symbol Index
The symbol index provides fast, precise symbol lookup for code navigation. Built on Tree-sitter parsing and k-gram indexing, it enables instant "go to definition" and call graph traversal.
$3
- Exact Matching - O(1) hash-based lookup for symbol names
- Fuzzy Matching - K-gram indexed prefix, substring, and edit-distance matching
- Full Metadata - Function signatures, line ranges, call graphs
- Multiple Symbol Types - Functions, classes, interfaces, types, enums, constants, components
- Call Graphs - Bidirectional tracking (callers β callees)
- Import/Export Tracking - Full dependency graph
$3
| Type | Languages | Metadata |
|------|-----------|----------|
| Functions | TS/JS, Python | Signature, parameters, return type, calls, called_by |
| Classes | TS/JS, Python | Methods, properties, extends |
| Interfaces | TS/JS | Properties, methods, extends |
| Type Aliases | TS/JS | Type definition |
| Enums | TS/JS | Members |
| Constants | TS/JS, Python | Type, value |
| Components | TS/JS/JSX | Props, hooks |
$3
`bash
Start MCP server
code-index serveFrom AI assistant (e.g., Claude Code):
Find definition
mcp__code-index__find_def(symbol: "findPackageRoot")Find who calls this function
mcp__code-index__callers(symbol: "findPackageRoot")Find what this function calls
mcp__code-index__callees(symbol: "findPackageRoot")List all symbols in file
mcp__code-index__symbols(path: "src/services/indexer.ts")
`$3
- Exact Match: <10ms
- Fuzzy Match: <50ms (1000 symbols)
- Load Time: ~100ms for 1000 files
- Memory: ~1MB per 1000 symbols
Hybrid Search
Hybrid search combines the precision of lexical search (BM25) with the semantic understanding of vector search to deliver superior code search results.
$3
1. Dual Retrieval: Fetches top-200 candidates from both lexical (exact/fuzzy text matches) and vector (semantic similarity) components in parallel
2. Fusion: Combines rankings using Reciprocal Rank Fusion (RRF) with configurable weights (Ξ±, Ξ², Ξ³)
3. Diversification: Applies path-based diversification to ensure results span multiple files/directories
4. Tie-Breaking: Uses advanced heuristics (symbol type, path priority, language match) to order similarly-scored results
$3
`bash
Basic hybrid search
code-index search --hybrid "authentication logic"Prioritize exact matches (increase lexical weight)
code-index search --hybrid "JWT token" --alpha 0.7 --beta 0.3Prioritize semantic matches (increase vector weight)
code-index search --hybrid "how to handle errors" --alpha 0.3 --beta 0.6Explain rankings
code-index search --hybrid "database connection" --explain
`$3
Create
.codeindex/ranking-config.json to customize hybrid search behavior:`json
{
"version": "1.0",
"fusion": {
"alpha": 0.5, // Lexical weight (exact text matches)
"beta": 0.4, // Vector weight (semantic similarity)
"gamma": 0.1, // Tie-breaker weight
"rrfK": 60 // RRF constant (higher = less impact of rank position)
},
"diversification": {
"enabled": true,
"lambda": 0.7, // 0.0 = max diversity, 1.0 = pure relevance
"maxPerFile": 3 // Max results from single file in top-10
},
"tieBreakers": {
"symbolTypeWeight": 0.3, // Prioritize functions/classes
"pathPriorityWeight": 0.3, // Prioritize src/ over tests/
"languageMatchWeight": 0.2, // Match query language context
"identifierMatchWeight": 0.2 // Exact identifier matches
},
"performance": {
"candidateLimit": 200, // Candidates per source
"timeoutMs": 300, // SLA target
"earlyTerminationTopK": 10
}
}
`The configuration file supports hot-reloadβchanges take effect immediately without restarting.
$3
- Latency: <300ms for top-10 results on medium repos (10k-50k files)
- Memory: <500MB typical usage
- Throughput: Supports 100 concurrent searches
$3
Use Hybrid Search When:
- Looking for concepts ("error handling patterns")
- Exploring unfamiliar codebases
- Natural language queries ("how to validate user input")
- You want both exact matches AND related code
Use Lexical Search When:
- Searching for specific symbols or strings
- Regex pattern matching
- Performance is critical (<100ms requirement)
- You know exact identifiers or keywords
Performance
- Indexing Speed: 1,000+ files/second
- Search Response: <100ms for codebases under 100k files
- Memory Usage: <500MB for 1M lines of code
- Database Size: ~10% of indexed code size
Data Storage
All data is stored locally in your project:
`
.codeindex/
βββ index.db # SQLite database with hybrid index (embeddings + sparse vectors)
βββ ast/ # Parsed AST files (JSON) - one per source file
β βββ *.json # Symbol definitions, call graphs, imports/exports
βββ models/ # ONNX embedding models for semantic search
β βββ gte-small.onnx
βββ logs/ # JSON lines log files
βββ mcp-server.log # MCP server activity
βββ search-performance.jsonl # Search metrics
`AST Files:
- One JSON file per source file, named using encoded path (e.g.,
src_services_indexer.ts.json)
- Contains extracted symbols: functions, classes, interfaces, types, enums, constants, components
- Includes call graphs (what each function calls and is called by)
- Import/export tracking for dependency analysis
- Precise source location spans (line/column ranges)Integration
$3
Code-index is designed to work seamlessly with Claude.ai through the Model Context Protocol (MCP). The
.claude/ directory structure enables:- Custom settings and preferences
- Hooks for code analysis
- Tool integrations
$3
`yaml
GitHub Actions example
- name: Index codebase
run: |
npm install -g @squirrelsoft/code-index
code-index init
code-index index
code-index search "TODO" --json > todos.json
`Troubleshooting
$3
1. "Project not initialized"
- Run
code-index init first2. "Database corrupted"
- Run
code-index doctor --fix
- Or reinitialize: code-index init --force3. "Permission denied"
- Check file permissions for
.codeindex/ directory
- Run code-index doctor for specific issues4. Slow indexing
- Check available disk space
- Ensure no antivirus is scanning
.codeindex/
- Consider excluding large binary files$3
Set the
DEBUG environment variable for detailed logging:`bash
DEBUG=code-index code-index index
`$3
1. "Symbol not found" after indexing
- The MCP server needs to be restarted to load new symbols
- Or use the
refresh MCP tool which automatically reloads the symbol index2. MCP server logs location
- Check
.codeindex/logs/mcp-server.log for server activity
- Logs include symbol index statistics on load3. Symbol index not loading
- Ensure
.codeindex/ast/ directory contains JSON files
- Check MCP logs for "Symbol index loaded: X symbols from Y files"
- Verify files were indexed with code-index indexContributing
Contributions are welcome! Please see our Contributing Guide for details.
License
MIT Β© [Squirrel Software]
Changelog
$3
- New Features:
- π€ MCP Server - Model Context Protocol server for AI assistant integration
- π³ AST-Based Symbol Index - Tree-sitter powered parsing with k-gram indexing
- 8 tool functions: search, find_def, find_refs, callers, callees, open_at, refresh, symbols
- All responses include file anchors (file:line:col) and code previews
- Optional authentication via CODE_INDEX_AUTH_TOKEN environment variable
- Concurrent request handling (50+ simultaneous queries)
- Graceful shutdown with cleanup
- Auto-reload symbol index after refresh operations
- Symbol Index:
- In-memory symbol index for O(1) exact matching
- K-gram indexing for prefix, substring, and fuzzy search
- Full AST metadata: signatures, call graphs, line ranges
- Supports functions, classes, interfaces, types, enums, constants, components
- Automatic population from persisted AST files on server start
- Commands:
- serve - Start MCP server on stdio transport
- Integration:
- .mcp.json configuration file support
- Claude Code tool picker integration
- VSCode-compatible file anchors
- Auto-updates .gitignore during init
- Performance:
- Symbol lookup <10ms (exact match)
- Search <500ms for <100k files
- Symbol index load ~100ms for 1000 files
- Prepared statement caching for optimal performance
- WAL mode for concurrent reads$3
- New Features:
- π§ Hybrid Search - Combines BM25 lexical + vector semantic search with RRF fusion
- Configurable ranking weights (Ξ±, Ξ², Ξ³) via CLI flags or config file
- Path diversification (MMR-style) for better result distribution
- Advanced tie-breaking using symbol type, path priority, and language matching
- Performance monitoring with JSON lines logging
- Hot-reloadable configuration file (.codeindex/ranking-config.json)
- Commands:
- metrics - View aggregated search performance statistics
- Search Options:
- --hybrid - Enable hybrid search mode
- --alpha/--beta/--gamma - Adjust fusion weights
- --lexical-only/--vector-only - Use single search component
- --no-diversification - Disable path diversification
- --explain - Show detailed score breakdown
- --config - Use custom ranking configuration
- Performance:
- <300ms p95 latency for hybrid search on medium repos (10k-50k files)
- Parallel candidate retrieval for optimal performance
- Early termination and prepared statements
- Observability:
- Performance metrics logged to .codeindex/logs/search-performance.jsonl
- SLA violation tracking and warnings
- Fallback mode detection and reporting$3
- New Features:
- File watcher with real-time index updates
- Debounced change detection (500ms default)
- Git hooks support (post-merge, post-checkout, post-rewrite)
- Enhanced diagnostics command with auto-fix
- Performance benchmarking utilities
- Telemetry collection (respects privacy)
- Commands:
- watch - Real-time file system monitoring
- hooks - Git hook management
- diagnose` - Comprehensive system diagnostics- GitHub Issues: Report bugs or request features
- Documentation: Full documentation