Semantic code search and context retrieval MCP server for large codebases.

```bash
npm install @gianged/cindex
```
A Model Context Protocol (MCP) server that provides intelligent code search and context retrieval
for Claude Code. Handles 1M+ lines of code with accuracy-first design.
- Semantic Search - Vector embeddings for intelligent code discovery
- Hybrid Search - Combines vector similarity with PostgreSQL full-text search for better natural
language query handling
- 9-Stage Retrieval Pipeline - Scope filtering → query → files → chunks → symbols → imports →
APIs → dedup → assembly
- Multi-Project Support - Monorepo, microservices, and reference repository indexing
- Scope Filtering - Global, repository, service, and boundary-aware search modes
- API Contract Search - Semantic search for REST/GraphQL/gRPC endpoints
- Query Caching - LRU cache with 80%+ hit rate (cached queries ~50ms)
- Progress Notifications - Real-time 9-stage pipeline tracking
- Incremental Indexing - Only re-index changed files
- Import Chain Analysis - Automatic dependency resolution
- Deduplication - Remove duplicate utility functions
- Large Codebase Support - Efficiently handles 1M+ LoC
- Claude Code Integration - Native MCP server with 17 tools
- Accuracy-First - Default settings optimized for relevance
- Configurable Models - Swap embedding/LLM models via env vars
- Indexing Speed: 300-600 files/min (with LLM summaries)
- Query Speed: First query ~800ms, cached queries ~50ms
- Cache Hit Rate: 80%+ for repeated queries
- Codebase Scale: Efficiently handles 1M+ lines of code
- Memory Efficient: LRU caching with configurable limits
- Real-Time Progress: 9-stage pipeline notifications
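The query cache behind the ~50ms cached-query figure can be pictured as a small LRU map keyed by query text. This is purely an illustration of the idea — the class, capacity, and keys below are hypothetical, not cindex's implementation:

```typescript
// Minimal LRU cache sketch (illustrative, not cindex's actual code).
// A JS Map preserves insertion order, so the first key is the least
// recently used entry once we re-insert on every access.
class LruCache<K, V> {
  private map = new Map<K, V>();
  constructor(private capacity: number) {}

  get(key: K): V | undefined {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key)!;
    // Re-insert to mark as most recently used
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    if (this.map.has(key)) {
      this.map.delete(key);
    } else if (this.map.size >= this.capacity) {
      // Evict the least recently used entry (first insertion-order key)
      const oldest = this.map.keys().next().value as K;
      this.map.delete(oldest);
    }
    this.map.set(key, value);
  }
}

const cache = new LruCache<string, string[]>(2);
cache.set("query: auth middleware", ["src/auth.ts"]);
cache.set("query: db pool", ["src/db.ts"]);
cache.get("query: auth middleware");            // touch -> most recently used
cache.set("query: router", ["src/router.ts"]);  // evicts "query: db pool"
console.log(cache.get("query: db pool")); // undefined
```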
12 languages with full tree-sitter parsing: TypeScript, JavaScript, Python, Java, Go, Rust, C,
C++, C#, PHP, Ruby, Kotlin. Swift and other languages use regex fallback parsing.
Before installing cindex, you need:
PostgreSQL 16+ with pgvector extension for vector similarity search:
```bash
# Ubuntu/Debian
sudo apt install postgresql-16 postgresql-16-pgvector
```

### Ollama
Ollama for local LLM inference with two models:
Embedding Model (for vector generation):
```bash
# Install Ollama
curl https://ollama.ai/install.sh | sh

# Pull embedding model (bge-m3:567m recommended)
ollama pull bge-m3:567m
```

Coding Model (for file summaries and analysis):
```bash
# Pull coding model (qwen2.5-coder:7b recommended)
ollama pull qwen2.5-coder:7b

# Alternative for faster indexing (lower quality)
ollama pull qwen2.5-coder:1.5b
```

Model Options:
- Embedding: bge-m3:567m (1024 dims, 8K context) - Best accuracy
- Summary: qwen2.5-coder:7b (32K context) - High quality, RTX 4060+ recommended
- Summary: qwen2.5-coder:3b (32K context) - Balanced
- Summary: qwen2.5-coder:1.5b (32K context) - Fast indexing, lower quality
## Installation

### Database Setup
Create and initialize the cindex database:
```bash
# Create database
createdb cindex_rag_codebase

# Initialize schema (after installing cindex - see next section)
```

### Claude Code Setup
Add cindex to Claude Code using the CLI. You can install for personal use (user scope) or share with
your team (project scope).
#### Quick Install (Personal Use)
Install for all your projects:
```bash
claude mcp add cindex --scope user --transport stdio \
--env POSTGRES_PASSWORD="your_password" \
-- npx -y @gianged/cindex
```

#### Team Install (Shared via Git)
Install for the current project (creates `.mcp.json` in the project root):

```bash
claude mcp add cindex --scope project --transport stdio \
--env POSTGRES_PASSWORD="your_password" \
-- npx -y @gianged/cindex
```

Note: For project scope, set POSTGRES_PASSWORD as an environment variable on your system and
reference it in the command. Never commit actual secrets to version control.

#### Custom Configuration

Add additional environment variables using multiple --env flags:

```bash
claude mcp add cindex --scope user --transport stdio \
--env POSTGRES_PASSWORD="your_password" \
--env POSTGRES_HOST="localhost" \
--env POSTGRES_DB="cindex_rag_codebase" \
--env EMBEDDING_MODEL="bge-m3:567m" \
--env SUMMARY_MODEL="qwen2.5-coder:7b" \
-- npx -y @gianged/cindex
```

See the Environment Variables section below for all available configuration options.
#### Manual Configuration (Alternative)
If you prefer to manually edit configuration files, you can add cindex to:
User Scope (`~/.claude.json`):

```json
{
"mcpServers": {
"cindex": {
"type": "stdio",
"command": "npx",
"args": ["-y", "@gianged/cindex"],
"env": {
"POSTGRES_PASSWORD": "your_password"
}
}
}
}
```

Project Scope (`.mcp.json` in project root):

```json
{
"mcpServers": {
"cindex": {
"type": "stdio",
"command": "npx",
"args": ["-y", "@gianged/cindex"],
"env": {
"POSTGRES_HOST": "${POSTGRES_HOST:-localhost}",
"POSTGRES_PORT": "${POSTGRES_PORT:-5432}",
"POSTGRES_DB": "${POSTGRES_DB:-cindex_rag_codebase}",
"POSTGRES_USER": "${POSTGRES_USER:-postgres}",
"POSTGRES_PASSWORD": "${POSTGRES_PASSWORD}"
}
}
}
}
```

### Initialize Schema

After configuring MCP, initialize the database schema:

```bash
# Download schema file
curl -o database.sql https://raw.githubusercontent.com/gianged/cindex/main/database.sql

# Apply schema
psql cindex_rag_codebase < database.sql
```

### Quick Start

1. Open Claude Code
2. Use the index_repository tool to index your codebase
3. Use search_codebase to find relevant code

## Environment Variables
All configuration is done through environment variables in your MCP config file.
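Each numeric setting in the tables below has a default and a valid range. A minimal sketch of how such a variable might be read and validated — the helper name is hypothetical, not part of cindex:

```typescript
// Illustrative sketch (not cindex's API): read an integer env var,
// fall back to a default, and reject out-of-range values.
function readIntEnv(name: string, def: number, min: number, max: number): number {
  const raw = process.env[name];
  if (raw === undefined || raw === "") return def;
  const value = Number.parseInt(raw, 10);
  if (Number.isNaN(value)) {
    throw new Error(`${name} must be an integer, got "${raw}"`);
  }
  if (value < min || value > max) {
    throw new Error(`${name} must be in [${min}, ${max}], got ${value}`);
  }
  return value;
}

// Default applies when the variable is unset
delete process.env.EMBEDDING_DIMENSIONS;
console.log(readIntEnv("EMBEDDING_DIMENSIONS", 1024, 1, 4096)); // 1024

// Explicit value wins when set
process.env.EMBEDDING_DIMENSIONS = "2048";
console.log(readIntEnv("EMBEDDING_DIMENSIONS", 1024, 1, 4096)); // 2048
```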
### Model Settings
| Variable | Default | Range | Description |
| -------------------------- | ------------------------ | ----------- | -------------------------------------------- |
| EMBEDDING_MODEL | bge-m3:567m | - | Ollama embedding model for vector generation |
| EMBEDDING_DIMENSIONS | 1024 | 1-4096 | Vector dimensions (must match model output) |
| EMBEDDING_CONTEXT_WINDOW | 4096 | 512-131072 | Token limit for embedding model |
| SUMMARY_MODEL | qwen2.5-coder:7b | - | Ollama model for file summaries |
| SUMMARY_CONTEXT_WINDOW | 4096 | 512-131072 | Token limit for summary model |
| OLLAMA_HOST | http://localhost:11434 | - | Ollama API endpoint |
| OLLAMA_TIMEOUT | 30000 | 1000-300000 | Request timeout in milliseconds |

Context Window Notes:
- Default 4096 matches Ollama's default and is sufficient (cindex uses first 100 lines per file)
- Higher values = more VRAM usage + slower inference
- qwen2.5-coder:7b supports up to 32K tokens
- bge-m3:567m supports up to 8K tokens
- Increase only if you encounter issues with large files
### Database Settings
| Variable | Default | Range | Description |
| -------------------------- | --------------------- | ------- | ------------------------------- |
| POSTGRES_HOST | localhost | - | PostgreSQL server hostname |
| POSTGRES_PORT | 5432 | 1-65535 | PostgreSQL server port |
| POSTGRES_DB | cindex_rag_codebase | - | Database name |
| POSTGRES_USER | postgres | - | Database user |
| POSTGRES_PASSWORD | _required_ | - | Database password (must be set) |
| POSTGRES_MAX_CONNECTIONS | 10 | 1-100 | Maximum connection pool size |
### Retrieval Settings

| Variable | Default | Range | Description |
| ---------------------------- | ------- | ------- | ---------------------------------------------------- |
| HNSW_EF_SEARCH | 300 | 10-1000 | HNSW search quality (higher = more accurate, slower) |
| HNSW_EF_CONSTRUCTION | 200 | 10-1000 | HNSW index quality (higher = better index) |
| SIMILARITY_THRESHOLD | 0.3 | 0.0-1.0 | Minimum similarity for file-level retrieval |
| CHUNK_SIMILARITY_THRESHOLD | 0.2 | 0.0-1.0 | Minimum similarity for chunk-level retrieval |
| DEDUP_THRESHOLD | 0.92 | 0.0-1.0 | Similarity threshold for deduplication |
| HYBRID_VECTOR_WEIGHT | 0.7 | 0.0-1.0 | Weight for vector similarity in hybrid search |
| HYBRID_KEYWORD_WEIGHT | 0.3 | 0.0-1.0 | Weight for keyword (BM25) score in hybrid search |
| IMPORT_DEPTH | 3 | 1-10 | Maximum import chain traversal depth |
| WORKSPACE_DEPTH | 2 | 1-10 | Maximum workspace dependency depth |
| SERVICE_DEPTH | 1 | 1-10 | Maximum service dependency depth |
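As an illustration of how a threshold like DEDUP_THRESHOLD is typically applied, this sketch drops near-duplicate results whose embedding cosine similarity meets the threshold. The function names and shapes are illustrative, not cindex internals:

```typescript
// Cosine similarity between two embedding vectors of equal length
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Keep a result only if no already-kept result is too similar to it
function dedupe(embeddings: number[][], threshold = 0.92): number[][] {
  const kept: number[][] = [];
  for (const candidate of embeddings) {
    const isDuplicate = kept.some((e) => cosineSimilarity(e, candidate) >= threshold);
    if (!isDuplicate) kept.push(candidate);
  }
  return kept;
}

// Identical vectors have similarity 1.0 and are collapsed into one
console.log(dedupe([[1, 0], [1, 0], [0, 1]]).length); // 2
```

Raising the threshold (e.g. 0.95) keeps more near-duplicates; lowering it merges more aggressively.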
### Indexing Limits

| Variable | Default | Range | Description |
| ------------------ | ------- | ---------- | ---------------------------------- |
| MAX_FILE_SIZE | 5000 | 100-100000 | Maximum file size in lines |
| INCLUDE_MARKDOWN | false | true/false | Include markdown files in indexing |
### Feature Flags

| Variable | Default | Range | Description |
| ------------------------------- | ------- | ---------- | --------------------------------------- |
| ENABLE_WORKSPACE_DETECTION | true | true/false | Detect monorepo workspaces |
| ENABLE_SERVICE_DETECTION | true | true/false | Detect microservices |
| ENABLE_MULTI_REPO | false | true/false | Enable multi-repository support |
| ENABLE_API_ENDPOINT_DETECTION | true | true/false | Parse API contracts (REST/GraphQL/gRPC) |
| ENABLE_HYBRID_SEARCH | true | true/false | Combine vector + full-text search |

## Example Configurations
### Minimal Configuration
Only the required password:
```json
{
"mcpServers": {
"cindex": {
"type": "stdio",
"command": "npx",
"args": ["-y", "@gianged/cindex"],
"env": {
"POSTGRES_PASSWORD": "your_password"
}
}
}
}
```

### Full Configuration
All available settings with defaults shown:
```json
{
"mcpServers": {
"cindex": {
"type": "stdio",
"command": "npx",
"args": ["-y", "@gianged/cindex"],
"env": {
"EMBEDDING_MODEL": "bge-m3:567m",
"EMBEDDING_DIMENSIONS": "1024",
"EMBEDDING_CONTEXT_WINDOW": "4096",
"SUMMARY_MODEL": "qwen2.5-coder:7b",
"SUMMARY_CONTEXT_WINDOW": "4096",
"OLLAMA_HOST": "http://localhost:11434",
"POSTGRES_HOST": "localhost",
"POSTGRES_PORT": "5432",
"POSTGRES_DB": "cindex_rag_codebase",
"POSTGRES_USER": "postgres",
"POSTGRES_PASSWORD": "your_password",
"HNSW_EF_SEARCH": "300",
"HNSW_EF_CONSTRUCTION": "200",
"SIMILARITY_THRESHOLD": "0.3",
"CHUNK_SIMILARITY_THRESHOLD": "0.2",
"DEDUP_THRESHOLD": "0.92"
}
}
}
}
```

### Speed-Optimized Configuration
For faster indexing with lower quality:
```json
{
"mcpServers": {
"cindex": {
"type": "stdio",
"command": "npx",
"args": ["-y", "@gianged/cindex"],
"env": {
"POSTGRES_PASSWORD": "your_password",
"SUMMARY_MODEL": "qwen2.5-coder:1.5b",
"SUMMARY_CONTEXT_WINDOW": "4096",
"HNSW_EF_SEARCH": "100",
"HNSW_EF_CONSTRUCTION": "64",
"SIMILARITY_THRESHOLD": "0.4",
"CHUNK_SIMILARITY_THRESHOLD": "0.25",
"DEDUP_THRESHOLD": "0.95"
}
}
}
}
```

Performance:
- Indexing: 500-1000 files/min (vs 300-600 files/min default)
- Query Time: <500ms (vs <800ms default)
- Relevance: >85% in top 10 results (vs >92% default)
## Recommended Settings

### Accuracy-First (Default)
| Setting | Value | Notes |
| ---------------------------- | ------------------ | ---------------------------------- |
| EMBEDDING_MODEL | bge-m3:567m | Best accuracy/speed balance |
| SUMMARY_MODEL | qwen2.5-coder:7b | Good summaries, fits in VRAM |
| EMBEDDING_CONTEXT_WINDOW | 4096 | Default, sufficient for most files |
| HNSW_EF_SEARCH | 300 | High accuracy retrieval |
| SIMILARITY_THRESHOLD | 0.3 | File-level retrieval threshold |
| CHUNK_SIMILARITY_THRESHOLD | 0.2 | Chunk-level retrieval threshold |
| DEDUP_THRESHOLD | 0.92 | Prevent duplicate results |

### Measured Performance
- Indexing: ~30 files/min (~70 chunks/min)
- Search: <1 second per query
- Codebase: Tested with 40k LoC (112 files)
## Managing Configuration

### View Configuration
List all installed MCP servers:
```bash
claude mcp list
```

View cindex configuration:

```bash
claude mcp get cindex
```

### Update Settings
To update environment variables, remove and re-add with new settings:
```bash
claude mcp remove cindex
claude mcp add cindex --scope user --transport stdio \
--env POSTGRES_PASSWORD="your_password" \
--env SUMMARY_MODEL="qwen2.5-coder:3b" \
-- npx -y @gianged/cindex
```

### Speed-Optimized Settings
For faster indexing with lower quality, use these settings:
```bash
claude mcp remove cindex
claude mcp add cindex --scope user --transport stdio \
--env POSTGRES_PASSWORD="your_password" \
--env SUMMARY_MODEL="qwen2.5-coder:1.5b" \
--env HNSW_EF_SEARCH="100" \
--env HNSW_EF_CONSTRUCTION="64" \
--env SIMILARITY_THRESHOLD="0.4" \
--env CHUNK_SIMILARITY_THRESHOLD="0.25" \
--env DEDUP_THRESHOLD="0.95" \
-- npx -y @gianged/cindex
```

Performance:
- Indexing: 500-1000 files/min (vs 300-600 files/min default)
- Query Time: <500ms (vs <800ms default)
- Relevance: >85% in top 10 results (vs >92% default)
### Remove cindex

```bash
claude mcp remove cindex
```

## MCP Tools
Status: 17 of 17 tools implemented
All tools provide structured output with syntax highlighting and comprehensive metadata.
### Search Tools

#### search_codebase

Semantic code search with multi-stage retrieval and dependency analysis.
Parameters:
- query (required) - Natural language search query
- scope - Search scope: 'global', 'repository', 'service', or 'workspace'
- repo_id - Filter by repository ID
- service_id - Filter by service ID
- workspace_id - Filter by workspace ID
- max_results - Maximum results (1-100, default: 20)
- similarity_threshold - Minimum similarity (0.0-1.0, default: 0.75)
- include_dependencies - Include imported dependencies (default: false)

Returns: Markdown-formatted results with file paths, line numbers, code snippets, and relevance
scores.
#### get_file_context

Get complete context for a specific file including callers, callees, and import chain.
Parameters:
- file_path (required) - Absolute or relative file path
- repo_id - Repository ID (optional if file path is unique)
- include_callers - Include functions that call this file (default: true)
- include_callees - Include functions called by this file (default: true)
- include_imports - Include import chain (default: true)
- max_depth - Import chain depth (1-5, default: 2)

Returns: File summary, symbols, dependencies, and related code context.
#### find_symbol_definition

Locate symbol definitions and optionally show usages across the codebase.
Parameters:
- symbol_name (required) - Function, class, or variable name
- repo_id - Filter by repository ID
- file_path - Filter by file path
- symbol_type - Filter by type: 'function', 'class', 'variable', 'interface', etc.
- include_usages - Show where symbol is used (default: false)
- max_usages - Maximum usage results (1-100, default: 50)

Returns: Symbol definitions with file paths, line numbers, signatures, and optional usage
locations.
### Indexing Tools

#### index_repository

Index or re-index a repository with progress notifications and multi-project support.
Parameters:
- repo_path (required) - Absolute path to repository root
- repo_id - Repository identifier (default: directory name)
- repo_type - Repository type: 'monolithic', 'microservice', 'monorepo', 'library',
'reference', or 'documentation'
- force_reindex - Force full re-index (default: false, uses incremental indexing)
- detect_workspaces - Detect monorepo workspaces (default: true)
- detect_services - Detect microservices (default: true)
- detect_api_endpoints - Parse API contracts (default: true)
- service_config - Manual service configuration (optional)
- version - Repository version for reference repos (e.g., 'v10.3.0')
- metadata - Additional metadata (e.g., { upstream_url: '...' })

Returns: Indexing statistics including files indexed, chunks created, symbols extracted,
workspaces/services detected, and timing information.
#### delete_repository

Delete one or more indexed repositories and all associated data.
Parameters:
- repo_ids (required) - Array of repository IDs to delete

Returns: Deletion confirmation with statistics (files, chunks, symbols, workspaces, services
removed).
#### list_indexed_repos

List all indexed repositories with optional metadata, workspace counts, and service counts.
Parameters:
- include_metadata - Include repository metadata (default: true)
- include_workspace_count - Include workspace count for monorepos (default: true)
- include_service_count - Include service count for microservices (default: true)
- repo_type_filter - Filter by repository type

Returns: List of repositories with IDs, types, file counts, last indexed time, and optional
metadata.
### Workspace Tools (Monorepos)

#### list_workspaces

List all workspaces in indexed repositories for monorepo support.
Parameters:
- repo_id - Filter by repository ID (optional)
- include_dependencies - Include dependency information (default: false)
- include_metadata - Include package.json metadata (default: false)

Returns: List of workspaces with package names, paths, file counts, and optional dependencies.
#### get_workspace_context

Get full context for a workspace including dependencies and dependents.
Parameters:
- workspace_id - Workspace ID (use list_workspaces to find)
- package_name - Package name (alternative to workspace_id)
- repo_id - Repository ID (required if using package_name)
- include_dependencies - Include workspace dependencies (default: true)
- include_dependents - Include workspaces that depend on this one (default: true)
- dependency_depth - Dependency tree depth (1-5, default: 2)

Returns: Workspace metadata, dependency tree, dependent workspaces, and file list.
#### find_cross_workspace_usages

Find workspace package usages across the monorepo.
Parameters:
- workspace_id - Source workspace ID
- package_name - Source package name (alternative to workspace_id)
- symbol_name - Specific symbol to track (optional)
- include_indirect - Include indirect usages (default: false)
- max_depth - Dependency chain depth (1-5, default: 2)

Returns: List of workspaces using the target package/symbol with file locations.
### Service Tools (Microservices)

#### list_services

List all services across indexed repositories for microservice support.
Parameters:
- repo_id - Filter by repository ID (optional)
- service_type - Filter by type: 'docker', 'serverless', 'mobile' (optional)
- include_dependencies - Include service dependencies (default: false)
- include_api_endpoints - Include API endpoint counts (default: false)

Returns: List of services with IDs, names, types, file counts, and optional API information.
#### get_service_context

Get full context for a service including API contracts and dependencies.
Parameters:
- service_id - Service ID (use list_services to find)
- service_name - Service name (alternative to service_id)
- repo_id - Repository ID (required if using service_name)
- include_dependencies - Include service dependencies (default: true)
- include_dependents - Include services that depend on this one (default: true)
- include_api_contracts - Include API endpoint definitions (default: true)
- dependency_depth - Dependency tree depth (1-5, default: 1)

Returns: Service metadata, API contracts (REST/GraphQL/gRPC), dependency graph, and file list.
#### find_cross_service_calls

Find inter-service API calls across microservices.
Parameters:
- source_service_id - Source service ID (optional)
- target_service_id - Target service ID (optional)
- endpoint_pattern - Endpoint regex pattern (e.g., /api/users/.*, optional)
- include_reverse - Also show calls in reverse direction (default: false)

Returns: List of inter-service API calls with endpoints, HTTP methods, and call counts.
### API Contract Tools

#### search_api_contracts

Search API endpoints across services with semantic understanding.
Parameters:
- query (required) - API search query (e.g., "user authentication endpoint")
- api_types - Filter by type: ['rest', 'graphql', 'grpc'] (default: all)
- service_filter - Filter by service IDs (optional)
- repo_filter - Filter by repository IDs (optional)
- include_deprecated - Include deprecated endpoints (default: false)
- max_results - Maximum results (1-100, default: 20)
- similarity_threshold - Minimum similarity (0.0-1.0, default: 0.70)

Returns: API endpoints with paths, HTTP methods, service names, implementation files, and
similarity scores.
### Reference and Documentation Tools
Tools for searching reference materials including markdown documentation (syntax references,
Context7-fetched docs) AND reference repository code (indexed frameworks/libraries).
#### index_documentation

Index markdown files for documentation search. Works with explicit paths only.
Parameters:
- paths (required) - Array of file or directory paths to index (e.g.,
['syntax.md', '/docs/libraries/'])
- doc_id - Document identifier (default: derived from path)
- tags - Tags for filtering (e.g., ['typescript', 'react'])
- force_reindex - Force re-index even if unchanged (default: false)

Returns: Indexing statistics including files indexed, sections created, code blocks extracted,
and timing.
Workflow:
1. Fetch documentation (e.g., from Context7)
2. Save to markdown file
3. Index with index_documentation
4. Search with search_references

#### search_references

Search reference materials including markdown documentation AND reference repository code. Combines
both sources for comprehensive reference search.
Parameters:
- query (required) - Natural language search query
- doc_ids - Filter by document IDs (optional)
- tags - Filter by documentation tags (optional)
- include_docs - Include markdown documentation results (default: true)
- include_code - Include reference repository code results (default: true)
- max_results - Maximum results per source (1-50, default: 10)
- include_code_blocks - Include code blocks from documentation (default: true)
- similarity_threshold - Minimum similarity (0.0-1.0, default: 0.65)

Returns: Combined results from both documentation chunks and reference repository code, with
heading breadcrumbs, content snippets, code blocks, file paths, and relevance scores.
Note: Reference repositories are indexed using index_repository with repo_type: 'reference'.
They are excluded from search_codebase by default and only searchable via search_references.

#### list_documentation

List all indexed documentation with metadata.
Parameters:
- doc_ids - Filter by document IDs (optional)
- tags - Filter by tags (optional)

Returns: List of indexed documents with file counts, section counts, code block counts, and
indexed timestamps.
#### delete_documentation

Delete indexed documentation by document ID.
Parameters:
- doc_ids (required) - Array of document IDs to delete

Returns: Deletion confirmation with chunks and files removed.
---
See docs/overview.md for complete tool documentation including
multi-project/monorepo/microservice architecture details.
## Architecture

### Hybrid Search
Combines vector similarity search with PostgreSQL full-text search (tsvector/ts_rank_cd) for
improved natural language query handling:
```
hybrid_score = (0.7 * vector_similarity) + (0.3 * keyword_score)
```

- Vector search - Semantic understanding via embeddings
- Keyword search - Exact term matching via PostgreSQL full-text search
- Configurable weights via HYBRID_VECTOR_WEIGHT and HYBRID_KEYWORD_WEIGHT
- Disable with ENABLE_HYBRID_SEARCH=false to use vector-only search
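The scoring formula can be sketched directly. The weights correspond to HYBRID_VECTOR_WEIGHT and HYBRID_KEYWORD_WEIGHT; the function itself is an illustration, not cindex's internal code:

```typescript
// Weighted combination of semantic and lexical relevance.
// Both inputs are assumed normalized to 0.0-1.0.
function hybridScore(
  vectorSimilarity: number, // cosine similarity from pgvector
  keywordScore: number,     // normalized full-text (ts_rank_cd) score
  vectorWeight = 0.7,       // HYBRID_VECTOR_WEIGHT
  keywordWeight = 0.3       // HYBRID_KEYWORD_WEIGHT
): number {
  return vectorWeight * vectorSimilarity + keywordWeight * keywordScore;
}

// Pure semantic match: 0.7 * 1.0 + 0.3 * 0.0
console.log(hybridScore(1, 0)); // 0.7
// Pure keyword match: 0.7 * 0.0 + 0.3 * 1.0
console.log(hybridScore(0, 1)); // 0.3
```

A chunk that matches only semantically can still outrank one that matches only lexically, since the vector weight dominates by default.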
### Retrieval Pipeline

1. File-Level - Find relevant files via summary embeddings + full-text search
2. Chunk-Level - Locate specific code chunks (functions/classes)
3. Symbol Resolution - Resolve imported symbols and dependencies
4. Import Expansion - Build dependency graph (max 3 levels)
5. Deduplication - Remove redundant code from results
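Stage 4's depth-limited import expansion can be sketched as a breadth-first traversal capped at IMPORT_DEPTH. The graph type and function below are illustrative, not cindex's actual code:

```typescript
// file -> list of files it imports (hypothetical shape)
type ImportGraph = Map<string, string[]>;

// Breadth-first expansion of the import chain, capped at maxDepth hops
function expandImports(
  graph: ImportGraph,
  roots: string[],
  maxDepth = 3 // mirrors the IMPORT_DEPTH default
): Set<string> {
  const visited = new Set<string>(roots);
  let frontier = roots;
  for (let depth = 0; depth < maxDepth; depth++) {
    const next: string[] = [];
    for (const file of frontier) {
      for (const dep of graph.get(file) ?? []) {
        if (!visited.has(dep)) {
          visited.add(dep);
          next.push(dep);
        }
      }
    }
    frontier = next;
  }
  return visited;
}

const graph: ImportGraph = new Map([
  ["a.ts", ["b.ts"]],
  ["b.ts", ["c.ts"]],
  ["c.ts", ["d.ts"]],
  ["d.ts", ["e.ts"]],
]);
// Depth 3 reaches d.ts; e.ts is one hop beyond the cap and is skipped
console.log([...expandImports(graph, ["a.ts"])]);
```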
### Indexing Pipeline
1. File discovery (respects .gitignore)
2. Tree-sitter parsing (with regex fallback)
3. Semantic chunking (functions, classes, blocks)
4. LLM-based file summaries (configurable model)
5. Embedding generation (configurable model)
6. Full-text search vector generation (tsvector)
7. PostgreSQL + pgvector storage
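Incremental indexing typically works by comparing content hashes so unchanged files are skipped. A minimal sketch under that assumption — the function and stored-map shape are hypothetical, not cindex's schema:

```typescript
import { createHash } from "node:crypto";

// Re-index a file only when its content hash differs from the stored one
function changedFiles(
  contents: Map<string, string>,     // path -> current file content
  storedHashes: Map<string, string>  // path -> hash from the last index run
): string[] {
  const changed: string[] = [];
  for (const [path, content] of contents) {
    const hash = createHash("sha256").update(content).digest("hex");
    if (storedHashes.get(path) !== hash) changed.push(path);
  }
  return changed;
}

const stored = new Map([
  ["a.ts", createHash("sha256").update("old").digest("hex")],
]);
const current = new Map([
  ["a.ts", "old"],       // unchanged -> skipped
  ["b.ts", "brand new"], // not seen before -> indexed
]);
console.log(changedFiles(current, stored)); // only b.ts needs indexing
```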
## Performance Characteristics

### Accuracy-First (Default)
- Indexing: 300-600 files/min
- Query Time: <800ms
- Relevance: >92% in top 10 results
- Context Noise: <2%
### Speed-Optimized
- Indexing: 500-1000 files/min
- Query Time: <500ms
- Relevance: >85% in top 10 results
## System Requirements
- Node.js 22+ (for MCP server)
- PostgreSQL 16+ with pgvector extension
- Ollama with models installed
- Disk Space: ~1GB per 100k LoC indexed
- RAM: 8GB minimum (16GB+ recommended for large codebases)
- GPU: Optional but recommended (RTX 3060+ for qwen2.5-coder:7b)
## Troubleshooting

### Embedding dimension mismatch

Update EMBEDDING_DIMENSIONS in the MCP config to match your model, then update the vector
dimensions in database.sql.

### Database connection errors
Check POSTGRES_HOST and POSTGRES_PORT in the MCP config. Verify PostgreSQL is running:

```bash
sudo systemctl status postgresql # Linux
brew services list # macOS
```

### Missing Ollama models

Pull the required models:

```bash
ollama pull bge-m3:567m
ollama pull qwen2.5-coder:7b
```

Verify models are available:

```bash
ollama list
```

### Slow indexing

- Use a smaller summary model: qwen2.5-coder:1.5b instead of 7b
- Reduce HNSW_EF_CONSTRUCTION to 64
- Enable incremental indexing (default)

### Poor search relevance
- Increase HNSW_EF_SEARCH to 300-400
- Raise SIMILARITY_THRESHOLD to 0.4-0.5 for stricter file matching
- Raise CHUNK_SIMILARITY_THRESHOLD to 0.3-0.4 for stricter chunk matching
- Use better summary model: qwen2.5-coder:3b or 7b
- Lower DEDUP_THRESHOLD to 0.90-0.92

## Documentation
See docs/overview.md for detailed documentation including:
- Complete architecture details
- Database schema
- Configuration reference
- Implementation guide
- Performance tuning
## Development

```bash
git clone https://github.com/gianged/cindex.git
cd cindex
npm install
npm run build
npm test
```

- Phase 1 (100%) - Database schema & type system
- Phase 2 (100%) - File discovery, parsing, chunking, workspace/service detection
- Phase 3 (100%) - Embeddings, summaries, API parsing, 12-language support, Docker/serverless/mobile
detection
- Phase 4 (100%) - Multi-stage retrieval pipeline (9-stage)
- Phase 5 (100%) - MCP tools (17 of 17 implemented)
- Phase 6 (100%) - Incremental indexing, optimization, testing
Overall: 100% complete
MIT
gianged - Yup, it's me
Contributions welcome! Please open an issue or PR on GitHub.
Built with:
- Model Context Protocol by Anthropic
- pgvector for vector search
- Ollama for local LLM inference
- tree-sitter for code parsing