Semantic code search and context retrieval MCP server for large codebases.

```bash
npm install @gianged/cindex
```
A Model Context Protocol (MCP) server that provides intelligent code search and context retrieval
for Claude Code. Handles 1M+ lines of code with accuracy-first design.
- Semantic Search - Vector embeddings for intelligent code discovery
- Hybrid Search - Combines vector similarity with PostgreSQL full-text search for better natural
language query handling
- 9-Stage Retrieval Pipeline - Scope filtering → query → files → chunks → symbols → imports →
APIs → dedup → assembly
- Multi-Project Support - Monorepo, microservices, and reference repository indexing
- Scope Filtering - Global, repository, service, and boundary-aware search modes
- API Contract Search - Semantic search for REST/GraphQL/gRPC endpoints
- Query Caching - LRU cache with 80%+ hit rate (cached queries ~50ms)
- Progress Notifications - Real-time 9-stage pipeline tracking
- Incremental Indexing - Only re-index changed files
- Import Chain Analysis - Automatic dependency resolution
- Deduplication - Remove duplicate utility functions
- Large Codebase Support - Efficiently handles 1M+ LoC
- Claude Code Integration - Native MCP server with 17 tools
- Accuracy-First - Default settings optimized for relevance
- Configurable Models - Swap embedding/LLM models via env vars
- Indexing Speed: 300-600 files/min (with LLM summaries)
- Query Speed: First query ~800ms, cached queries ~50ms
- Cache Hit Rate: 80%+ for repeated queries
- Codebase Scale: Efficiently handles 1M+ lines of code
- Memory Efficient: LRU caching with configurable limits
- Real-Time Progress: 9-stage pipeline notifications
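The query cache behind the ~50ms cached-query figure can be pictured as a small LRU map keyed by query text. This is purely an illustration of the idea — the class, capacity, and keys below are hypothetical, not cindex's implementation:

```typescript
// Minimal LRU cache sketch (illustrative, not cindex's actual code).
// A JS Map preserves insertion order, so the first key is the least
// recently used entry once we re-insert on every access.
class LruCache<K, V> {
  private map = new Map<K, V>();
  constructor(private capacity: number) {}

  get(key: K): V | undefined {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key)!;
    // Re-insert to mark as most recently used
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    if (this.map.has(key)) {
      this.map.delete(key);
    } else if (this.map.size >= this.capacity) {
      // Evict the least recently used entry (first insertion-order key)
      const oldest = this.map.keys().next().value as K;
      this.map.delete(oldest);
    }
    this.map.set(key, value);
  }
}

const cache = new LruCache<string, string[]>(2);
cache.set("query: auth middleware", ["src/auth.ts"]);
cache.set("query: db pool", ["src/db.ts"]);
cache.get("query: auth middleware");            // touch -> most recently used
cache.set("query: router", ["src/router.ts"]);  // evicts "query: db pool"
console.log(cache.get("query: db pool")); // undefined
```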
12 languages with full tree-sitter parsing: TypeScript, JavaScript, Python, Java, Go, Rust, C,
C++, C#, PHP, Ruby, Kotlin. Swift and other languages use regex fallback parsing.
Before installing cindex, you need:
PostgreSQL 16+ with pgvector extension for vector similarity search:
```bash
# Ubuntu/Debian
sudo apt install postgresql-16 postgresql-16-pgvector
```

### Ollama
Ollama for local LLM inference with two models:
Embedding Model (for vector generation):
```bash
# Install Ollama
curl https://ollama.ai/install.sh | sh

# Pull embedding model (bge-m3:567m recommended)
ollama pull bge-m3:567m
```

Coding Model (for file summaries and analysis):
```bash
# Pull coding model (qwen2.5-coder:7b recommended)
ollama pull qwen2.5-coder:7b

# Alternative for faster indexing (lower quality)
ollama pull qwen2.5-coder:1.5b
```

Model Options:
- Embedding: bge-m3:567m (1024 dims, 8K context) - Best accuracy
- Summary: qwen2.5-coder:7b (32K context) - High quality, RTX 4060+ recommended
- Summary: qwen2.5-coder:3b (32K context) - Balanced
- Summary: qwen2.5-coder:1.5b (32K context) - Fast indexing, lower quality
## Installation

### Database Setup
Create and initialize the cindex database:
```bash
# Create database
createdb cindex_rag_codebase

# Initialize schema (after installing cindex - see next section)
```

### Claude Code Setup
Add cindex to Claude Code using the CLI. You can install for personal use (user scope) or share with
your team (project scope).
#### Quick Install (Personal Use)
Install for all your projects:
```bash
claude mcp add cindex --scope user --transport stdio \
--env POSTGRES_PASSWORD="your_password" \
-- npx -y @gianged/cindex
```

#### Team Install (Shared via Git)
Install for the current project (creates `.mcp.json` in the project root):

```bash
claude mcp add cindex --scope project --transport stdio \
--env POSTGRES_PASSWORD="your_password" \
-- npx -y @gianged/cindex
```

Note: For project scope, set POSTGRES_PASSWORD as an environment variable on your system and
reference it in the command. Never commit actual secrets to version control.

#### Custom Configuration

Add additional environment variables using multiple --env flags:

```bash
claude mcp add cindex --scope user --transport stdio \
--env POSTGRES_PASSWORD="your_password" \
--env POSTGRES_HOST="localhost" \
--env POSTGRES_DB="cindex_rag_codebase" \
--env EMBEDDING_MODEL="bge-m3:567m" \
--env SUMMARY_MODEL="qwen2.5-coder:7b" \
-- npx -y @gianged/cindex
```

See the Environment Variables section below for all available configuration options.
#### Manual Configuration (Alternative)
If you prefer to manually edit configuration files, you can add cindex to:
User Scope (`~/.claude.json`):

```json
{
"mcpServers": {
"cindex": {
"type": "stdio",
"command": "npx",
"args": ["-y", "@gianged/cindex"],
"env": {
"POSTGRES_PASSWORD": "your_password"
}
}
}
}
```

Project Scope (`.mcp.json` in project root):

```json
{
"mcpServers": {
"cindex": {
"type": "stdio",
"command": "npx",
"args": ["-y", "@gianged/cindex"],
"env": {
"POSTGRES_HOST": "${POSTGRES_HOST:-localhost}",
"POSTGRES_PORT": "${POSTGRES_PORT:-5432}",
"POSTGRES_DB": "${POSTGRES_DB:-cindex_rag_codebase}",
"POSTGRES_USER": "${POSTGRES_USER:-postgres}",
"POSTGRES_PASSWORD": "${POSTGRES_PASSWORD}"
}
}
}
}
```

### Initialize Schema

After configuring MCP, initialize the database schema:

```bash
# Download schema file
curl -o database.sql https://raw.githubusercontent.com/gianged/cindex/main/database.sql

# Apply schema
psql cindex_rag_codebase < database.sql
```

### Quick Start

1. Open Claude Code
2. Use the index_repository tool to index your codebase
3. Use search_codebase to find relevant code

## Environment Variables
All configuration is done through environment variables in your MCP config file.
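Each numeric setting in the tables below has a default and a valid range. A minimal sketch of how such a variable might be read and validated — the helper name is hypothetical, not part of cindex:

```typescript
// Illustrative sketch (not cindex's API): read an integer env var,
// fall back to a default, and reject out-of-range values.
function readIntEnv(name: string, def: number, min: number, max: number): number {
  const raw = process.env[name];
  if (raw === undefined || raw === "") return def;
  const value = Number.parseInt(raw, 10);
  if (Number.isNaN(value)) {
    throw new Error(`${name} must be an integer, got "${raw}"`);
  }
  if (value < min || value > max) {
    throw new Error(`${name} must be in [${min}, ${max}], got ${value}`);
  }
  return value;
}

// Default applies when the variable is unset
delete process.env.EMBEDDING_DIMENSIONS;
console.log(readIntEnv("EMBEDDING_DIMENSIONS", 1024, 1, 4096)); // 1024

// Explicit value wins when set
process.env.EMBEDDING_DIMENSIONS = "2048";
console.log(readIntEnv("EMBEDDING_DIMENSIONS", 1024, 1, 4096)); // 2048
```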
### Model Settings
| Variable | Default | Range | Description |
| -------------------------- | ------------------------ | ----------- | -------------------------------------------- |
| EMBEDDING_MODEL | bge-m3:567m | - | Ollama embedding model for vector generation |
| EMBEDDING_DIMENSIONS | 1024 | 1-4096 | Vector dimensions (must match model output) |
| EMBEDDING_CONTEXT_WINDOW | 4096 | 512-131072 | Token limit for embedding model |
| SUMMARY_MODEL | qwen2.5-coder:7b | - | Ollama model for file summaries |
| SUMMARY_CONTEXT_WINDOW | 4096 | 512-131072 | Token limit for summary model |
| OLLAMA_HOST | http://localhost:11434 | - | Ollama API endpoint |
| OLLAMA_TIMEOUT | 30000 | 1000-300000 | Request timeout in milliseconds |

Context Window Notes:
- Default 4096 matches Ollama's default and is sufficient (cindex uses first 100 lines per file)
- Higher values = more VRAM usage + slower inference
- qwen2.5-coder:7b supports up to 32K tokens
- bge-m3:567m supports up to 8K tokens
- Increase only if you encounter issues with large files
### Database Settings
| Variable | Default | Range | Description |
| -------------------------- | --------------------- | ------- | ------------------------------- |
| POSTGRES_HOST | localhost | - | PostgreSQL server hostname |
| POSTGRES_PORT | 5432 | 1-65535 | PostgreSQL server port |
| POSTGRES_DB | cindex_rag_codebase | - | Database name |
| POSTGRES_USER | postgres | - | Database user |
| POSTGRES_PASSWORD | _required_ | - | Database password (must be set) |
| POSTGRES_MAX_CONNECTIONS | 10 | 1-100 | Maximum connection pool size |
### Retrieval Settings

| Variable | Default | Range | Description |
| ---------------------------- | ------- | ------- | ---------------------------------------------------- |
| HNSW_EF_SEARCH | 300 | 10-1000 | HNSW search quality (higher = more accurate, slower) |
| HNSW_EF_CONSTRUCTION | 200 | 10-1000 | HNSW index quality (higher = better index) |
| SIMILARITY_THRESHOLD | 0.3 | 0.0-1.0 | Minimum similarity for file-level retrieval |
| CHUNK_SIMILARITY_THRESHOLD | 0.2 | 0.0-1.0 | Minimum similarity for chunk-level retrieval |
| DEDUP_THRESHOLD | 0.92 | 0.0-1.0 | Similarity threshold for deduplication |
| HYBRID_VECTOR_WEIGHT | 0.7 | 0.0-1.0 | Weight for vector similarity in hybrid search |
| HYBRID_KEYWORD_WEIGHT | 0.3 | 0.0-1.0 | Weight for keyword (BM25) score in hybrid search |
| IMPORT_DEPTH | 3 | 1-10 | Maximum import chain traversal depth |
| WORKSPACE_DEPTH | 2 | 1-10 | Maximum workspace dependency depth |
| SERVICE_DEPTH | 1 | 1-10 | Maximum service dependency depth |
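As an illustration of how a threshold like DEDUP_THRESHOLD is typically applied, this sketch drops near-duplicate results whose embedding cosine similarity meets the threshold. The function names and shapes are illustrative, not cindex internals:

```typescript
// Cosine similarity between two embedding vectors of equal length
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Keep a result only if no already-kept result is too similar to it
function dedupe(embeddings: number[][], threshold = 0.92): number[][] {
  const kept: number[][] = [];
  for (const candidate of embeddings) {
    const isDuplicate = kept.some((e) => cosineSimilarity(e, candidate) >= threshold);
    if (!isDuplicate) kept.push(candidate);
  }
  return kept;
}

// Identical vectors have similarity 1.0 and are collapsed into one
console.log(dedupe([[1, 0], [1, 0], [0, 1]]).length); // 2
```

Raising the threshold (e.g. 0.95) keeps more near-duplicates; lowering it merges more aggressively.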
### Indexing Limits

| Variable | Default | Range | Description |
| ------------------ | ------- | ---------- | ---------------------------------- |
| MAX_FILE_SIZE | 5000 | 100-100000 | Maximum file size in lines |
| INCLUDE_MARKDOWN | false | true/false | Include markdown files in indexing |
### Feature Flags

| Variable | Default | Range | Description |
| ------------------------------- | ------- | ---------- | --------------------------------------- |
| ENABLE_WORKSPACE_DETECTION | true | true/false | Detect monorepo workspaces |
| ENABLE_SERVICE_DETECTION | true | true/false | Detect microservices |
| ENABLE_MULTI_REPO | false | true/false | Enable multi-repository support |
| ENABLE_API_ENDPOINT_DETECTION | true | true/false | Parse API contracts (REST/GraphQL/gRPC) |
| ENABLE_HYBRID_SEARCH | true | true/false | Combine vector + full-text search |

## Example Configurations
### Minimal Configuration
Only the required password:
```json
{
"mcpServers": {
"cindex": {
"type": "stdio",
"command": "npx",
"args": ["-y", "@gianged/cindex"],
"env": {
"POSTGRES_PASSWORD": "your_password"
}
}
}
}
```

### Full Configuration
All available settings with defaults shown:
```json
{
"mcpServers": {
"cindex": {
"type": "stdio",
"command": "npx",
"args": ["-y", "@gianged/cindex"],
"env": {
"EMBEDDING_MODEL": "bge-m3:567m",
"EMBEDDING_DIMENSIONS": "1024",
"EMBEDDING_CONTEXT_WINDOW": "4096",
"SUMMARY_MODEL": "qwen2.5-coder:7b",
"SUMMARY_CONTEXT_WINDOW": "4096",
"OLLAMA_HOST": "http://localhost:11434",
"POSTGRES_HOST": "localhost",
"POSTGRES_PORT": "5432",
"POSTGRES_DB": "cindex_rag_codebase",
"POSTGRES_USER": "postgres",
"POSTGRES_PASSWORD": "your_password",
"HNSW_EF_SEARCH": "300",
"HNSW_EF_CONSTRUCTION": "200",
"SIMILARITY_THRESHOLD": "0.3",
"CHUNK_SIMILARITY_THRESHOLD": "0.2",
"DEDUP_THRESHOLD": "0.92"
}
}
}
}
```

### Speed-Optimized Configuration
For faster indexing with lower quality:
```json
{
"mcpServers": {
"cindex": {
"type": "stdio",
"command": "npx",
"args": ["-y", "@gianged/cindex"],
"env": {
"POSTGRES_PASSWORD": "your_password",
"SUMMARY_MODEL": "qwen2.5-coder:1.5b",
"SUMMARY_CONTEXT_WINDOW": "4096",
"HNSW_EF_SEARCH": "100",
"HNSW_EF_CONSTRUCTION": "64",
"SIMILARITY_THRESHOLD": "0.4",
"CHUNK_SIMILARITY_THRESHOLD": "0.25",
"DEDUP_THRESHOLD": "0.95"
}
}
}
}
```

Performance:
- Indexing: 500-1000 files/min (vs 300-600 files/min default)
- Query Time: <500ms (vs <800ms default)
- Relevance: >85% in top 10 results (vs >92% default)
## Recommended Settings

### Accuracy-First (Default)
| Setting | Value | Notes |
| ---------------------------- | ------------------ | ---------------------------------- |
| EMBEDDING_MODEL | bge-m3:567m | Best accuracy/speed balance |
| SUMMARY_MODEL | qwen2.5-coder:7b | Good summaries, fits in VRAM |
| EMBEDDING_CONTEXT_WINDOW | 4096 | Default, sufficient for most files |
| HNSW_EF_SEARCH | 300 | High accuracy retrieval |
| SIMILARITY_THRESHOLD | 0.3 | File-level retrieval threshold |
| CHUNK_SIMILARITY_THRESHOLD | 0.2 | Chunk-level retrieval threshold |
| DEDUP_THRESHOLD | 0.92 | Prevent duplicate results |

### Measured Performance
- Indexing: ~30 files/min (~70 chunks/min)
- Search: <1 second per query
- Codebase: Tested with 40k LoC (112 files)
## Managing Configuration

### View Configuration
List all installed MCP servers:
```bash
claude mcp list
```

View cindex configuration:

```bash
claude mcp get cindex
```

### Update Settings
To update environment variables, remove and re-add with new settings:
```bash
claude mcp remove cindex
claude mcp add cindex --scope user --transport stdio \
--env POSTGRES_PASSWORD="your_password" \
--env SUMMARY_MODEL="qwen2.5-coder:3b" \
-- npx -y @gianged/cindex
```

### Speed-Optimized Settings
For faster indexing with lower quality, use these settings:
```bash
claude mcp remove cindex
claude mcp add cindex --scope user --transport stdio \
--env POSTGRES_PASSWORD="your_password" \
--env SUMMARY_MODEL="qwen2.5-coder:1.5b" \
--env HNSW_EF_SEARCH="100" \
--env HNSW_EF_CONSTRUCTION="64" \
--env SIMILARITY_THRESHOLD="0.4" \
--env CHUNK_SIMILARITY_THRESHOLD="0.25" \
--env DEDUP_THRESHOLD="0.95" \
-- npx -y @gianged/cindex
```

Performance:
- Indexing: 500-1000 files/min (vs 300-600 files/min default)
- Query Time: <500ms (vs <800ms default)
- Relevance: >85% in top 10 results (vs >92% default)
### Remove cindex

```bash
claude mcp remove cindex
```

## MCP Tools
Status: 17 of 17 tools implemented
All tools provide structured output with syntax highlighting and comprehensive metadata.
### Search Tools

#### search_codebase

Semantic code search with multi-stage retrieval and dependency analysis.
Parameters:
- query (required) - Natural language search query
- scope - Search scope: 'global', 'repository', 'service', or 'workspace'
- repo_id - Filter by repository ID
- service_id - Filter by service ID
- workspace_id - Filter by workspace ID
- max_results - Maximum results (1-100, default: 20)
- similarity_threshold - Minimum similarity (0.0-1.0, default: 0.75)
- include_dependencies - Include imported dependencies (default: false)

Returns: Markdown-formatted results with file paths, line numbers, code snippets, and relevance
scores.
#### get_file_context

Get complete context for a specific file including callers, callees, and import chain.
Parameters:
- file_path (required) - Absolute or relative file path
- repo_id - Repository ID (optional if file path is unique)
- include_callers - Include functions that call this file (default: true)
- include_callees - Include functions called by this file (default: true)
- include_imports - Include import chain (default: true)
- max_depth - Import chain depth (1-5, default: 2)

Returns: File summary, symbols, dependencies, and related code context.
#### find_symbol_definition

Locate symbol definitions and optionally show usages across the codebase.
Parameters:
- symbol_name (required) - Function, class, or variable name
- repo_id - Filter by repository ID
- file_path - Filter by file path
- symbol_type - Filter by type: 'function', 'class', 'variable', 'interface', etc.
- include_usages - Show where symbol is used (default: false)
- max_usages - Maximum usage results (1-100, default: 50)

Returns: Symbol definitions with file paths, line numbers, signatures, and optional usage
locations.
### Indexing Tools

#### index_repository

Index or re-index a repository with progress notifications and multi-project support.
Parameters:
- repo_path (required) - Absolute path to repository root
- repo_id - Repository identifier (default: directory name)
- repo_type - Repository type: 'monolithic', 'microservice', 'monorepo', 'library',
'reference', or 'documentation'
- force_reindex - Force full re-index (default: false, uses incremental indexing)
- detect_workspaces - Detect monorepo workspaces (default: true)
- detect_services - Detect microservices (default: true)
- detect_api_endpoints - Parse API contracts (default: true)
- service_config - Manual service configuration (optional)
- version - Repository version for reference repos (e.g., 'v10.3.0')
- metadata - Additional metadata (e.g., { upstream_url: '...' })

Returns: Indexing statistics including files indexed, chunks created, symbols extracted,
workspaces/services detected, and timing information.
#### delete_repository

Delete one or more indexed repositories and all associated data.
Parameters:
- repo_ids (required) - Array of repository IDs to delete

Returns: Deletion confirmation with statistics (files, chunks, symbols, workspaces, services
removed).
#### list_indexed_repos

List all indexed repositories with optional metadata, workspace counts, and service counts.
Parameters:
- include_metadata - Include repository metadata (default: true)
- include_workspace_count - Include workspace count for monorepos (default: true)
- include_service_count - Include service count for microservices (default: true)
- repo_type_filter - Filter by repository type

Returns: List of repositories with IDs, types, file counts, last indexed time, and optional
metadata.
### Workspace Tools (Monorepos)

#### list_workspaces

List all workspaces in indexed repositories for monorepo support.
Parameters:
- repo_id - Filter by repository ID (optional)
- include_dependencies - Include dependency information (default: false)
- include_metadata - Include package.json metadata (default: false)

Returns: List of workspaces with package names, paths, file counts, and optional dependencies.
#### get_workspace_context

Get full context for a workspace including dependencies and dependents.
Parameters:
- workspace_id - Workspace ID (use list_workspaces to find)
- package_name - Package name (alternative to workspace_id)
- repo_id - Repository ID (required if using package_name)
- include_dependencies - Include workspace dependencies (default: true)
- include_dependents - Include workspaces that depend on this one (default: true)
- dependency_depth - Dependency tree depth (1-5, default: 2)

Returns: Workspace metadata, dependency tree, dependent workspaces, and file list.
#### find_cross_workspace_usages

Find workspace package usages across the monorepo.
Parameters:
- workspace_id - Source workspace ID
- package_name - Source package name (alternative to workspace_id)
- symbol_name - Specific symbol to track (optional)
- include_indirect - Include indirect usages (default: false)
- max_depth - Dependency chain depth (1-5, default: 2)

Returns: List of workspaces using the target package/symbol with file locations.
### Service Tools (Microservices)

#### list_services

List all services across indexed repositories for microservice support.
Parameters:
- repo_id - Filter by repository ID (optional)
- service_type - Filter by type: 'docker', 'serverless', 'mobile' (optional)
- include_dependencies - Include service dependencies (default: false)
- include_api_endpoints - Include API endpoint counts (default: false)

Returns: List of services with IDs, names, types, file counts, and optional API information.
#### get_service_context

Get full context for a service including API contracts and dependencies.
Parameters:
- service_id - Service ID (use list_services to find)
- service_name - Service name (alternative to service_id)
- repo_id - Repository ID (required if using service_name)
- include_dependencies - Include service dependencies (default: true)
- include_dependents - Include services that depend on this one (default: true)
- include_api_contracts - Include API endpoint definitions (default: true)
- dependency_depth - Dependency tree depth (1-5, default: 1)

Returns: Service metadata, API contracts (REST/GraphQL/gRPC), dependency graph, and file list.
#### find_cross_service_calls

Find inter-service API calls across microservices.
Parameters:
- source_service_id - Source service ID (optional)
- target_service_id - Target service ID (optional)
- endpoint_pattern - Endpoint regex pattern (e.g., /api/users/.*, optional)
- include_reverse - Also show calls in reverse direction (default: false)

Returns: List of inter-service API calls with endpoints, HTTP methods, and call counts.
### API Contract Tools

#### search_api_contracts

Search API endpoints across services with semantic understanding.
Parameters:
- query (required) - API search query (e.g., "user authentication endpoint")
- api_types - Filter by type: ['rest', 'graphql', 'grpc'] (default: all)
- service_filter - Filter by service IDs (optional)
- repo_filter - Filter by repository IDs (optional)
- include_deprecated - Include deprecated endpoints (default: false)
- max_results - Maximum results (1-100, default: 20)
- similarity_threshold - Minimum similarity (0.0-1.0, default: 0.70)

Returns: API endpoints with paths, HTTP methods, service names, implementation files, and
similarity scores.
### Reference and Documentation Tools
Tools for searching reference materials including markdown documentation (syntax references,
Context7-fetched docs) AND reference repository code (indexed frameworks/libraries).
#### index_documentation

Index markdown files for documentation search. Works with explicit paths only.
Parameters:
- paths (required) - Array of file or directory paths to index (e.g.,
['syntax.md', '/docs/libraries/'])
- doc_id - Document identifier (default: derived from path)
- tags - Tags for filtering (e.g., ['typescript', 'react'])
- force_reindex - Force re-index even if unchanged (default: false)

Returns: Indexing statistics including files indexed, sections created, code blocks extracted,
and timing.
Workflow:
1. Fetch documentation (e.g., from Context7)
2. Save to markdown file
3. Index with index_documentation
4. Search with search_references

#### search_references

Search reference materials including markdown documentation AND reference repository code. Combines
both sources for comprehensive reference search.
Parameters:
- query (required) - Natural language search query
- doc_ids - Filter by document IDs (optional)
- tags - Filter by documentation tags (optional)
- include_docs - Include markdown documentation results (default: true)
- include_code - Include reference repository code results (default: true)
- max_results - Maximum results per source (1-50, default: 10)
- include_code_blocks - Include code blocks from documentation (default: true)
- similarity_threshold - Minimum similarity (0.0-1.0, default: 0.65)

Returns: Combined results from both documentation chunks and reference repository code, with
heading breadcrumbs, content snippets, code blocks, file paths, and relevance scores.
Note: Reference repositories are indexed using index_repository with repo_type: 'reference'.
They are excluded from search_codebase by default and only searchable via search_references.

#### list_documentation

List all indexed documentation with metadata.
Parameters:
- doc_ids - Filter by document IDs (optional)
- tags - Filter by tags (optional)

Returns: List of indexed documents with file counts, section counts, code block counts, and
indexed timestamps.
#### delete_documentation

Delete indexed documentation by document ID.
Parameters:
- doc_ids (required) - Array of document IDs to delete

Returns: Deletion confirmation with chunks and files removed.
---
See docs/overview.md for complete tool documentation including
multi-project/monorepo/microservice architecture details.
## Architecture

### Hybrid Search
Combines vector similarity search with PostgreSQL full-text search (tsvector/ts_rank_cd) for
improved natural language query handling:
```
hybrid_score = (0.7 * vector_similarity) + (0.3 * keyword_score)
```

- Vector search - Semantic understanding via embeddings
- Keyword search - Exact term matching via PostgreSQL full-text search
- Configurable weights via HYBRID_VECTOR_WEIGHT and HYBRID_KEYWORD_WEIGHT
- Disable with ENABLE_HYBRID_SEARCH=false to use vector-only search
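The scoring formula can be sketched directly. The weights correspond to HYBRID_VECTOR_WEIGHT and HYBRID_KEYWORD_WEIGHT; the function itself is an illustration, not cindex's internal code:

```typescript
// Weighted combination of semantic and lexical relevance.
// Both inputs are assumed normalized to 0.0-1.0.
function hybridScore(
  vectorSimilarity: number, // cosine similarity from pgvector
  keywordScore: number,     // normalized full-text (ts_rank_cd) score
  vectorWeight = 0.7,       // HYBRID_VECTOR_WEIGHT
  keywordWeight = 0.3       // HYBRID_KEYWORD_WEIGHT
): number {
  return vectorWeight * vectorSimilarity + keywordWeight * keywordScore;
}

// Pure semantic match: 0.7 * 1.0 + 0.3 * 0.0
console.log(hybridScore(1, 0)); // 0.7
// Pure keyword match: 0.7 * 0.0 + 0.3 * 1.0
console.log(hybridScore(0, 1)); // 0.3
```

A chunk that matches only semantically can still outrank one that matches only lexically, since the vector weight dominates by default.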
### Retrieval Pipeline

1. File-Level - Find relevant files via summary embeddings + full-text search
2. Chunk-Level - Locate specific code chunks (functions/classes)
3. Symbol Resolution - Resolve imported symbols and dependencies
4. Import Expansion - Build dependency graph (max 3 levels)
5. Deduplication - Remove redundant code from results
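Stage 4's depth-limited import expansion can be sketched as a breadth-first traversal capped at IMPORT_DEPTH. The graph type and function below are illustrative, not cindex's actual code:

```typescript
// file -> list of files it imports (hypothetical shape)
type ImportGraph = Map<string, string[]>;

// Breadth-first expansion of the import chain, capped at maxDepth hops
function expandImports(
  graph: ImportGraph,
  roots: string[],
  maxDepth = 3 // mirrors the IMPORT_DEPTH default
): Set<string> {
  const visited = new Set<string>(roots);
  let frontier = roots;
  for (let depth = 0; depth < maxDepth; depth++) {
    const next: string[] = [];
    for (const file of frontier) {
      for (const dep of graph.get(file) ?? []) {
        if (!visited.has(dep)) {
          visited.add(dep);
          next.push(dep);
        }
      }
    }
    frontier = next;
  }
  return visited;
}

const graph: ImportGraph = new Map([
  ["a.ts", ["b.ts"]],
  ["b.ts", ["c.ts"]],
  ["c.ts", ["d.ts"]],
  ["d.ts", ["e.ts"]],
]);
// Depth 3 reaches d.ts; e.ts is one hop beyond the cap and is skipped
console.log([...expandImports(graph, ["a.ts"])]);
```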
### Indexing Pipeline
1. File discovery (respects .gitignore)
2. Tree-sitter parsing (with regex fallback)
3. Semantic chunking (functions, classes, blocks)
4. LLM-based file summaries (configurable model)
5. Embedding generation (configurable model)
6. Full-text search vector generation (tsvector)
7. PostgreSQL + pgvector storage
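Incremental indexing typically works by comparing content hashes so unchanged files are skipped. A minimal sketch under that assumption — the function and stored-map shape are hypothetical, not cindex's schema:

```typescript
import { createHash } from "node:crypto";

// Re-index a file only when its content hash differs from the stored one
function changedFiles(
  contents: Map<string, string>,     // path -> current file content
  storedHashes: Map<string, string>  // path -> hash from the last index run
): string[] {
  const changed: string[] = [];
  for (const [path, content] of contents) {
    const hash = createHash("sha256").update(content).digest("hex");
    if (storedHashes.get(path) !== hash) changed.push(path);
  }
  return changed;
}

const stored = new Map([
  ["a.ts", createHash("sha256").update("old").digest("hex")],
]);
const current = new Map([
  ["a.ts", "old"],       // unchanged -> skipped
  ["b.ts", "brand new"], // not seen before -> indexed
]);
console.log(changedFiles(current, stored)); // only b.ts needs indexing
```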
## Performance Characteristics

### Accuracy-First (Default)
- Indexing: 300-600 files/min
- Query Time: <800ms
- Relevance: >92% in top 10 results
- Context Noise: <2%
### Speed-Optimized
- Indexing: 500-1000 files/min
- Query Time: <500ms
- Relevance: >85% in top 10 results
## System Requirements
- Node.js 22+ (for MCP server)
- PostgreSQL 16+ with pgvector extension
- Ollama with models installed
- Disk Space: ~1GB per 100k LoC indexed
- RAM: 8GB minimum (16GB+ recommended for large codebases)
- GPU: Optional but recommended (RTX 3060+ for qwen2.5-coder:7b)
## Troubleshooting

### Embedding dimension mismatch

Update EMBEDDING_DIMENSIONS in the MCP config to match your model, then update the vector
dimensions in database.sql.

### Database connection errors
Check POSTGRES_HOST and POSTGRES_PORT in the MCP config. Verify PostgreSQL is running:

```bash
sudo systemctl status postgresql # Linux
brew services list # macOS
```

### Missing Ollama models

Pull the required models:

```bash
ollama pull bge-m3:567m
ollama pull qwen2.5-coder:7b
```

Verify models are available:

```bash
ollama list
```

### Slow indexing

- Use a smaller summary model: qwen2.5-coder:1.5b instead of 7b
- Reduce HNSW_EF_CONSTRUCTION to 64
- Enable incremental indexing (default)

### Poor search relevance
- Increase HNSW_EF_SEARCH to 300-400
- Raise SIMILARITY_THRESHOLD to 0.4-0.5 for stricter file matching
- Raise CHUNK_SIMILARITY_THRESHOLD to 0.3-0.4 for stricter chunk matching
- Use better summary model: qwen2.5-coder:3b or 7b
- Lower DEDUP_THRESHOLD to 0.90-0.92

## Documentation
See docs/overview.md for detailed documentation including:
- Complete architecture details
- Database schema
- Configuration reference
- Implementation guide
- Performance tuning
## Development

```bash
git clone https://github.com/gianged/cindex.git
cd cindex
npm install
npm run build
npm test
```

- Phase 1 (100%) - Database schema & type system
- Phase 2 (100%) - File discovery, parsing, chunking, workspace/service detection
- Phase 3 (100%) - Embeddings, summaries, API parsing, 12-language support, Docker/serverless/mobile
detection
- Phase 4 (100%) - Multi-stage retrieval pipeline (9-stage)
- Phase 5 (100%) - MCP tools (17 of 17 implemented)
- Phase 6 (100%) - Incremental indexing, optimization, testing
Overall: 100% complete
MIT
gianged - Yup, it's me
Contributions welcome! Please open an issue or PR on GitHub.
Built with:
- Model Context Protocol by Anthropic
- pgvector for vector search
- Ollama for local LLM inference
- tree-sitter for code parsing