MCP server and CLI tool for local document search with hybrid semantic+keyword search across PDFs, files, and Confluence
npm install docsearch-mcp


A local-first document search and indexing system that provides hybrid semantic + keyword search across local files (including PDFs) and Confluence pages through the Model Context Protocol (MCP). Perfect for AI assistants like Claude Code/Desktop to access your documentation, codebase, and research materials.
- 🔍 Hybrid Search: Combines full-text search (FTS) with vector similarity for optimal results
- 📁 Multi-Source: Index local files (code, docs, PDFs) and Confluence spaces
- 📄 PDF Support: Extract and search text from PDF documents with metadata preservation
- 🖼️ Image Search: AI-powered image description and search for diagrams, screenshots, and charts
- 🗄️ Database Flexibility: Support for SQLite (local-first) and PostgreSQL (scalable)
- 🤖 MCP Integration: Seamless integration with Claude Code and other MCP-compatible tools
- 💻 CLI Tool: Standalone command-line interface with multiple output formats
- ⚡ Real-time Updates: File watching with automatic re-indexing
- 🎯 Smart Chunking: Intelligent text chunking for code, documentation, and PDFs
- 📊 Multiple Output Formats: Text, JSON, and YAML output for search results
- 🔒 Secure: API keys and sensitive data stay on your machine
``bashInstall globally
npm install -g docsearch-mcp
$3
`bash
Pull the image
docker pull ghcr.io/patrickkoss/docsearch-mcp:v0.0.2Run the MCP server with proper permissions
docker run --rm -i \
-v /path/to/your/documents:/app/documents \
-v /path/to/data:/app/data \
--env-file .env \
--user $(id -u):$(id -g) \
ghcr.io/patrickkoss/docsearch-mcp:v0.0.2
`$3
Add to your MCP client configuration:
`json
{
"mcpServers": {
"docsearch": {
"command": "npx",
"args": ["docsearch-mcp", "start"]
}
}
}
`$3
For development or customization:
`bash
Clone the repository
git clone https://github.com/patrickkoss/docsearch-mcp.git
cd docsearch-mcpQuick setup with Make
make setupOr manual setup
pnpm install
cp .env.example .env
Edit .env with your API keys and configuration
`$3
#### CLI Tool
`bash
Ingest documents
docsearch-mcp ingest files # Index local files
docsearch-mcp ingest confluence # Index Confluence pages
docsearch-mcp ingest all --watch # Index all sources with file watchingSearch documents
docsearch-mcp search "your query" # Basic search
docsearch-mcp search "typescript" -k 5 -o json # JSON output, top 5 results
docsearch-mcp search "API docs" -s confluence # Search only ConfluenceGet help
docsearch-mcp --help
docsearch-mcp search --help
`#### MCP Server
`bash
Run the MCP server directly
npx docsearch-mcp startOr if installed globally
docsearch-mcp start
`#### Docker Usage
`bash
Run with official Docker image (recommended)
docker run --rm -i \
-v /path/to/your/documents:/app/documents \
-v /path/to/data:/app/data \
--env-file .env \
--user $(id -u):$(id -g) \
ghcr.io/patrickkoss/docsearch-mcp:v0.0.2Run CLI commands with Docker
docker run --rm -i \
-v /path/to/your/documents:/app/documents \
-v /path/to/data:/app/data \
--env-file .env \
--user $(id -u):$(id -g) \
ghcr.io/patrickkoss/docsearch-mcp:v0.0.2 \
npx docsearch-mcp search "your query"Alternative: Use Docker volumes (simpler but less control)
docker run --rm \
-v ./documents:/app/documents \
-v docsearch-data:/app/data \
--env-file .env \
ghcr.io/patrickkoss/docsearch-mcp:v0.0.2For development (requires cloning repo)
docker-compose up -d docsearch-mcp
`⚙️ Configuration
$3
When using the official Docker image, create a
.env file:`bash
echo "OPENAI_API_KEY=your-openai-key" > .env
echo "FILE_ROOTS=/app/documents" >> .env
`Key considerations for Docker deployment:
- Document Volume: Mount your documents directory to
/app/documents in the container
- Data Persistence: Mount your data directory to /app/data to persist the SQLite database
- User Permissions: Use --user $(id -u):$(id -g) to avoid permission issues with mounted volumes
- Environment Variables: Pass configuration via .env file or environment variables
- Official Image: Use ghcr.io/patrickkoss/docsearch-mcp:v0.0.2 for production deployments#### Using the Official Docker Image
Recommended approach with proper permissions:
`bash
Basic usage with user permissions (avoids file ownership issues)
docker run --rm -i \
-v /absolute/path/to/documents:/app/documents \
-v /absolute/path/to/data:/app/data \
--env-file .env \
--user $(id -u):$(id -g) \
ghcr.io/patrickkoss/docsearch-mcp:v0.0.5For MCP server integration (background service)
docker run -d --name docsearch-mcp \
-v /absolute/path/to/documents:/app/documents \
-v /absolute/path/to/data:/app/data \
--env-file .env \
--user $(id -u):$(id -g) \
ghcr.io/patrickkoss/docsearch-mcp:v0.0.5
`Alternative approach with Docker volumes:
`bash
Using Docker volumes (simpler but less direct file access)
docker run --rm \
-v ./documents:/app/documents \
-v docsearch-data:/app/data \
--env-file .env \
ghcr.io/patrickkoss/docsearch-mcp:v0.0.2
`Docker Permissions Explained:
-
--user $(id -u):$(id -g): Runs the container with your user ID and group ID, preventing file permission issues
- Dynamic User Creation: The container automatically creates a user matching your mounted volume permissions
- Automatic File Ownership: Container files are automatically owned by the correct user
- Cross-Platform Compatibility: Works consistently across different host systems$3
Create a
.env file from .env.example and configure:$3
`env
EMBEDDINGS_PROVIDER=openai
OPENAI_API_KEY=your_openai_key
OPENAI_EMBED_MODEL=text-embedding-3-small
`$3
`env
FILE_ROOTS=.
FILE_INCLUDE_GLOBS=*/.{ts,js,py,md,txt,pdf,png,jpg,jpeg,gif,svg,webp}
FILE_EXCLUDE_GLOBS=/node_modules/,/dist/
`$3
`env
ENABLE_IMAGE_TO_TEXT=true
IMAGE_TO_TEXT_PROVIDER=openai
IMAGE_TO_TEXT_MODEL=gpt-4o-mini
`$3
`env
Use SQLite (default, local-first)
DB_TYPE=sqlite
DB_PATH=./data/index.dbOR use PostgreSQL (for scalability)
DB_TYPE=postgresql
POSTGRES_CONNECTION_STRING=postgresql://user:password@localhost:5432/docsearch
`$3
`env
CONFLUENCE_BASE_URL=https://yourcompany.atlassian.net
CONFLUENCE_EMAIL=your@email.com
CONFLUENCE_API_TOKEN=your_confluence_token
CONFLUENCE_SPACES=SPACE1,SPACE2Advanced filtering (all optional)
Filter by parent page IDs - only index pages under these parent pages
CONFLUENCE_PARENT_PAGES=12345678,87654321Title patterns to include (supports * wildcards)
CONFLUENCE_TITLE_INCLUDES=API,User Guide,Documentation*Title patterns to exclude (supports * wildcards)
CONFLUENCE_TITLE_EXCLUDES=Draft,Archive,Test
`Confluence Filtering Options:
- Parent Page Filtering: Specify page IDs to only index pages within specific folder hierarchies
- Title Include Patterns: Define patterns to only index pages with matching titles
- Title Exclude Patterns: Define patterns to skip pages with matching titles
- Wildcard Support: Use
for flexible pattern matching (e.g., API matches any title starting with "API")$3
The system automatically detects and processes different file types:
- Code files:
.ts, .js, .py, .go, .rs, .java, .cpp, .c, .rb, .php, .kt, .swift
- Documentation: .md, .mdx, .txt, .rst, .adoc, .yaml, .yml, .json
- PDFs: .pdf files are automatically parsed with text extraction and metadata preservation
- Images: .png, .jpg, .jpeg, .gif, .svg, .webp files with optional AI-powered description generationPDF files are processed with:
- Text extraction using advanced parsing
- Metadata preservation (page count, document info)
- Smart text normalization and chunking
- Error handling for corrupted or encrypted files
Image files support optional AI-powered description generation:
- Vision Model Integration: Uses OpenAI vision models (or compatible APIs) to generate detailed descriptions
- Configurable Processing: Enable/disable image-to-text via
ENABLE_IMAGE_TO_TEXT environment variable
- Fallback Support: Images are indexed by filename when vision processing is disabled
- Metadata Preservation: Stores image dimensions, file size, and description metadata
- Search Integration: Image descriptions are indexed as searchable text content📖 Usage
$3
The CLI tool provides a convenient way to ingest and search documents directly from the command line:
#### CLI Commands & Flags
All CLI flags support configuration precedence: CLI args > Environment variables > Config file
##### Global Options (Available for all commands)
| Flag | Environment Variable | Default | Description |
| --------------------------------------- | ---------------------------- | ------------------------------------------------------------- | ---------------------------------------- |
|
-c, --config | - | .env | Configuration file path |
| --embeddings-provider | EMBEDDINGS_PROVIDER | openai | Embeddings provider (openai or tei) |
| --openai-api-key | OPENAI_API_KEY | - | OpenAI API key |
| --openai-base-url | OPENAI_BASE_URL | - | OpenAI base URL (for custom endpoints) |
| --openai-embed-model | OPENAI_EMBED_MODEL | text-embedding-3-small | OpenAI embedding model |
| --openai-embed-dim | OPENAI_EMBED_DIM | 1536 | OpenAI embedding dimension |
| --tei-endpoint | TEI_ENDPOINT | - | Text Embeddings Inference endpoint |
| --enable-image-to-text | ENABLE_IMAGE_TO_TEXT | false | Enable image-to-text processing |
| --image-to-text-provider | IMAGE_TO_TEXT_PROVIDER | openai | Image-to-text provider |
| --image-to-text-model | IMAGE_TO_TEXT_MODEL | gpt-4o-mini | Image-to-text model |
| --confluence-base-url | CONFLUENCE_BASE_URL | - | Confluence base URL |
| --confluence-email | CONFLUENCE_EMAIL | - | Confluence email |
| --confluence-api-token | CONFLUENCE_API_TOKEN | - | Confluence API token |
| --confluence-spaces | CONFLUENCE_SPACES | - | Confluence spaces (comma-separated) |
| --file-roots | FILE_ROOTS | . | File roots to index (comma-separated) |
| --file-include-globs | FILE_INCLUDE_GLOBS | */.{go,ts,tsx,js,py,rs,java,md,mdx,txt,yaml,yml,json,pdf} | File include patterns |
| --file-exclude-globs | FILE_EXCLUDE_GLOBS | /{.git,node_modules,dist,build,target}/ | File exclude patterns |
| --db-type | DB_TYPE | sqlite | Database type (sqlite or postgresql) |
| --db-path | DB_PATH | ./data/index.db | SQLite database path |
| --postgres-connection-string | POSTGRES_CONNECTION_STRING | - | PostgreSQL connection string |##### Ingest Command
`bash
docsearch ingest [source] [options]
`Arguments:
-
source - Source to ingest: files, confluence, or all (default: all)Options:
-
-w, --watch - Watch for file changes and re-indexExamples:
`bash
Basic ingestion with environment config
docsearch ingest files
docsearch ingest confluence
docsearch ingest all --watchOverride config with CLI args
docsearch ingest files --file-roots "./src,./docs" --openai-api-key sk-xxxUse custom config file and database
docsearch --config prod.env --db-path /data/prod.db ingest allIngest with custom file patterns
docsearch ingest files \
--file-include-globs "*/.{ts,md,py}" \
--file-exclude-globs "/node_modules/,/venv/"Use PostgreSQL instead of SQLite
docsearch ingest all \
--db-type postgresql \
--postgres-connection-string "postgresql://user:pass@localhost:5432/docs"
`##### Search Command
`bash
docsearch search [options]
`Arguments:
-
query - Search query (required)Options:
| Flag | Default | Description |
| ---------------------------- | ------- | ------------------------------------------- |
|
-k, --top-k | 10 | Number of results to return (1-100) |
| -s, --source | - | Filter by source: file or confluence |
| -r, --repo | - | Filter by repository name |
| -p, --path-prefix | - | Filter by path prefix |
| -m, --mode | auto | Search mode: auto, vector, or keyword |
| -o, --output | text | Output format: text, json, or yaml |
| --include-images | - | Include images in search results |
| --images-only | - | Search only images |Examples:
`bash
Basic search
docsearch search "typescript interface"Limit results and format as JSON
docsearch search "API documentation" --top-k 5 --output jsonSearch only Confluence with repository filter
docsearch search "deployment guide" --source confluence --repo backend-docsSemantic search only with path filtering
docsearch search "authentication" --mode vector --path-prefix src/auth/Override database and output as YAML
docsearch --db-path /custom/index.db search "error handling" --output yamlComplex search with multiple filters
docsearch search "React hooks" \
--source file \
--repo frontend \
--path-prefix src/components/ \
--mode auto \
--top-k 20 \
--output jsonSearch including images (architecture diagrams, screenshots)
docsearch search "user authentication flow" --include-imagesSearch only images for diagrams and charts
docsearch search "system architecture" --images-only --output json
`#### CLI Configuration
The CLI supports multiple configuration sources in order of precedence:
1. Command-line arguments (highest priority)
2. Custom config file (
--config path/to/.env)
3. .env.local file
4. .env file (lowest priority)#### CLI Output Formats
- Text (default): Human-readable with icons and formatting
- JSON: Structured data with timestamps for integration
- YAML: Clean hierarchical format for configuration or documentation
$3
The MCP (Model Context Protocol) server provides seamless integration with Claude Code and other MCP-compatible tools.
#### MCP Configuration Options
All configuration options are passed via environment variables to the MCP server:
##### Required Configuration
| Environment Variable | Default | Description |
| --------------------- | -------- | ----------------------------------------- |
|
EMBEDDINGS_PROVIDER | openai | Embeddings provider (openai or tei) |
| OPENAI_API_KEY | - | OpenAI API key (required if using OpenAI) |##### Optional Configuration
| Environment Variable | Default | Description |
| ---------------------------- | ------------------------------------------------------------- | ---------------------------------------- |
|
OPENAI_BASE_URL | - | OpenAI base URL (for custom endpoints) |
| OPENAI_EMBED_MODEL | text-embedding-3-small | OpenAI embedding model |
| OPENAI_EMBED_DIM | 1536 | OpenAI embedding dimension |
| TEI_ENDPOINT | - | Text Embeddings Inference endpoint |
| FILE_ROOTS | . | File roots to index (comma-separated) |
| FILE_INCLUDE_GLOBS | */.{go,ts,tsx,js,py,rs,java,md,mdx,txt,yaml,yml,json,pdf} | File include patterns |
| FILE_EXCLUDE_GLOBS | /{.git,node_modules,dist,build,target}/ | File exclude patterns |
| CONFLUENCE_BASE_URL | - | Confluence base URL |
| CONFLUENCE_EMAIL | - | Confluence email |
| CONFLUENCE_API_TOKEN | - | Confluence API token |
| CONFLUENCE_SPACES | - | Confluence spaces (comma-separated) |
| DB_TYPE | sqlite | Database type (sqlite or postgresql) |
| DB_PATH | ./data/index.db | SQLite database path |
| POSTGRES_CONNECTION_STRING | - | PostgreSQL connection string |#### Quick Setup
1. Install the package:
`bash
npm install -g docsearch-mcp
`2. Create your configuration:
`bash
# Create a .env file in your project directory
echo "OPENAI_API_KEY=your-openai-key" > .env
echo "FILE_ROOTS=." >> .env
`3. Add to Claude Code MCP settings:
`json
{
"mcpServers": {
"docsearch": {
"command": "npx",
"args": ["docsearch-mcp", "start"],
"env": {
"OPENAI_API_KEY": "your-openai-key",
"EMBEDDINGS_PROVIDER": "openai",
"FILE_ROOTS": ".,../other-project",
"DB_PATH": "/path/to/your/index.db"
}
}
}
}
`#### Docker Integration
##### Method 1: Direct Docker Run
`json
{
"mcpServers": {
"docsearch": {
"command": "docker",
"args": [
"run",
"--rm",
"-i",
"-v",
"/path/to/your/documents:/app/documents",
"-v",
"docsearch-data:/app/data",
"--env-file",
".env",
"ghcr.io/patrickkoss/docsearch-mcp:v0.0.2"
]
}
}
}
`##### Method 2: Using Docker Compose
Create a
docker-compose.yml file:`yaml
version: '3.8'
services:
docsearch-mcp:
image: ghcr.io/patrickkoss/docsearch-mcp:v0.0.5
volumes:
- ./documents:/app/documents
- docsearch-data:/app/data
env_file:
- .env
stdin_open: true
volumes:
docsearch-data:
`Then configure MCP:
`json
{
"mcpServers": {
"docsearch": {
"command": "docker",
"args": ["exec", "-i", "docsearch-mcp", "node", "dist/src/server/mcp.js"]
}
}
}
`#### Advanced MCP Configuration Examples
##### Multi-Project Setup
`json
{
"mcpServers": {
"docsearch-work": {
"command": "npx",
"args": ["docsearch-mcp", "start"],
"env": {
"OPENAI_API_KEY": "sk-your-key",
"FILE_ROOTS": "/work/projects/frontend,/work/projects/backend",
"FILE_INCLUDE_GLOBS": "*/.{ts,tsx,js,py,md,yaml}",
"DB_PATH": "/work/data/work-index.db",
"CONFLUENCE_BASE_URL": "https://company.atlassian.net",
"CONFLUENCE_SPACES": "DEV,API,DOCS"
}
},
"docsearch-personal": {
"command": "npx",
"args": ["docsearch-mcp", "start"],
"env": {
"OPENAI_API_KEY": "sk-your-key",
"FILE_ROOTS": "/home/user/projects,/home/user/documents",
"DB_PATH": "/home/user/.docsearch/personal.db",
"EMBEDDINGS_PROVIDER": "tei",
"TEI_ENDPOINT": "http://localhost:8080/embeddings"
}
}
}
}
`##### PostgreSQL Setup
`json
{
"mcpServers": {
"docsearch": {
"command": "npx",
"args": ["docsearch-mcp", "start"],
"env": {
"OPENAI_API_KEY": "sk-your-key",
"DB_TYPE": "postgresql",
"POSTGRES_CONNECTION_STRING": "postgresql://user:pass@localhost:5432/docsearch",
"FILE_ROOTS": ".,/other/projects"
}
}
}
}
`#### MCP Tools
The MCP server provides these tools for Claude Code:
#####
doc-searchSearch indexed documents with optional output formatting.
Parameters:
-
query (string, required): Search query
- topK (number, optional): Number of results to return (default: 10, max: 100)
- source (string, optional): Filter by source (file or confluence)
- repo (string, optional): Filter by repository name
- pathPrefix (string, optional): Filter by path prefix
- mode (string, optional): Search mode (auto, vector, keyword, default: auto)
- outputFormat (string, optional): Output format (text, json, yaml, default: text)
- includeImages (boolean, optional): Include images in search results
- imagesOnly (boolean, optional): Search only imagesExample:
`text
Use the doc-search tool to find "TypeScript interfaces" with JSON output format, limited to 5 results from file sources only.
`#####
doc-ingestIngest documents from files or Confluence.
Parameters:
-
source (string, required): Source to ingest (files, confluence, or all)
- watch (boolean, optional): Watch for file changes and re-index (default: false)Example:
`text
Use the doc-ingest tool to index all local files and Confluence pages.
`#####
doc-ingest-statusGet indexing statistics and status.
No parameters required.
Example:
`text
Use the doc-ingest-status tool to check how many documents are currently indexed.
`#### MCP Resources
The server also provides chunk resources via URIs:
-
docchunk://{chunk_id}: Retrieve full content of a specific document chunk by ID📚 Practical Examples
$3
#### Quick Setup Workflow
`bash
1. Install globally
npm install -g docsearch-mcp2. Configure with your API key
echo "OPENAI_API_KEY=sk-your-key-here" > .env
echo "FILE_ROOTS=.,../other-project" >> .env3. Index your documents
docsearch-mcp ingest all --watch &4. Test search
docsearch-mcp search "how to deploy" --output json --top-k 35. Test MCP server
npx docsearch-mcp start
`#### Production Docker Workflow
`bash
1. Setup production environment
echo "OPENAI_API_KEY=sk-your-key" > .env
echo "FILE_ROOTS=/app/documents" >> .env
Add other configuration as needed
2. Create documents directory
mkdir -p documents
cp -r /path/to/your/docs/* documents/3. Run the official Docker image
docker run -d \
--name docsearch-mcp \
-v $(pwd)/documents:/app/documents \
-v docsearch-data:/app/data \
--env-file .env \
ghcr.io/patrickkoss/docsearch-mcp:v0.0.54. Index documents
docker exec docsearch-mcp npx docsearch-mcp ingest all5. Test search
docker exec docsearch-mcp npx docsearch-mcp search "authentication" -o json
`$3
#### Code Documentation Search
Setup for a TypeScript/React Project:
`bash
CLI configuration
docsearch-mcp ingest files \
--file-roots "./src,./docs,./README.md" \
--file-include-globs "*/.{ts,tsx,md,yaml,json}" \
--file-exclude-globs "/node_modules/,/dist/,/.git/"Search for React patterns
docsearch-mcp search "useEffect hook patterns" --source file --top-k 10
docsearch-mcp search "TypeScript interface" --mode vector --output json
`MCP Configuration:
`json
{
"mcpServers": {
"project-docs": {
"command": "npx",
"args": ["docsearch-mcp", "start"],
"env": {
"OPENAI_API_KEY": "sk-your-key",
"FILE_ROOTS": "./src,./docs,./README.md",
"FILE_INCLUDE_GLOBS": "*/.{ts,tsx,md,yaml,json}",
"FILE_EXCLUDE_GLOBS": "/node_modules/,/dist/",
"DB_PATH": "./project-search.db"
}
}
}
}
`#### Multi-Repository Setup
Team Documentation Hub:
`json
{
"mcpServers": {
"team-knowledge": {
"command": "npx",
"args": ["docsearch-mcp", "start"],
"env": {
"OPENAI_API_KEY": "sk-your-key",
"FILE_ROOTS": "/team/frontend,/team/backend,/team/mobile,/team/docs",
"CONFLUENCE_BASE_URL": "https://company.atlassian.net",
"CONFLUENCE_SPACES": "TEAM,API,ARCH,DEPLOY",
"CONFLUENCE_EMAIL": "team@company.com",
"CONFLUENCE_API_TOKEN": "your-token",
"DB_TYPE": "postgresql",
"POSTGRES_CONNECTION_STRING": "postgresql://user:pass@localhost/team_docs"
}
}
}
}
`Personal Research Setup:
`bash
Index academic papers and personal notes
docsearch-mcp ingest files \
--file-roots "/home/user/papers,/home/user/notes,/home/user/projects" \
--file-include-globs "*/.{pdf,md,txt,py,ipynb}" \
--embeddings-provider tei \
--tei-endpoint "http://localhost:8080/embeddings"Search across all personal knowledge
docsearch-mcp search "machine learning optimization" --mode auto --top-k 15Search for diagrams and technical images
docsearch-mcp search "neural network architecture" --include-images --top-k 10
docsearch-mcp search "flowchart" --images-only --output json
`#### PDF-Heavy Workflow
`bash
Setup for research document collection
docsearch-mcp ingest files \
--file-roots "/research/papers,/research/reports" \
--file-include-globs "*/.{pdf,docx,md,txt}" \
--openai-embed-model "text-embedding-3-large" \
--openai-embed-dim 3072Search through research papers
docsearch-mcp search "neural network architectures" \
--mode vector \
--top-k 20 \
--output yaml
`$3
#### Complex Filtering
`bash
Search only TypeScript files in components directory
docsearch-mcp search "button component" \
--source file \
--path-prefix src/components/ \
--mode autoFind Confluence deployment docs from specific space
docsearch-mcp search "kubernetes deployment" \
--source confluence \
--repo DEVOPS \
--mode keywordSemantic search across all sources with high precision
docsearch-mcp search "error handling patterns" \
--mode vector \
--top-k 50 \
--output json | jq '.results[] | select(.score > 0.8)'Search for architecture diagrams and flowcharts
docsearch-mcp search "system design diagram" \
--images-only \
--mode vector \
--output jsonInclude images when searching for UI/UX content
docsearch-mcp search "user interface design" \
--include-images \
--path-prefix docs/ \
--top-k 20
`#### Output Format Examples
Text Output (Human-readable):
`bash
docsearch-mcp search "authentication middleware" --output text
`JSON Output (For scripts):
`bash
Get results for further processing
RESULTS=$(docsearch-mcp search "API rate limiting" --output json)
echo "$RESULTS" | jq '.results[].path' | head -5
`YAML Output (For documentation):
`bash
Generate documentation from search results
docsearch-mcp search "configuration options" --output yaml > config-docs.yml
`$3
#### Debug Search Quality
`bash
Compare search modes
docsearch-mcp search "async function" --mode keyword --top-k 5
docsearch-mcp search "async function" --mode vector --top-k 5
docsearch-mcp search "async function" --mode auto --top-k 5Check what's indexed
docsearch-mcp ingest-status
`#### Performance Optimization
`bash
Use smaller embedding model for speed
docsearch-mcp ingest all \
--openai-embed-model "text-embedding-3-small" \
--openai-embed-dim 1536Use local TEI for privacy/speed
docker run -d --name tei -p 8080:80 \
ghcr.io/huggingface/text-embeddings-inference:cpu-1.2 \
--model-id sentence-transformers/all-MiniLM-L6-v2docsearch-mcp ingest all \
--embeddings-provider tei \
--tei-endpoint http://localhost:8080/embeddings
`$3
#### CI/CD Pipeline Integration
`yaml
.github/workflows/docs-index.yml
name: Update Documentation Index
on:
push:
paths: ['docs/', 'src//*.md', 'README.md']jobs:
update-index:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
node-version: 18
- name: Install dependencies
run: npm install -g pnpm && pnpm install
- name: Build docsearch
run: pnpm build
- name: Update search index
run: |
pnpm start:cli ingest files \
--file-roots "." \
--openai-api-key "${{ secrets.OPENAI_API_KEY }}" \
--db-path "./search-index.db"
env:
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
`#### Script Integration
`bash
#!/bin/bash
update-docs.sh - Automated documentation indexing
Check if API key is set
if [ -z "$OPENAI_API_KEY" ]; then
echo "Please set OPENAI_API_KEY environment variable"
exit 1
fiIndex all project documentation
echo "Indexing project documentation..."
docsearch-mcp ingest files \
--file-roots "./src,./docs,./examples" \
--file-include-globs "*/.{ts,js,md,yaml,json}" \
--db-path "./project-index.db"Generate search summary
echo "Search index updated. Statistics:"
docsearch-mcp ingest-status --output json | jq '.documents'
`$3
- Auto (default): Hybrid keyword + semantic search
- Keyword: Full-text search only
- Vector: Semantic search only
$3
Run
make help to see all available commands:`bash
Development Commands
make install # Install dependencies
make dev # Start development server for MCP
make dev-ingest # Start development ingestion
make setup # Setup project for developmentBuild Commands
make build # Build the project
make clean # Clean all generated filesQuality Assurance
make lint # Run linter
make lint-fix # Run linter with auto-fix
make format # Format code with Prettier
make typecheck # Run TypeScript type checking
make test # Run tests
make test-run # Run tests once
make test-unit # Run unit tests only
make test-integration # Run integration tests (requires Docker)
make check-all # Run all quality checksProduction Commands
make start # Start production MCP server
make start-ingest # Start production ingestionData Management
make ingest-files # Ingest local files
make ingest-confluence # Ingest Confluence pages
make watch # Watch for file changes and re-index
make clean-data # Clean data directory
`$3
`bash
CLI Tool
pnpm dev:cli # CLI development mode
pnpm start:cli # CLI production mode (after build)Ingestion (legacy scripts)
pnpm dev:ingest files # Index local files
pnpm dev:ingest confluence # Index Confluence pages
pnpm dev:ingest watch # Watch for file changesMCP Server
pnpm dev:mcp # Development server
pnpm start:mcp # Production serverQuality
pnpm lint # Run linter
pnpm test # Run tests
pnpm test:unit # Run unit tests only
pnpm test:integration # Run integration tests (requires Docker)
pnpm build # Build project
`🏗️ Architecture
The system follows a clean ports and adapters (hexagonal) architecture:
`text
┌─────────────────┐ ┌─────────────────┐
│ Local Files │ │ Confluence │
└─────────┬───────┘ └─────────┬───────┘
│ │
▼ ▼
┌─────────────────────────────────┐
│ Ingestion Engine │
│ • Content extraction │
│ • Smart chunking │
│ • Embedding generation │
└─────────────┬───────────────────┘
│
▼
┌─────────────────────────────────┐
│ Database Layer │
│ SQLite (sqlite-vec) OR │
│ PostgreSQL (pgvector) │
│ • Document metadata │
│ • Text chunks │
│ • Vector embeddings │
│ • Full-text search index │
└─────────────┬───────────────────┘
│
┌─────────┴─────────┐
▼ ▼
┌─────────────────┐ ┌─────────────────┐
│ CLI Tool │ │ MCP Server │
│ • Text output │ │ • Tool calls │
│ • JSON output │ │ • Resources │
│ • YAML output │ │ • Claude integration │
│ • Config mgmt │ │ • Ingestion tools │
└─────────────────┘ └─────────────────┘
`$3
- Domain Layer: Core interfaces and business logic
- Adapters: Configuration, database, and output formatting
- CLI: Command-line interface with multiple output formats
- MCP Server: Model Context Protocol integration with Claude Code
🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
$3
1. Fork the project
2. Create your feature branch (
git checkout -b feature/AmazingFeature)
3. Set up the development environment:
`bash
make setup
`4. Make your changes and ensure quality:
`bash
make check-all # Run linter, typecheck, and tests
make format # Format your code
`5. Commit your changes (
git commit -m 'Add some AmazingFeature')
6. Push to the branch (git push origin feature/AmazingFeature`)This project is licensed under the Apache License Version 2 - see the LICENSE file for details.
- Model Context Protocol for the integration standard
- sqlite-vec for vector search capabilities
- Claude Code for the AI-powered development experience