docsearch-mcp

![TypeScript](https://www.typescriptlang.org/)
![Node.js](https://nodejs.org/)
![MCP](https://modelcontextprotocol.io/)

A local-first document search and indexing system that provides hybrid semantic + keyword search across local files (including PDFs) and Confluence pages through the Model Context Protocol (MCP). Perfect for AI assistants like Claude Code/Desktop to access your documentation, codebase, and research materials.

✨ Features

- 🔍 Hybrid Search: Combines full-text search (FTS) with vector similarity for optimal results
- 📁 Multi-Source: Index local files (code, docs, PDFs) and Confluence spaces
- 📄 PDF Support: Extract and search text from PDF documents with metadata preservation
- 🖼️ Image Search: AI-powered image description and search for diagrams, screenshots, and charts
- 🗄️ Database Flexibility: Support for SQLite (local-first) and PostgreSQL (scalable)
- 🤖 MCP Integration: Seamless integration with Claude Code and other MCP-compatible tools
- 💻 CLI Tool: Standalone command-line interface with multiple output formats
- ⚡ Real-time Updates: File watching with automatic re-indexing
- 🎯 Smart Chunking: Intelligent text chunking for code, documentation, and PDFs
- 📊 Multiple Output Formats: Text, JSON, and YAML output for search results
- 🔒 Secure: API keys and sensitive data stay on your machine

🚀 Installation & Usage

$3

``bash

`Install globally`


npm install -g docsearch-mcp
Or use with npx

npx docsearch-mcp --help

$3

`bash

`Pull the image`


docker pull ghcr.io/patrickkoss/docsearch-mcp:v0.0.2
Run the MCP server with proper permissions

docker run --rm -i \
  -v /path/to/your/documents:/app/documents \
  -v /path/to/data:/app/data \
  --env-file .env \
  --user $(id -u):$(id -g) \
  ghcr.io/patrickkoss/docsearch-mcp:v0.0.2


$3
Add to your MCP client configuration:

`json { "mcpServers": { "docsearch": { "command": "npx", "args": ["docsearch-mcp", "start"] } } }`

`$3`

For development or customization:

`bash

`Clone the repository`


git clone https://github.com/patrickkoss/docsearch-mcp.git
cd docsearch-mcp
Quick setup with Make

make setup
Or manual setup

pnpm install
cp .env.example .env
Edit .env with your API keys and configuration


$3
#### CLI Tool

`bash

`Ingest documents`


docsearch-mcp ingest files              # Index local files
docsearch-mcp ingest confluence         # Index Confluence pages
docsearch-mcp ingest all --watch        # Index all sources with file watching
Search documents

docsearch-mcp search "your query"       # Basic search
docsearch-mcp search "typescript" -k 5 -o json  # JSON output, top 5 results
docsearch-mcp search "API docs" -s confluence   # Search only Confluence
Get help

docsearch-mcp --help
docsearch-mcp search --help


#### MCP Server

`bash

`Run the MCP server directly`


npx docsearch-mcp start
Or if installed globally

docsearch-mcp start


#### Docker Usage

`bash

`Run with official Docker image (recommended)`


docker run --rm -i \
  -v /path/to/your/documents:/app/documents \
  -v /path/to/data:/app/data \
  --env-file .env \
  --user $(id -u):$(id -g) \
  ghcr.io/patrickkoss/docsearch-mcp:v0.0.2
Run CLI commands with Docker

docker run --rm -i \
  -v /path/to/your/documents:/app/documents \
  -v /path/to/data:/app/data \
  --env-file .env \
  --user $(id -u):$(id -g) \
  ghcr.io/patrickkoss/docsearch-mcp:v0.0.2 \
  npx docsearch-mcp search "your query"
Alternative: Use Docker volumes (simpler but less control)

docker run --rm \
  -v ./documents:/app/documents \
  -v docsearch-data:/app/data \
  --env-file .env \
  ghcr.io/patrickkoss/docsearch-mcp:v0.0.2
For development (requires cloning repo)

docker-compose up -d docsearch-mcp


⚙️ Configuration
$3

When using the official Docker image, create a .env file:

`bash echo "OPENAI_API_KEY=your-openai-key" > .env echo "FILE_ROOTS=/app/documents" >> .env`

Key considerations for Docker deployment:

- Document Volume: Mount your documents directory to /app/documentsin the container - Data Persistence: Mount your data directory to/app/datato persist the SQLite database - User Permissions: Use--user $(id -u):$(id -g)to avoid permission issues with mounted volumes - Environment Variables: Pass configuration via.envfile or environment variables - Official Image: Useghcr.io/patrickkoss/docsearch-mcp:v0.0.2 for production deployments

#### Using the Official Docker Image

Recommended approach with proper permissions:

`bash

`Basic usage with user permissions (avoids file ownership issues)`


docker run --rm -i \
  -v /absolute/path/to/documents:/app/documents \
  -v /absolute/path/to/data:/app/data \
  --env-file .env \
  --user $(id -u):$(id -g) \
  ghcr.io/patrickkoss/docsearch-mcp:v0.0.5
For MCP server integration (background service)

docker run -d --name docsearch-mcp \
  -v /absolute/path/to/documents:/app/documents \
  -v /absolute/path/to/data:/app/data \
  --env-file .env \
  --user $(id -u):$(id -g) \
  ghcr.io/patrickkoss/docsearch-mcp:v0.0.5


Alternative approach with Docker volumes:

`bash

`Using Docker volumes (simpler but less direct file access)`


docker run --rm \
  -v ./documents:/app/documents \
  -v docsearch-data:/app/data \
  --env-file .env \
  ghcr.io/patrickkoss/docsearch-mcp:v0.0.2


Docker Permissions Explained:

- --user $(id -u):$(id -g): Runs the container with your user ID and group ID, preventing file permission issues - Dynamic User Creation: The container automatically creates a user matching your mounted volume permissions - Automatic File Ownership: Container files are automatically owned by the correct user - Cross-Platform Compatibility: Works consistently across different host systems

`$3`

Create a .env file from .env.example and configure:

`$3`

`env EMBEDDINGS_PROVIDER=openai OPENAI_API_KEY=your_openai_key OPENAI_EMBED_MODEL=text-embedding-3-small`

`$3`

`env FILE_ROOTS=. FILE_INCLUDE_GLOBS=*/.{ts,js,py,md,txt,pdf,png,jpg,jpeg,gif,svg,webp} FILE_EXCLUDE_GLOBS=/node_modules/,/dist/`

`$3`

`env ENABLE_IMAGE_TO_TEXT=true IMAGE_TO_TEXT_PROVIDER=openai IMAGE_TO_TEXT_MODEL=gpt-4o-mini`

`$3`

`env

`Use SQLite (default, local-first)`


DB_TYPE=sqlite
DB_PATH=./data/index.db
OR use PostgreSQL (for scalability)

DB_TYPE=postgresql
POSTGRES_CONNECTION_STRING=postgresql://user:password@localhost:5432/docsearch

$3

`env CONFLUENCE_BASE_URL=https://yourcompany.atlassian.net CONFLUENCE_EMAIL=your@email.com CONFLUENCE_API_TOKEN=your_confluence_token CONFLUENCE_SPACES=SPACE1,SPACE2

`Advanced filtering (all optional)`


Filter by parent page IDs - only index pages under these parent pages

CONFLUENCE_PARENT_PAGES=12345678,87654321
Title patterns to include (supports * wildcards)

CONFLUENCE_TITLE_INCLUDES=API,User Guide,Documentation*
Title patterns to exclude (supports * wildcards)

CONFLUENCE_TITLE_EXCLUDES=Draft,Archive,Test


Confluence Filtering Options:

- Parent Page Filtering: Specify page IDs to only index pages within specific folder hierarchies - Title Include Patterns: Define patterns to only index pages with matching titles - Title Exclude Patterns: Define patterns to skip pages with matching titles - Wildcard Support: Use for flexible pattern matching (e.g., API matches any title starting with "API")

`$3`

The system automatically detects and processes different file types:

- Code files: .ts, .js, .py, .go, .rs, .java, .cpp, .c, .rb, .php, .kt, .swift- Documentation:.md, .mdx, .txt, .rst, .adoc, .yaml, .yml, .json- PDFs:.pdffiles are automatically parsed with text extraction and metadata preservation - Images:.png, .jpg, .jpeg, .gif, .svg, .webp files with optional AI-powered description generation

PDF files are processed with:

- Text extraction using advanced parsing - Metadata preservation (page count, document info) - Smart text normalization and chunking - Error handling for corrupted or encrypted files

Image files support optional AI-powered description generation:

- Vision Model Integration: Uses OpenAI vision models (or compatible APIs) to generate detailed descriptions - Configurable Processing: Enable/disable image-to-text viaENABLE_IMAGE_TO_TEXTenvironment variable - Fallback Support: Images are indexed by filename when vision processing is disabled - Metadata Preservation: Stores image dimensions, file size, and description metadata - Search Integration: Image descriptions are indexed as searchable text content

`📖 Usage`

`$3`

The CLI tool provides a convenient way to ingest and search documents directly from the command line:

#### CLI Commands & Flags

All CLI flags support configuration precedence: CLI args > Environment variables > Config file

##### Global Options (Available for all commands)

| Flag | Environment Variable | Default | Description | | --------------------------------------- | ---------------------------- | ------------------------------------------------------------- | ---------------------------------------- | |-c, --config | - | .env| Configuration file path | |--embeddings-provider | EMBEDDINGS_PROVIDER | openai | Embeddings provider (openai or tei) | |--openai-api-key | OPENAI_API_KEY| - | OpenAI API key | |--openai-base-url | OPENAI_BASE_URL| - | OpenAI base URL (for custom endpoints) | |--openai-embed-model | OPENAI_EMBED_MODEL | text-embedding-3-small| OpenAI embedding model | |--openai-embed-dim | OPENAI_EMBED_DIM | 1536| OpenAI embedding dimension | |--tei-endpoint | TEI_ENDPOINT| - | Text Embeddings Inference endpoint | |--enable-image-to-text | ENABLE_IMAGE_TO_TEXT | false| Enable image-to-text processing | |--image-to-text-provider | IMAGE_TO_TEXT_PROVIDER | openai| Image-to-text provider | |--image-to-text-model | IMAGE_TO_TEXT_MODEL | gpt-4o-mini| Image-to-text model | |--confluence-base-url | CONFLUENCE_BASE_URL| - | Confluence base URL | |--confluence-email | CONFLUENCE_EMAIL| - | Confluence email | |--confluence-api-token | CONFLUENCE_API_TOKEN| - | Confluence API token | |--confluence-spaces | CONFLUENCE_SPACES| - | Confluence spaces (comma-separated) | |--file-roots | FILE_ROOTS | .| File roots to index (comma-separated) | |--file-include-globs | FILE_INCLUDE_GLOBS | */.{go,ts,tsx,js,py,rs,java,md,mdx,txt,yaml,yml,json,pdf}| File include patterns | |--file-exclude-globs | FILE_EXCLUDE_GLOBS | /{.git,node_modules,dist,build,target}/| File exclude patterns | |--db-type | DB_TYPE | sqlite | Database type (sqlite or postgresql) | |--db-path | DB_PATH | ./data/index.db| SQLite database path | |--postgres-connection-string | POSTGRES_CONNECTION_STRING | - | PostgreSQL connection string |

##### Ingest Command

`bash docsearch ingest [source] [options]`

Arguments:

- source - Source to ingest: files, confluence, or all (default: all)

Options:

- -w, --watch - Watch for file changes and re-index

Examples:

`bash

`Basic ingestion with environment config`


docsearch ingest files
docsearch ingest confluence
docsearch ingest all --watch
Override config with CLI args

docsearch ingest files --file-roots "./src,./docs" --openai-api-key sk-xxx
Use custom config file and database

docsearch --config prod.env --db-path /data/prod.db ingest all
Ingest with custom file patterns

docsearch ingest files \
  --file-include-globs "*/.{ts,md,py}" \
  --file-exclude-globs "/node_modules/,/venv/"
Use PostgreSQL instead of SQLite

docsearch ingest all \
  --db-type postgresql \
  --postgres-connection-string "postgresql://user:pass@localhost:5432/docs"


##### Search Command

`bash docsearch search [options]`

Arguments:

- query - Search query (required)

Options:

| Flag | Default | Description | | ---------------------------- | ------- | ------------------------------------------- | |-k, --top-k | 10| Number of results to return (1-100) | |-s, --source | - | Filter by source: file or confluence| |-r, --repo | - | Filter by repository name | |-p, --path-prefix | - | Filter by path prefix | |-m, --mode | auto | Search mode: auto, vector, or keyword| |-o, --output | text | Output format: text, json, or yaml| |--include-images| - | Include images in search results | |--images-only | - | Search only images |

Examples:

`bash

`Basic search`


docsearch search "typescript interface"
Limit results and format as JSON

docsearch search "API documentation" --top-k 5 --output json
Search only Confluence with repository filter

docsearch search "deployment guide" --source confluence --repo backend-docs
Semantic search only with path filtering

docsearch search "authentication" --mode vector --path-prefix src/auth/
Override database and output as YAML

docsearch --db-path /custom/index.db search "error handling" --output yaml
Complex search with multiple filters

docsearch search "React hooks" \
  --source file \
  --repo frontend \
  --path-prefix src/components/ \
  --mode auto \
  --top-k 20 \
  --output json
Search including images (architecture diagrams, screenshots)

docsearch search "user authentication flow" --include-images
Search only images for diagrams and charts

docsearch search "system architecture" --images-only --output json


#### CLI Configuration
The CLI supports multiple configuration sources in order of precedence:

1. Command-line arguments (highest priority) 2. Custom config file (--config path/to/.env) 3..env.localfile 4..env file (lowest priority)

#### CLI Output Formats

- Text (default): Human-readable with icons and formatting - JSON: Structured data with timestamps for integration - YAML: Clean hierarchical format for configuration or documentation

`$3`

The MCP (Model Context Protocol) server provides seamless integration with Claude Code and other MCP-compatible tools.

#### MCP Configuration Options

All configuration options are passed via environment variables to the MCP server:

##### Required Configuration

| Environment Variable | Default | Description | | --------------------- | -------- | ----------------------------------------- | |EMBEDDINGS_PROVIDER | openai | Embeddings provider (openai or tei) | |OPENAI_API_KEY | - | OpenAI API key (required if using OpenAI) |

##### Optional Configuration

| Environment Variable | Default | Description | | ---------------------------- | ------------------------------------------------------------- | ---------------------------------------- | |OPENAI_BASE_URL| - | OpenAI base URL (for custom endpoints) | |OPENAI_EMBED_MODEL | text-embedding-3-small| OpenAI embedding model | |OPENAI_EMBED_DIM | 1536| OpenAI embedding dimension | |TEI_ENDPOINT| - | Text Embeddings Inference endpoint | |FILE_ROOTS | .| File roots to index (comma-separated) | |FILE_INCLUDE_GLOBS | */.{go,ts,tsx,js,py,rs,java,md,mdx,txt,yaml,yml,json,pdf}| File include patterns | |FILE_EXCLUDE_GLOBS | /{.git,node_modules,dist,build,target}/| File exclude patterns | |CONFLUENCE_BASE_URL| - | Confluence base URL | |CONFLUENCE_EMAIL| - | Confluence email | |CONFLUENCE_API_TOKEN| - | Confluence API token | |CONFLUENCE_SPACES| - | Confluence spaces (comma-separated) | |DB_TYPE | sqlite | Database type (sqlite or postgresql) | |DB_PATH | ./data/index.db| SQLite database path | |POSTGRES_CONNECTION_STRING | - | PostgreSQL connection string |

#### Quick Setup

1. Install the package:

`bash npm install -g docsearch-mcp`

2. Create your configuration:

`bash # Create a .env file in your project directory echo "OPENAI_API_KEY=your-openai-key" > .env echo "FILE_ROOTS=." >> .env`

3. Add to Claude Code MCP settings:

`json { "mcpServers": { "docsearch": { "command": "npx", "args": ["docsearch-mcp", "start"], "env": { "OPENAI_API_KEY": "your-openai-key", "EMBEDDINGS_PROVIDER": "openai", "FILE_ROOTS": ".,../other-project", "DB_PATH": "/path/to/your/index.db" } } } }`

#### Docker Integration

##### Method 1: Direct Docker Run

`json { "mcpServers": { "docsearch": { "command": "docker", "args": [ "run", "--rm", "-i", "-v", "/path/to/your/documents:/app/documents", "-v", "docsearch-data:/app/data", "--env-file", ".env", "ghcr.io/patrickkoss/docsearch-mcp:v0.0.2" ] } } }`

##### Method 2: Using Docker Compose

Create a docker-compose.yml file:

`yaml version: '3.8' services: docsearch-mcp: image: ghcr.io/patrickkoss/docsearch-mcp:v0.0.5 volumes: - ./documents:/app/documents - docsearch-data:/app/data env_file: - .env stdin_open: true volumes: docsearch-data:`

Then configure MCP:

`json { "mcpServers": { "docsearch": { "command": "docker", "args": ["exec", "-i", "docsearch-mcp", "node", "dist/src/server/mcp.js"] } } }`

#### Advanced MCP Configuration Examples

##### Multi-Project Setup

`json { "mcpServers": { "docsearch-work": { "command": "npx", "args": ["docsearch-mcp", "start"], "env": { "OPENAI_API_KEY": "sk-your-key", "FILE_ROOTS": "/work/projects/frontend,/work/projects/backend", "FILE_INCLUDE_GLOBS": "*/.{ts,tsx,js,py,md,yaml}", "DB_PATH": "/work/data/work-index.db", "CONFLUENCE_BASE_URL": "https://company.atlassian.net", "CONFLUENCE_SPACES": "DEV,API,DOCS" } }, "docsearch-personal": { "command": "npx", "args": ["docsearch-mcp", "start"], "env": { "OPENAI_API_KEY": "sk-your-key", "FILE_ROOTS": "/home/user/projects,/home/user/documents", "DB_PATH": "/home/user/.docsearch/personal.db", "EMBEDDINGS_PROVIDER": "tei", "TEI_ENDPOINT": "http://localhost:8080/embeddings" } } } }`

##### PostgreSQL Setup

`json { "mcpServers": { "docsearch": { "command": "npx", "args": ["docsearch-mcp", "start"], "env": { "OPENAI_API_KEY": "sk-your-key", "DB_TYPE": "postgresql", "POSTGRES_CONNECTION_STRING": "postgresql://user:pass@localhost:5432/docsearch", "FILE_ROOTS": ".,/other/projects" } } } }`

#### MCP Tools

The MCP server provides these tools for Claude Code:

##### doc-search

Search indexed documents with optional output formatting.

Parameters:

- query(string, required): Search query -topK(number, optional): Number of results to return (default: 10, max: 100) -source (string, optional): Filter by source (file or confluence) -repo(string, optional): Filter by repository name -pathPrefix(string, optional): Filter by path prefix -mode (string, optional): Search mode (auto, vector, keyword, default: auto) -outputFormat (string, optional): Output format (text, json, yaml, default: text) -includeImages(boolean, optional): Include images in search results -imagesOnly (boolean, optional): Search only images

Example:

`text Use the doc-search tool to find "TypeScript interfaces" with JSON output format, limited to 5 results from file sources only.`

##### doc-ingest

Ingest documents from files or Confluence.

Parameters:

- source (string, required): Source to ingest (files, confluence, or all) -watch (boolean, optional): Watch for file changes and re-index (default: false)

Example:

`text Use the doc-ingest tool to index all local files and Confluence pages.`

##### doc-ingest-status

Get indexing statistics and status.

No parameters required.

Example:

`text Use the doc-ingest-status tool to check how many documents are currently indexed.`

#### MCP Resources

The server also provides chunk resources via URIs:

- docchunk://{chunk_id}: Retrieve full content of a specific document chunk by ID

`📚 Practical Examples`

`$3`

#### Quick Setup Workflow

`bash

`1. Install globally`


npm install -g docsearch-mcp
2. Configure with your API key

echo "OPENAI_API_KEY=sk-your-key-here" > .env
echo "FILE_ROOTS=.,../other-project" >> .env
3. Index your documents

docsearch-mcp ingest all --watch &
4. Test search

docsearch-mcp search "how to deploy" --output json --top-k 3
5. Test MCP server

npx docsearch-mcp start


#### Production Docker Workflow

`bash

`1. Setup production environment`


echo "OPENAI_API_KEY=sk-your-key" > .env
echo "FILE_ROOTS=/app/documents" >> .env
Add other configuration as needed
2. Create documents directory

mkdir -p documents
cp -r /path/to/your/docs/* documents/
3. Run the official Docker image

docker run -d \
  --name docsearch-mcp \
  -v $(pwd)/documents:/app/documents \
  -v docsearch-data:/app/data \
  --env-file .env \
  ghcr.io/patrickkoss/docsearch-mcp:v0.0.5
4. Index documents

docker exec docsearch-mcp npx docsearch-mcp ingest all
5. Test search

docker exec docsearch-mcp npx docsearch-mcp search "authentication" -o json


$3
#### Code Documentation Search
Setup for a TypeScript/React Project:

`bash

`CLI configuration`


docsearch-mcp ingest files \
  --file-roots "./src,./docs,./README.md" \
  --file-include-globs "*/.{ts,tsx,md,yaml,json}" \
  --file-exclude-globs "/node_modules/,/dist/,/.git/"
Search for React patterns

docsearch-mcp search "useEffect hook patterns" --source file --top-k 10
docsearch-mcp search "TypeScript interface" --mode vector --output json


MCP Configuration:

`json { "mcpServers": { "project-docs": { "command": "npx", "args": ["docsearch-mcp", "start"], "env": { "OPENAI_API_KEY": "sk-your-key", "FILE_ROOTS": "./src,./docs,./README.md", "FILE_INCLUDE_GLOBS": "*/.{ts,tsx,md,yaml,json}", "FILE_EXCLUDE_GLOBS": "/node_modules/,/dist/", "DB_PATH": "./project-search.db" } } } }`

#### Multi-Repository Setup

Team Documentation Hub:

`json { "mcpServers": { "team-knowledge": { "command": "npx", "args": ["docsearch-mcp", "start"], "env": { "OPENAI_API_KEY": "sk-your-key", "FILE_ROOTS": "/team/frontend,/team/backend,/team/mobile,/team/docs", "CONFLUENCE_BASE_URL": "https://company.atlassian.net", "CONFLUENCE_SPACES": "TEAM,API,ARCH,DEPLOY", "CONFLUENCE_EMAIL": "team@company.com", "CONFLUENCE_API_TOKEN": "your-token", "DB_TYPE": "postgresql", "POSTGRES_CONNECTION_STRING": "postgresql://user:pass@localhost/team_docs" } } } }`

Personal Research Setup:

`bash

`Index academic papers and personal notes`


docsearch-mcp ingest files \
  --file-roots "/home/user/papers,/home/user/notes,/home/user/projects" \
  --file-include-globs "*/.{pdf,md,txt,py,ipynb}" \
  --embeddings-provider tei \
  --tei-endpoint "http://localhost:8080/embeddings"
Search across all personal knowledge

docsearch-mcp search "machine learning optimization" --mode auto --top-k 15
Search for diagrams and technical images

docsearch-mcp search "neural network architecture" --include-images --top-k 10
docsearch-mcp search "flowchart" --images-only --output json


#### PDF-Heavy Workflow

`bash

`Setup for research document collection`


docsearch-mcp ingest files \
  --file-roots "/research/papers,/research/reports" \
  --file-include-globs "*/.{pdf,docx,md,txt}" \
  --openai-embed-model "text-embedding-3-large" \
  --openai-embed-dim 3072
Search through research papers

docsearch-mcp search "neural network architectures" \
  --mode vector \
  --top-k 20 \
  --output yaml


$3
#### Complex Filtering

`bash

`Search only TypeScript files in components directory`


docsearch-mcp search "button component" \
  --source file \
  --path-prefix src/components/ \
  --mode auto
Find Confluence deployment docs from specific space

docsearch-mcp search "kubernetes deployment" \
  --source confluence \
  --repo DEVOPS \
  --mode keyword
Semantic search across all sources with high precision

docsearch-mcp search "error handling patterns" \
  --mode vector \
  --top-k 50 \
  --output json | jq '.results[] | select(.score > 0.8)'
Search for architecture diagrams and flowcharts

docsearch-mcp search "system design diagram" \
  --images-only \
  --mode vector \
  --output json
Include images when searching for UI/UX content

docsearch-mcp search "user interface design" \
  --include-images \
  --path-prefix docs/ \
  --top-k 20


#### Output Format Examples
Text Output (Human-readable):

`bash docsearch-mcp search "authentication middleware" --output text`

JSON Output (For scripts):

`bash

`Get results for further processing`


RESULTS=$(docsearch-mcp search "API rate limiting" --output json)
echo "$RESULTS" | jq '.results[].path' | head -5


YAML Output (For documentation):

`bash

`Generate documentation from search results`


docsearch-mcp search "configuration options" --output yaml > config-docs.yml


$3
#### Debug Search Quality

`bash

`Compare search modes`


docsearch-mcp search "async function" --mode keyword --top-k 5
docsearch-mcp search "async function" --mode vector --top-k 5
docsearch-mcp search "async function" --mode auto --top-k 5
Check what's indexed

docsearch-mcp ingest-status


#### Performance Optimization

`bash

`Use smaller embedding model for speed`


docsearch-mcp ingest all \
  --openai-embed-model "text-embedding-3-small" \
  --openai-embed-dim 1536
Use local TEI for privacy/speed

docker run -d --name tei -p 8080:80 \
  ghcr.io/huggingface/text-embeddings-inference:cpu-1.2 \
  --model-id sentence-transformers/all-MiniLM-L6-v2

docsearch-mcp ingest all \ --embeddings-provider tei \ --tei-endpoint http://localhost:8080/embeddings`

`$3`

#### CI/CD Pipeline Integration

`yaml

`.github/workflows/docs-index.yml`


name: Update Documentation Index
on:
  push:
    paths: ['docs/', 'src//*.md', 'README.md']

jobs: update-index: runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 - uses: actions/setup-node@v4 with: node-version: 18 - name: Install dependencies run: npm install -g pnpm && pnpm install - name: Build docsearch run: pnpm build - name: Update search index run: | pnpm start:cli ingest files \ --file-roots "." \ --openai-api-key "${{ secrets.OPENAI_API_KEY }}" \ --db-path "./search-index.db" env: OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}`

#### Script Integration

`bash #!/bin/bash

`update-docs.sh - Automated documentation indexing`

`Check if API key is set`


if [ -z "$OPENAI_API_KEY" ]; then
    echo "Please set OPENAI_API_KEY environment variable"
    exit 1
fi
Index all project documentation

echo "Indexing project documentation..."
docsearch-mcp ingest files \
    --file-roots "./src,./docs,./examples" \
    --file-include-globs "*/.{ts,js,md,yaml,json}" \
    --db-path "./project-index.db"
Generate search summary

echo "Search index updated. Statistics:"
docsearch-mcp ingest-status --output json | jq '.documents'


$3
- Auto (default): Hybrid keyword + semantic search
- Keyword: Full-text search only
- Vector: Semantic search only
$3

Run make help to see all available commands:

`bash

`Development Commands`


make install                    # Install dependencies
make dev                        # Start development server for MCP
make dev-ingest                 # Start development ingestion
make setup                      # Setup project for development
Build Commands

make build                      # Build the project
make clean                      # Clean all generated files
Quality Assurance

make lint                       # Run linter
make lint-fix                   # Run linter with auto-fix
make format                     # Format code with Prettier
make typecheck                  # Run TypeScript type checking
make test                       # Run tests
make test-run                   # Run tests once
make test-unit                  # Run unit tests only
make test-integration           # Run integration tests (requires Docker)
make check-all                  # Run all quality checks
Production Commands

make start                      # Start production MCP server
make start-ingest               # Start production ingestion
Data Management

make ingest-files               # Ingest local files
make ingest-confluence          # Ingest Confluence pages
make watch                      # Watch for file changes and re-index
make clean-data                 # Clean data directory

$3

`bash

`CLI Tool`


pnpm dev:cli                    # CLI development mode
pnpm start:cli                  # CLI production mode (after build)
Ingestion (legacy scripts)

pnpm dev:ingest files           # Index local files
pnpm dev:ingest confluence      # Index Confluence pages
pnpm dev:ingest watch           # Watch for file changes
MCP Server

pnpm dev:mcp                    # Development server
pnpm start:mcp                  # Production server
Quality

pnpm lint                       # Run linter
pnpm test                       # Run tests
pnpm test:unit                  # Run unit tests only
pnpm test:integration           # Run integration tests (requires Docker)
pnpm build                      # Build project


🏗️ Architecture
The system follows a clean ports and adapters (hexagonal) architecture:

`text ┌─────────────────┐ ┌─────────────────┐ │ Local Files │ │ Confluence │ └─────────┬───────┘ └─────────┬───────┘ │ │ ▼ ▼ ┌─────────────────────────────────┐ │ Ingestion Engine │ │ • Content extraction │ │ • Smart chunking │ │ • Embedding generation │ └─────────────┬───────────────────┘ │ ▼ ┌─────────────────────────────────┐ │ Database Layer │ │ SQLite (sqlite-vec) OR │ │ PostgreSQL (pgvector) │ │ • Document metadata │ │ • Text chunks │ │ • Vector embeddings │ │ • Full-text search index │ └─────────────┬───────────────────┘ │ ┌─────────┴─────────┐ ▼ ▼ ┌─────────────────┐ ┌─────────────────┐ │ CLI Tool │ │ MCP Server │ │ • Text output │ │ • Tool calls │ │ • JSON output │ │ • Resources │ │ • YAML output │ │ • Claude integration │ │ • Config mgmt │ │ • Ingestion tools │ └─────────────────┘ └─────────────────┘`

`$3`

- Domain Layer: Core interfaces and business logic - Adapters: Configuration, database, and output formatting - CLI: Command-line interface with multiple output formats - MCP Server: Model Context Protocol integration with Claude Code

`🤝 Contributing`

Contributions are welcome! Please feel free to submit a Pull Request.

`$3`

1. Fork the project 2. Create your feature branch (git checkout -b feature/AmazingFeature) 3. Set up the development environment:

`bash make setup`

4. Make your changes and ensure quality:

`bash make check-all # Run linter, typecheck, and tests make format # Format your code`

5. Commit your changes (git commit -m 'Add some AmazingFeature') 6. Push to the branch (git push origin feature/AmazingFeature`)
7. Open a Pull Request

📝 License

This project is licensed under the Apache License Version 2 - see the LICENSE file for details.

🙏 Acknowledgments

- Model Context Protocol for the integration standard
- sqlite-vec for vector search capabilities
- Claude Code for the AI-powered development experience

docsearch-mcp

![TypeScript](https://www.typescriptlang.org/)
![Node.js](https://nodejs.org/)
![MCP](https://modelcontextprotocol.io/)

✨ Features

🚀 Installation & Usage

$3

``bash

`Install globally`


npm install -g docsearch-mcp
Or use with npx

npx docsearch-mcp --help

$3

`bash

`Pull the image`


docker pull ghcr.io/patrickkoss/docsearch-mcp:v0.0.2
Run the MCP server with proper permissions

docker run --rm -i \
  -v /path/to/your/documents:/app/documents \
  -v /path/to/data:/app/data \
  --env-file .env \
  --user $(id -u):$(id -g) \
  ghcr.io/patrickkoss/docsearch-mcp:v0.0.2


$3
Add to your MCP client configuration:

`json { "mcpServers": { "docsearch": { "command": "npx", "args": ["docsearch-mcp", "start"] } } }`

`$3`

For development or customization:

`bash

`Clone the repository`


git clone https://github.com/patrickkoss/docsearch-mcp.git
cd docsearch-mcp
Quick setup with Make

make setup
Or manual setup

pnpm install
cp .env.example .env
Edit .env with your API keys and configuration


$3
#### CLI Tool

`bash

`Ingest documents`


docsearch-mcp ingest files              # Index local files
docsearch-mcp ingest confluence         # Index Confluence pages
docsearch-mcp ingest all --watch        # Index all sources with file watching
Search documents

docsearch-mcp search "your query"       # Basic search
docsearch-mcp search "typescript" -k 5 -o json  # JSON output, top 5 results
docsearch-mcp search "API docs" -s confluence   # Search only Confluence
Get help

docsearch-mcp --help
docsearch-mcp search --help


#### MCP Server

`bash

`Run the MCP server directly`


npx docsearch-mcp start
Or if installed globally

docsearch-mcp start


#### Docker Usage

`bash

`Run with official Docker image (recommended)`


docker run --rm -i \
  -v /path/to/your/documents:/app/documents \
  -v /path/to/data:/app/data \
  --env-file .env \
  --user $(id -u):$(id -g) \
  ghcr.io/patrickkoss/docsearch-mcp:v0.0.2
Run CLI commands with Docker

docker run --rm -i \
  -v /path/to/your/documents:/app/documents \
  -v /path/to/data:/app/data \
  --env-file .env \
  --user $(id -u):$(id -g) \
  ghcr.io/patrickkoss/docsearch-mcp:v0.0.2 \
  npx docsearch-mcp search "your query"
Alternative: Use Docker volumes (simpler but less control)

docker run --rm \
  -v ./documents:/app/documents \
  -v docsearch-data:/app/data \
  --env-file .env \
  ghcr.io/patrickkoss/docsearch-mcp:v0.0.2
For development (requires cloning repo)

docker-compose up -d docsearch-mcp


⚙️ Configuration
$3

When using the official Docker image, create a .env file:

`bash echo "OPENAI_API_KEY=your-openai-key" > .env echo "FILE_ROOTS=/app/documents" >> .env`

Key considerations for Docker deployment:

#### Using the Official Docker Image

Recommended approach with proper permissions:

`bash

`Basic usage with user permissions (avoids file ownership issues)`


docker run --rm -i \
  -v /absolute/path/to/documents:/app/documents \
  -v /absolute/path/to/data:/app/data \
  --env-file .env \
  --user $(id -u):$(id -g) \
  ghcr.io/patrickkoss/docsearch-mcp:v0.0.5
For MCP server integration (background service)

docker run -d --name docsearch-mcp \
  -v /absolute/path/to/documents:/app/documents \
  -v /absolute/path/to/data:/app/data \
  --env-file .env \
  --user $(id -u):$(id -g) \
  ghcr.io/patrickkoss/docsearch-mcp:v0.0.5


Alternative approach with Docker volumes:

`bash

`Using Docker volumes (simpler but less direct file access)`


docker run --rm \
  -v ./documents:/app/documents \
  -v docsearch-data:/app/data \
  --env-file .env \
  ghcr.io/patrickkoss/docsearch-mcp:v0.0.2


Docker Permissions Explained:

`$3`

Create a .env file from .env.example and configure:

`$3`

`env EMBEDDINGS_PROVIDER=openai OPENAI_API_KEY=your_openai_key OPENAI_EMBED_MODEL=text-embedding-3-small`

`$3`

`env FILE_ROOTS=. FILE_INCLUDE_GLOBS=*/.{ts,js,py,md,txt,pdf,png,jpg,jpeg,gif,svg,webp} FILE_EXCLUDE_GLOBS=/node_modules/,/dist/`

`$3`

`env ENABLE_IMAGE_TO_TEXT=true IMAGE_TO_TEXT_PROVIDER=openai IMAGE_TO_TEXT_MODEL=gpt-4o-mini`

`$3`

`env

`Use SQLite (default, local-first)`


DB_TYPE=sqlite
DB_PATH=./data/index.db
OR use PostgreSQL (for scalability)

DB_TYPE=postgresql
POSTGRES_CONNECTION_STRING=postgresql://user:password@localhost:5432/docsearch

$3

`env CONFLUENCE_BASE_URL=https://yourcompany.atlassian.net CONFLUENCE_EMAIL=your@email.com CONFLUENCE_API_TOKEN=your_confluence_token CONFLUENCE_SPACES=SPACE1,SPACE2

`Advanced filtering (all optional)`


Filter by parent page IDs - only index pages under these parent pages

CONFLUENCE_PARENT_PAGES=12345678,87654321
Title patterns to include (supports * wildcards)

CONFLUENCE_TITLE_INCLUDES=API,User Guide,Documentation*
Title patterns to exclude (supports * wildcards)

CONFLUENCE_TITLE_EXCLUDES=Draft,Archive,Test


Confluence Filtering Options:

`$3`

The system automatically detects and processes different file types:

PDF files are processed with:

- Text extraction using advanced parsing - Metadata preservation (page count, document info) - Smart text normalization and chunking - Error handling for corrupted or encrypted files

Image files support optional AI-powered description generation:

`📖 Usage`

`$3`

The CLI tool provides a convenient way to ingest and search documents directly from the command line:

#### CLI Commands & Flags

All CLI flags support configuration precedence: CLI args > Environment variables > Config file

##### Global Options (Available for all commands)

##### Ingest Command

`bash docsearch ingest [source] [options]`

Arguments:

- source - Source to ingest: files, confluence, or all (default: all)

Options:

- -w, --watch - Watch for file changes and re-index

Examples:

`bash

`Basic ingestion with environment config`


docsearch ingest files
docsearch ingest confluence
docsearch ingest all --watch
Override config with CLI args

docsearch ingest files --file-roots "./src,./docs" --openai-api-key sk-xxx
Use custom config file and database

docsearch --config prod.env --db-path /data/prod.db ingest all
Ingest with custom file patterns

docsearch ingest files \
  --file-include-globs "*/.{ts,md,py}" \
  --file-exclude-globs "/node_modules/,/venv/"
Use PostgreSQL instead of SQLite

docsearch ingest all \
  --db-type postgresql \
  --postgres-connection-string "postgresql://user:pass@localhost:5432/docs"


##### Search Command

`bash docsearch search [options]`

Arguments:

- query - Search query (required)

Options:

Examples:

`bash

`Basic search`


docsearch search "typescript interface"
Limit results and format as JSON

docsearch search "API documentation" --top-k 5 --output json
Search only Confluence with repository filter

docsearch search "deployment guide" --source confluence --repo backend-docs
Semantic search only with path filtering

docsearch search "authentication" --mode vector --path-prefix src/auth/
Override database and output as YAML

docsearch --db-path /custom/index.db search "error handling" --output yaml
Complex search with multiple filters

docsearch search "React hooks" \
  --source file \
  --repo frontend \
  --path-prefix src/components/ \
  --mode auto \
  --top-k 20 \
  --output json
Search including images (architecture diagrams, screenshots)

docsearch search "user authentication flow" --include-images
Search only images for diagrams and charts

docsearch search "system architecture" --images-only --output json


#### CLI Configuration
The CLI supports multiple configuration sources in order of precedence:

1. Command-line arguments (highest priority) 2. Custom config file (--config path/to/.env) 3..env.localfile 4..env file (lowest priority)

#### CLI Output Formats

- Text (default): Human-readable with icons and formatting - JSON: Structured data with timestamps for integration - YAML: Clean hierarchical format for configuration or documentation

`$3`

The MCP (Model Context Protocol) server provides seamless integration with Claude Code and other MCP-compatible tools.

#### MCP Configuration Options

All configuration options are passed via environment variables to the MCP server:

##### Required Configuration

##### Optional Configuration

#### Quick Setup

1. Install the package:

`bash npm install -g docsearch-mcp`

2. Create your configuration:

`bash # Create a .env file in your project directory echo "OPENAI_API_KEY=your-openai-key" > .env echo "FILE_ROOTS=." >> .env`

3. Add to Claude Code MCP settings:

#### Docker Integration

##### Method 1: Direct Docker Run

##### Method 2: Using Docker Compose

Create a docker-compose.yml file:

Then configure MCP:

`json { "mcpServers": { "docsearch": { "command": "docker", "args": ["exec", "-i", "docsearch-mcp", "node", "dist/src/server/mcp.js"] } } }`

#### Advanced MCP Configuration Examples

##### Multi-Project Setup

##### PostgreSQL Setup

#### MCP Tools

The MCP server provides these tools for Claude Code:

##### doc-search

Search indexed documents with optional output formatting.

Parameters:

Example:

`text Use the doc-search tool to find "TypeScript interfaces" with JSON output format, limited to 5 results from file sources only.`

##### doc-ingest

Ingest documents from files or Confluence.

Parameters:

- source (string, required): Source to ingest (files, confluence, or all) -watch (boolean, optional): Watch for file changes and re-index (default: false)

Example:

`text Use the doc-ingest tool to index all local files and Confluence pages.`

##### doc-ingest-status

Get indexing statistics and status.

No parameters required.

Example:

`text Use the doc-ingest-status tool to check how many documents are currently indexed.`

#### MCP Resources

The server also provides chunk resources via URIs:

- docchunk://{chunk_id}: Retrieve full content of a specific document chunk by ID

`📚 Practical Examples`

`$3`

#### Quick Setup Workflow

`bash

`1. Install globally`


npm install -g docsearch-mcp
2. Configure with your API key

echo "OPENAI_API_KEY=sk-your-key-here" > .env
echo "FILE_ROOTS=.,../other-project" >> .env
3. Index your documents

docsearch-mcp ingest all --watch &
4. Test search

docsearch-mcp search "how to deploy" --output json --top-k 3
5. Test MCP server

npx docsearch-mcp start


#### Production Docker Workflow

`bash

`1. Setup production environment`


echo "OPENAI_API_KEY=sk-your-key" > .env
echo "FILE_ROOTS=/app/documents" >> .env
Add other configuration as needed
2. Create documents directory

mkdir -p documents
cp -r /path/to/your/docs/* documents/
3. Run the official Docker image

docker run -d \
  --name docsearch-mcp \
  -v $(pwd)/documents:/app/documents \
  -v docsearch-data:/app/data \
  --env-file .env \
  ghcr.io/patrickkoss/docsearch-mcp:v0.0.5
4. Index documents

docker exec docsearch-mcp npx docsearch-mcp ingest all
5. Test search

docker exec docsearch-mcp npx docsearch-mcp search "authentication" -o json


$3
#### Code Documentation Search
Setup for a TypeScript/React Project:

`bash

`CLI configuration`


docsearch-mcp ingest files \
  --file-roots "./src,./docs,./README.md" \
  --file-include-globs "*/.{ts,tsx,md,yaml,json}" \
  --file-exclude-globs "/node_modules/,/dist/,/.git/"
Search for React patterns

docsearch-mcp search "useEffect hook patterns" --source file --top-k 10
docsearch-mcp search "TypeScript interface" --mode vector --output json


MCP Configuration:

#### Multi-Repository Setup

Team Documentation Hub:

Personal Research Setup:

`bash

`Index academic papers and personal notes`


docsearch-mcp ingest files \
  --file-roots "/home/user/papers,/home/user/notes,/home/user/projects" \
  --file-include-globs "*/.{pdf,md,txt,py,ipynb}" \
  --embeddings-provider tei \
  --tei-endpoint "http://localhost:8080/embeddings"
Search across all personal knowledge

docsearch-mcp search "machine learning optimization" --mode auto --top-k 15
Search for diagrams and technical images

docsearch-mcp search "neural network architecture" --include-images --top-k 10
docsearch-mcp search "flowchart" --images-only --output json


#### PDF-Heavy Workflow

`bash

`Setup for research document collection`


docsearch-mcp ingest files \
  --file-roots "/research/papers,/research/reports" \
  --file-include-globs "*/.{pdf,docx,md,txt}" \
  --openai-embed-model "text-embedding-3-large" \
  --openai-embed-dim 3072
Search through research papers

docsearch-mcp search "neural network architectures" \
  --mode vector \
  --top-k 20 \
  --output yaml


$3
#### Complex Filtering

`bash

`Search only TypeScript files in components directory`


docsearch-mcp search "button component" \
  --source file \
  --path-prefix src/components/ \
  --mode auto
Find Confluence deployment docs from specific space

docsearch-mcp search "kubernetes deployment" \
  --source confluence \
  --repo DEVOPS \
  --mode keyword
Semantic search across all sources with high precision

docsearch-mcp search "error handling patterns" \
  --mode vector \
  --top-k 50 \
  --output json | jq '.results[] | select(.score > 0.8)'
Search for architecture diagrams and flowcharts

docsearch-mcp search "system design diagram" \
  --images-only \
  --mode vector \
  --output json
Include images when searching for UI/UX content

docsearch-mcp search "user interface design" \
  --include-images \
  --path-prefix docs/ \
  --top-k 20


#### Output Format Examples
Text Output (Human-readable):

`bash docsearch-mcp search "authentication middleware" --output text`

JSON Output (For scripts):

`bash

`Get results for further processing`


RESULTS=$(docsearch-mcp search "API rate limiting" --output json)
echo "$RESULTS" | jq '.results[].path' | head -5


YAML Output (For documentation):

`bash

`Generate documentation from search results`


docsearch-mcp search "configuration options" --output yaml > config-docs.yml


$3
#### Debug Search Quality

`bash

`Compare search modes`


docsearch-mcp search "async function" --mode keyword --top-k 5
docsearch-mcp search "async function" --mode vector --top-k 5
docsearch-mcp search "async function" --mode auto --top-k 5
Check what's indexed

docsearch-mcp ingest-status


#### Performance Optimization

`bash

`Use smaller embedding model for speed`


docsearch-mcp ingest all \
  --openai-embed-model "text-embedding-3-small" \
  --openai-embed-dim 1536
Use local TEI for privacy/speed

docker run -d --name tei -p 8080:80 \
  ghcr.io/huggingface/text-embeddings-inference:cpu-1.2 \
  --model-id sentence-transformers/all-MiniLM-L6-v2

docsearch-mcp ingest all \ --embeddings-provider tei \ --tei-endpoint http://localhost:8080/embeddings`

`$3`

#### CI/CD Pipeline Integration

`yaml

`.github/workflows/docs-index.yml`


name: Update Documentation Index
on:
  push:
    paths: ['docs/', 'src//*.md', 'README.md']

#### Script Integration

`bash #!/bin/bash

`update-docs.sh - Automated documentation indexing`

`Check if API key is set`


if [ -z "$OPENAI_API_KEY" ]; then
    echo "Please set OPENAI_API_KEY environment variable"
    exit 1
fi
Index all project documentation

echo "Indexing project documentation..."
docsearch-mcp ingest files \
    --file-roots "./src,./docs,./examples" \
    --file-include-globs "*/.{ts,js,md,yaml,json}" \
    --db-path "./project-index.db"
Generate search summary

echo "Search index updated. Statistics:"
docsearch-mcp ingest-status --output json | jq '.documents'


$3
- Auto (default): Hybrid keyword + semantic search
- Keyword: Full-text search only
- Vector: Semantic search only
$3

Run make help to see all available commands:

`bash

`Development Commands`


make install                    # Install dependencies
make dev                        # Start development server for MCP
make dev-ingest                 # Start development ingestion
make setup                      # Setup project for development
Build Commands

make build                      # Build the project
make clean                      # Clean all generated files
Quality Assurance

make lint                       # Run linter
make lint-fix                   # Run linter with auto-fix
make format                     # Format code with Prettier
make typecheck                  # Run TypeScript type checking
make test                       # Run tests
make test-run                   # Run tests once
make test-unit                  # Run unit tests only
make test-integration           # Run integration tests (requires Docker)
make check-all                  # Run all quality checks
Production Commands

make start                      # Start production MCP server
make start-ingest               # Start production ingestion
Data Management

make ingest-files               # Ingest local files
make ingest-confluence          # Ingest Confluence pages
make watch                      # Watch for file changes and re-index
make clean-data                 # Clean data directory

$3

`bash

`CLI Tool`


pnpm dev:cli                    # CLI development mode
pnpm start:cli                  # CLI production mode (after build)
Ingestion (legacy scripts)

pnpm dev:ingest files           # Index local files
pnpm dev:ingest confluence      # Index Confluence pages
pnpm dev:ingest watch           # Watch for file changes
MCP Server

pnpm dev:mcp                    # Development server
pnpm start:mcp                  # Production server
Quality

pnpm lint                       # Run linter
pnpm test                       # Run tests
pnpm test:unit                  # Run unit tests only
pnpm test:integration           # Run integration tests (requires Docker)
pnpm build                      # Build project


🏗️ Architecture
The system follows a clean ports and adapters (hexagonal) architecture:

`$3`

`🤝 Contributing`

Contributions are welcome! Please feel free to submit a Pull Request.

`$3`

1. Fork the project 2. Create your feature branch (git checkout -b feature/AmazingFeature) 3. Set up the development environment:

`bash make setup`

4. Make your changes and ensure quality:

`bash make check-all # Run linter, typecheck, and tests make format # Format your code`

5. Commit your changes (git commit -m 'Add some AmazingFeature') 6. Push to the branch (git push origin feature/AmazingFeature`)
7. Open a Pull Request

📝 License

This project is licensed under the Apache License Version 2 - see the LICENSE file for details.

🙏 Acknowledgments

- Model Context Protocol for the integration standard
- sqlite-vec for vector search capabilities
- Claude Code for the AI-powered development experience