# RAG Vault

Local RAG MCP server: easy-to-set-up document search with minimal configuration.
## Quick Start

Install with `npm install @robthepcguy/rag-vault`, or skip the install entirely: every setup below runs the server through a single `npx` command. Done.

### Cursor

Add to `~/.cursor/mcp.json`:
```json
{
  "mcpServers": {
    "local-rag": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "github:RobThePCGuy/rag-vault"],
      "env": {
        "BASE_DIR": "/path/to/your/documents"
      }
    }
  }
}
```
### Claude Code

Add to `.mcp.json` in your project directory:
```json
{
  "mcpServers": {
    "local-rag": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "github:RobThePCGuy/rag-vault"],
      "env": {
        "BASE_DIR": "./documents",
        "DB_PATH": "./documents/.rag-db",
        "CACHE_DIR": "./.cache",
        "RAG_HYBRID_WEIGHT": "0.6",
        "RAG_GROUPING": "related"
      }
    }
  }
}
```
Or add it via the CLI:
```bash
claude mcp add local-rag --scope user --env BASE_DIR=/path/to/your/documents -- npx -y github:RobThePCGuy/rag-vault
```
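To confirm the server registered, you can list configured servers (assuming your Claude Code version ships the `mcp list` subcommand):

```bash
# Should show local-rag among the configured MCP servers
claude mcp list
```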
### Codex

Add to `~/.codex/config.toml`:
```toml
[mcp_servers.local-rag]
command = "npx"
args = ["-y", "github:RobThePCGuy/rag-vault"]

[mcp_servers.local-rag.env]
BASE_DIR = "/path/to/your/documents"
```
### Skills (Optional)
For enhanced AI guidance on query formulation and result interpretation, install the RAG Vault skills:
```bash
# Claude Code (project-level - recommended for team projects)
npx github:RobThePCGuy/rag-vault skills install --claude-code

# Claude Code (user-level - available in all projects)
npx github:RobThePCGuy/rag-vault skills install --claude-code --global

# Codex (user-level)
npx github:RobThePCGuy/rag-vault skills install --codex

# Custom location
npx github:RobThePCGuy/rag-vault skills install --path /your/custom/path
```
Skills teach Claude best practices for:
- Query formulation and expansion strategies
- Score interpretation (< 0.3 = good match, > 0.5 = skip)
- When to use `ingest_file` vs `ingest_data`
- HTML ingestion and URL handling
Restart your AI tool and start talking:
```
You: "Ingest api-spec.pdf"
AI: Successfully ingested api-spec.pdf (47 chunks)

You: "How does authentication work?"
AI: Based on section 3.2, authentication uses OAuth 2.0 with JWT tokens...
```
That's it. No Docker. No Python. No servers.
## Web Interface
RAG Vault includes a full-featured web UI for managing your documents without the command line.
### Launch
```bash
npx github:RobThePCGuy/rag-vault web
```
Open http://localhost:3000 in your browser.
### Features
- Upload documents — Drag and drop PDFs, Word docs, Markdown, text files
- Search instantly — Type queries and see results with relevance scores
- Preview content — Click any result to see the full chunk in context
- Manage files — View all indexed documents, delete what you don't need
- Switch databases — Create and switch between multiple knowledge bases
- Monitor status — See document counts, memory usage, and search mode
- Export/Import settings — Back up and restore your vault configuration
- Theme preferences — Switch between light, dark, or system theme
- Folder browser — Navigate directories to select documents
### REST API

The web server exposes a REST API for programmatic access. Set `RAG_API_KEY` to require authentication:
```bash
# Search documents (with authentication, when RAG_API_KEY is set)
curl -X POST "http://localhost:3000/api/v1/search" \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"query": "authentication", "limit": 5}'

# Search documents (no auth required if RAG_API_KEY is not set)
curl -X POST "http://localhost:3000/api/v1/search" \
  -H "Content-Type: application/json" \
  -d '{"query": "authentication", "limit": 5}'

# List all files
curl "http://localhost:3000/api/v1/files"

# Upload a document
curl -X POST "http://localhost:3000/api/v1/files/upload" \
  -F "file=@spec.pdf"

# Delete a file
curl -X DELETE "http://localhost:3000/api/v1/files" \
  -H "Content-Type: application/json" \
  -d '{"filePath": "/path/to/spec.pdf"}'

# Get system status
curl "http://localhost:3000/api/v1/status"

# Health check (for load balancers)
curl "http://localhost:3000/api/v1/health"
```
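Search responses pair each chunk with a relevance score, so results are easy to post-process in scripts. A minimal sketch using `jq` to keep only strong matches; the response field names (`results`, `score`, `filePath`) are assumptions here, so inspect a live response before relying on them:

```bash
# Keep only results under the "good match" threshold of 0.3
curl -s -X POST "http://localhost:3000/api/v1/search" \
  -H "Content-Type: application/json" \
  -d '{"query": "authentication", "limit": 10}' |
  jq '[.results[] | select(.score < 0.3) | {filePath, score}]'
```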
### Document Navigation
For programmatic document reading and cross-document discovery:
```bash
# Get all chunks for a document (ordered by index)
curl "http://localhost:3000/api/v1/documents/chunks?filePath=/path/to/doc.pdf"

# Find related chunks for cross-document discovery
curl "http://localhost:3000/api/v1/chunks/related?filePath=/path/to/doc.pdf&chunkIndex=0&limit=5"

# Batch request for multiple chunks (efficient for UIs)
curl -X POST "http://localhost:3000/api/v1/chunks/batch-related" \
  -H "Content-Type: application/json" \
  -d '{"chunks": [{"filePath": "/path/to/doc.pdf", "chunkIndex": 0}], "limit": 3}'
```
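These endpoints compose nicely in scripts. For example, a sketch that reconstructs a document's full text from its ordered chunks; it assumes the response carries a `chunks` array with `text` fields, which you should verify against a real response:

```bash
# Dump every chunk of a document, in order, to a single text file
DOC="/path/to/doc.pdf"
curl -s "http://localhost:3000/api/v1/documents/chunks?filePath=$DOC" |
  jq -r '.chunks[].text' > doc-fulltext.txt
```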
## Real-World Examples

### Project Documentation
```
You: "Ingest all the markdown files in /docs"
AI: Ingested 23 files (847 chunks total)

You: "What's the retry policy for failed API calls?"
AI: According to error-handling.md, failed requests retry 3 times
    with exponential backoff: 1s, 2s, 4s...
```
### Web Content
```
You: "Fetch https://docs.example.com/api and ingest the HTML"
AI: Ingested "docs.example.com/api" (156 chunks)

You: "What rate limits apply to the /users endpoint?"
AI: The API limits /users to 100 requests per minute per API key...
```
### Research Papers
```
You: "Ingest my research papers folder"
AI: Ingested 12 PDFs (2,341 chunks)

You: "What do recent studies say about transformer attention mechanisms?"
AI: Based on attention-mechanisms-2024.pdf, the key finding is...
```
### Error Codes and Exact Matches
RAG Vault's hybrid search catches both meaning and exact matches:
```
You: "Search for ERR_CONNECTION_REFUSED"
AI: Found 3 results mentioning ERR_CONNECTION_REFUSED:
    1. troubleshooting.md - "When you see ERR_CONNECTION_REFUSED..."
    2. network-errors.pdf - "Common causes include..."
```
Pure semantic search would miss this. RAG Vault finds it.
## How It Works
```
Document → Parse → Chunk by meaning → Embed locally → Store in LanceDB
                                                           ↓
Query → Embed → Vector search → Keyword boost → Quality filter → Results
```
**Smart chunking:** Splits by meaning, not character count. Keeps code blocks intact.

**Hybrid search:** Vector similarity finds related content. Keyword boost ranks exact matches higher.

**Quality filtering:** Groups results by relevance gaps instead of arbitrary top-K cutoffs.

**Local everything:** Embeddings via Transformers.js. Storage via LanceDB. No network after model download.
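You can feel the hybrid tradeoff directly by running the web UI with different keyword weights; a sketch, assuming the `web` command reads the same `RAG_*` variables documented under Configuration below:

```bash
# Semantic-only ranking (keyword boost disabled)
RAG_HYBRID_WEIGHT=0 npx github:RobThePCGuy/rag-vault web

# Stronger boost for exact keyword matches, on a second port
RAG_HYBRID_WEIGHT=0.9 WEB_PORT=3001 npx github:RobThePCGuy/rag-vault web
```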
## Supported Formats

| Format | Extension | Notes |
|--------|-----------|-------|
| PDF | `.pdf` | Full text extraction, header/footer filtering |
| Word | `.docx` | Tables, lists, formatting preserved |
| Markdown | `.md` | Code blocks kept intact |
| Text | `.txt` | Plain text |
| JSON | `.json` | Converted to searchable key-value text |
| HTML | via `ingest_data` | Auto-cleaned with Readability |
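File-based formats can also be pushed through the REST upload endpoint shown earlier. A sketch for bulk-ingesting a folder, assuming the web server is running on the default port:

```bash
# Upload every supported file in ./docs via the REST API
for f in ./docs/*.{pdf,docx,md,txt,json}; do
  [ -e "$f" ] || continue   # skip patterns that matched nothing
  curl -s -X POST "http://localhost:3000/api/v1/files/upload" -F "file=@$f"
done
```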
## Configuration

### Core Settings

| Variable | Default | What it does |
|----------|---------|--------------|
| `BASE_DIR` | Current directory | Only files under this path can be accessed |
| `DB_PATH` | `./lancedb/` | Where vectors are stored |
| `MODEL_NAME` | `Xenova/all-MiniLM-L6-v2` | HuggingFace embedding model |
| `WEB_PORT` | `3000` | Port for web interface |
### Search Tuning

| Variable | Default | What it does |
|----------|---------|--------------|
| `RAG_HYBRID_WEIGHT` | `0.6` | Keyword boost strength. `0` = semantic-only; higher values boost exact keyword matches more strongly |
| `RAG_GROUPING` | — | `similar` = top group only, `related` = top 2 groups |
| `RAG_MAX_DISTANCE` | — | Drop results whose distance exceeds this threshold |
### Web Security

| Variable | Default | What it does |
|----------|---------|--------------|
| `RAG_API_KEY` | — | API key for authentication |
| `CORS_ORIGINS` | `localhost` | Allowed origins (comma-separated, or `*`) |
| `RATE_LIMIT_WINDOW_MS` | `60000` | Rate limit time window (ms) |
| `RATE_LIMIT_MAX_REQUESTS` | `100` | Max requests per window |
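Put together, a locked-down web instance might be launched like this (a sketch; the key and origin are illustrative values):

```bash
# Require a bearer token, pin CORS to one origin, tighten the rate limit
RAG_API_KEY="$(openssl rand -hex 32)" \
CORS_ORIGINS="https://app.example.com" \
RATE_LIMIT_MAX_REQUESTS=30 \
npx github:RobThePCGuy/rag-vault web
```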
### Advanced

| Variable | Default | What it does |
|----------|---------|--------------|
| `ALLOWED_SCAN_ROOTS` | Home directory | Directories allowed for database scanning |
| `JSON_BODY_LIMIT` | `5mb` | Max request body size |
| `REQUEST_TIMEOUT_MS` | `30000` | API request timeout |
| `REQUEST_LOGGING` | `false` | Enable request audit logging |
> Copy `.env.example` for a complete configuration template.
For code-heavy content, try:
```json
"env": {
  "RAG_HYBRID_WEIGHT": "0.8",
  "RAG_GROUPING": "similar"
}
```
## Frequently Asked Questions

**Is my data really private?**

Yes. After the embedding model downloads (~90MB), RAG Vault makes zero network requests. Everything runs on your machine. Verify with network monitoring.
**Does it work offline?**

Yes, after the first run. The model caches locally.
**What about GPU acceleration?**

Transformers.js runs on CPU. GPU support is experimental but unnecessary for most use cases: queries return in ~1 second even with 10,000 chunks.
**Can I change the embedding model?**

Yes. Set `MODEL_NAME` to any compatible HuggingFace model. But you must delete `DB_PATH` and re-ingest: different models produce incompatible vectors.
**Recommended upgrade:** For better quality and multilingual support, use EmbeddingGemma:

```json
"MODEL_NAME": "onnx-community/embeddinggemma-300m-ONNX"
```

This 300M-parameter model scores 68.36 on MTEB benchmarks and supports 100+ languages, making it ideal for mixed-language or high-quality retrieval needs.
Other specialized models:
- Scientific: `sentence-transformers/allenai-specter`
- Code: `jinaai/jina-embeddings-v2-base-code`
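Whichever model you choose, the switch itself looks the same; a minimal sketch, assuming the default `DB_PATH` of `./lancedb/`:

```bash
# Old vectors are incompatible with the new model, so drop the index first
rm -rf ./lancedb

# Restart with the new model, then re-ingest your documents
MODEL_NAME="onnx-community/embeddinggemma-300m-ONNX" \
npx github:RobThePCGuy/rag-vault web
```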
**How do I back up my data?**

Copy the `DB_PATH` directory (default: `./lancedb/`).
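A minimal sketch with the default path (safest with the server stopped, so no files are mid-write):

```bash
# Snapshot the vector store into a dated archive
tar czf rag-vault-backup-$(date +%F).tar.gz ./lancedb
```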
## Troubleshooting

| Problem | Solution |
|---------|----------|
| No results found | Documents must be ingested first. Ask "List all ingested files" to check. |
| Model download failed | Check internet connection. Model is ~90MB from HuggingFace. |
| File too large | Default limit is 100MB. Set `MAX_FILE_SIZE` higher or split the file. |
| Path outside BASE_DIR | All file paths must be under `BASE_DIR`. Use absolute paths. |
| MCP tools not showing | Verify config syntax, then restart your AI tool completely (Cmd+Q on Mac). |
| 401 Unauthorized | API key required. Set `RAG_API_KEY` or use the correct header format. |
| 429 Too Many Requests | Rate limited. Wait for the window to reset or increase `RATE_LIMIT_MAX_REQUESTS`. |
| CORS errors | Add your origin to the `CORS_ORIGINS` environment variable. |
## Development
```bash
git clone https://github.com/RobThePCGuy/rag-vault.git
cd rag-vault
pnpm install

# Run tests
pnpm test

# Type check + lint + format
pnpm check:all

# Build
pnpm build

# Run MCP server locally
pnpm dev

# Run web server locally
pnpm web:dev
```
### Project Structure
```
src/
├── server/      # MCP tool handlers
├── vectordb/    # LanceDB + hybrid search
├── chunker/     # Semantic text splitting
├── embedder/    # Transformers.js wrapper
├── parser/      # PDF, DOCX, HTML parsing
├── web/         # Express server + REST API
└── __tests__/   # Test suites
web-ui/          # React frontend
```