Local-first multi-repo workspace indexer (semantic embeddings + git-aware incremental updates + hybrid retrieval profiles) for AI agents.
A local-first, multi-repo workspace indexer for AI agents (e.g. your custom agent “Damocles”).
This package provides high-fidelity indexing, retrieval, and context expansion across entire workspaces, while remaining safe to run locally (including VS Code extension hosts).
---
- Catalogue / Indexing DB: SQLite via sql.js (WASM)
Runs everywhere (Node, VS Code extension host, webview environments). No native binaries required.
- Vector backend: bruteforce (default)
Zero‑config, in‑memory exact search.
- Graph backend: disabled by default
> For enterprise‑scale persistence and performance, configure a remote vector backend such as Qdrant, and optionally a graph backend such as Neo4j.
---
- Whole‑workspace indexing
Multiple Git repositories under a single workspace root.
- Meaningful chunking
TypeScript/JavaScript AST‑aware chunking with robust fallbacks for other languages.
- Semantic embeddings
Pluggable providers:
- Ollama (local)
- OpenAI
- Deterministic offline hash embeddings
- Hybrid retrieval
Vector similarity + lexical search (SQLite FTS5) with configurable weights.
- Pluggable vector backends
bruteforce, hnswlib, qdrant, faiss, or a custom provider.
- Enterprise‑safe invalidation
Repo indices are keyed by:
(repo_id, head_commit, embedder_id, index_fingerprint)
Any change forces a clean rebuild to avoid stale context.
- Incremental updates
File watching + .git/HEAD detection.
- Security controls
Git‑native ignore rules, additional ignore files, and redaction hooks.
This allows the same index to support multiple agent domains:
- Search
- Refactor
- Review
- Architecture understanding
- RCA (root cause analysis)
…by selecting different retrieval profiles.
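The invalidation rule in the feature list (indices keyed by repo, head commit, embedder, and index fingerprint) can be illustrated with a minimal sketch. The `IndexKey` shape and the `fingerprint`, `keyOf`, and `needsRebuild` helpers are hypothetical names chosen for illustration, not the package's API; the fingerprint here is a canonical JSON string, whereas a real implementation would more likely hash it.

```ts
// Hypothetical shape: the four fields named in the index key above.
interface IndexKey {
  repoId: string;
  headCommit: string;
  embedderId: string;
  indexFingerprint: string;
}

// Assumption: the fingerprint covers indexing options that affect stored chunks.
// Top-level keys are sorted so property order does not change the result
// (nested objects are not canonicalised in this sketch).
function fingerprint(options: Record<string, unknown>): string {
  const entries = Object.keys(options)
    .sort()
    .map((k) => [k, options[k]]);
  return JSON.stringify(entries);
}

// Join the four fields into one comparable string.
function keyOf(k: IndexKey): string {
  return [k.repoId, k.headCommit, k.embedderId, k.indexFingerprint].join("|");
}

// A stored index is reusable only when every field matches the current key.
function needsRebuild(stored: IndexKey | undefined, current: IndexKey): boolean {
  return stored === undefined || keyOf(stored) !== keyOf(current);
}
```

Under these assumptions, a new commit, a different embedding model, or a changed chunking option each produces a different key, so the stale index is never reused.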
---
Workspace‑Indexer separates index infrastructure from agent logic.
Index backends define where and how indexed knowledge is stored and queried:
- Catalogue DB (files, chunks, metadata, FTS)
- Vector backend (similarity search)
- Graph backend (optional dependency / symbol / architecture graph)
Backends are configured via profiles, allowing:
- Local or remote providers
- Safe backend switching (automatic rebuilds)
- Environment‑specific defaults
---
```bash
npm i @neuralsea/workspace-indexer
```
Node 18+ required.
Docs: docs/README.md
---
This package publishes a browser‑safe entrypoint:
```ts
import { chunkSource, OpenAIEmbeddingsProvider } from "@neuralsea/workspace-indexer/browser";
```
The full indexer (WorkspaceIndexer, file watching, git scanning, persistence) is Node‑only and should run in the VS Code extension host, communicating with webviews via postMessage.
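One way to structure the host/webview split is a small request/response protocol handled in the extension host. The sketch below is illustrative only: the message shapes (`WebviewRequest`, `HostResponse`) and the helpers `isWebviewRequest` and `handleWebviewRequest` are hypothetical and not part of the package. The `RetrieveFn` type models only the part of `retrieve` that the usage example in this README relies on.

```ts
// Hypothetical message shapes; the package does not define this protocol.
type WebviewRequest = { type: "retrieve"; id: number; query: string; profile: string };
type HostResponse =
  | { type: "result"; id: number; paths: string[] }
  | { type: "error"; id: number; message: string };

// Minimal view of the retrieve call, as used in this README's usage example.
type RetrieveFn = (
  query: string,
  opts: { profile: string }
) => Promise<{ hits: { chunk: { path: string } }[] }>;

// Webview messages are untrusted input, so validate the shape before use.
function isWebviewRequest(msg: unknown): msg is WebviewRequest {
  if (typeof msg !== "object" || msg === null) return false;
  const m = msg as Record<string, unknown>;
  return (
    m.type === "retrieve" &&
    typeof m.id === "number" &&
    typeof m.query === "string" &&
    typeof m.profile === "string"
  );
}

// Runs in the extension host: turn one request into exactly one response.
async function handleWebviewRequest(
  req: WebviewRequest,
  retrieve: RetrieveFn
): Promise<HostResponse> {
  try {
    const res = await retrieve(req.query, { profile: req.profile });
    return { type: "result", id: req.id, paths: res.hits.map((h) => h.chunk.path) };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { type: "error", id: req.id, message };
  }
}
```

In an extension, the handler would typically be registered with `webview.onDidReceiveMessage`, and its result sent back with `webview.postMessage`; the `id` field lets the webview match responses to requests.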
---
```ts
import {
  WorkspaceIndexer,
  OllamaEmbeddingsProvider,
  IndexerProgressObservable
} from "@neuralsea/workspace-indexer";

// Compute embeddings locally via Ollama.
const embedder = new OllamaEmbeddingsProvider({ model: "nomic-embed-text" });

// Log progress events as they arrive.
const progress = new IndexerProgressObservable();
progress.subscribe(e => console.log(e.type, e));

// Index every repository under the workspace root.
const ix = new WorkspaceIndexer("/path/to/workspace", embedder, { progress });
await ix.indexAll();

// Query using the "search" retrieval profile.
const search = await ix.retrieve("Where is authentication enforced?", {
  profile: "search"
});
console.log(search.hits.map(h => h.chunk.path));

// Close the indexer when done.
await ix.closeAsync();
```
---
The same index can be queried differently depending on the task.
Built‑in profiles:
- search — tight top‑k, precise matches
- refactor — wider k, follows imports and adjacency
- review — biases to changed files, includes file synopsis
- architecture — aggressive expansion across imports
- rca — review + recency bias
Profiles control:
- k (primary hits)
- weights (vector / lexical / recency)
- expansion rules
- candidate pool sizes
Profiles can be overridden at runtime.
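To make the role of the weights concrete, here is a minimal sketch of weighted hybrid scoring. It is not the package's ranking code: the `Weights` and `Candidate` shapes, the assumption that each signal is already normalised to [0, 1], and the function names `hybridScore` and `topK` are all illustrative.

```ts
// Hypothetical shapes; field names and score ranges are assumptions.
interface Weights {
  vector: number;
  lexical: number;
  recency: number;
}

interface Candidate {
  path: string;
  vector: number;  // vector similarity, assumed normalised to [0, 1]
  lexical: number; // lexical (FTS) score, assumed normalised to [0, 1]
  recency: number; // recency signal, assumed normalised to [0, 1]
}

// Weighted average of the three signals; dividing by the weight sum keeps
// scores comparable across profiles with differently scaled weights.
function hybridScore(c: Candidate, w: Weights): number {
  const total = w.vector + w.lexical + w.recency;
  if (total <= 0) throw new Error("at least one weight must be positive");
  return (w.vector * c.vector + w.lexical * c.lexical + w.recency * c.recency) / total;
}

// Rank a candidate pool and keep the k best; the input array is not mutated.
function topK(candidates: Candidate[], w: Weights, k: number): Candidate[] {
  return [...candidates]
    .sort((a, b) => hybridScore(b, w) - hybridScore(a, w))
    .slice(0, k);
}
```

Under this model, a profile such as rca would raise the recency weight, while search would keep k small and favour the vector and lexical signals.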
---
Index backends are configured using named profiles.
```json
{
  "indexBackends": {
    "vectorProfiles": {
      "local-default": {
        "kind": "local",
        "provider": "bruteforce",
        "metric": "cosine"
      },
      "qdrant-dev": {
        "kind": "qdrant",
        "url": "http://localhost:6333",
        "collectionPrefix": "petri"
      }
    },
    "graphProfiles": {
      "none": { "kind": "none" },
      "neo4j-local": {
        "kind": "neo4j",
        "uri": "neo4j://localhost:7687",
        "user": "neo4j",
        "passwordRef": "NEO4J_PASSWORD",
        "database": "neo4j",
        "labelPrefix": "Petri"
      }
    },
    "defaults": {
      "vectorProfile": "local-default",
      "graphProfile": "none"
    }
  }
}
```
The selected profiles are resolved internally into runtime configuration.
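As a rough illustration of what that resolution step might involve, the sketch below looks up the default (or overridden) profile names in an `indexBackends` object shaped like the example configuration. The types and the `resolveBackends` function are hypothetical and only model the fields shown in that example; the package's internal resolution may differ.

```ts
// Types mirror the example configuration; extra fields are kept as unknown.
type BackendProfile = { kind: string; [key: string]: unknown };

interface IndexBackends {
  vectorProfiles: Record<string, BackendProfile>;
  graphProfiles: Record<string, BackendProfile>;
  defaults: { vectorProfile: string; graphProfile: string };
}

// Pick the named profiles, letting explicit overrides win over the defaults.
function resolveBackends(
  cfg: IndexBackends,
  overrides: Partial<IndexBackends["defaults"]> = {}
) {
  const vectorName = overrides.vectorProfile ?? cfg.defaults.vectorProfile;
  const graphName = overrides.graphProfile ?? cfg.defaults.graphProfile;

  const vector = cfg.vectorProfiles[vectorName];
  const graph = cfg.graphProfiles[graphName];

  // Fail early on a misspelt profile name instead of silently falling back.
  if (!vector) throw new Error(`unknown vector profile: ${vectorName}`);
  if (!graph) throw new Error(`unknown graph profile: ${graphName}`);

  return { vector, graph, graphEnabled: graph.kind !== "none" };
}
```

Failing on an unknown name is a deliberate choice in this sketch: a silent fallback to the local default could hide a misconfigured remote backend.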
Earlier versions accepted Neo4j configuration under workspace.graph.
This version automatically migrates those settings into a graph profile on first run. After migration, legacy settings are ignored.
---
Disabling the graph backend does not disable index persistence.
Persistence of catalogue data, embeddings, and vector indices is controlled independently via storage settings.
---
- Git‑native ignore (git ls-files)
- Additional ignore files: .petriignore / .augmentignore
- Redaction hooks before embedding and storage
For higher assurance:
- set `storage.ftsMode = "tokens"`
- review redaction patterns
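As an example of what a redaction step could look like, the sketch below masks two common secret shapes before text is embedded or stored. The hook signature (`RedactFn`: text in, redacted text out) and the rule list are assumptions for illustration; check the package documentation for the actual hook interface, and treat these patterns as a starting point rather than a complete set.

```ts
// Hypothetical hook signature: receives chunk text, returns redacted text.
type RedactFn = (text: string) => string;

// Illustrative rules only; real deployments need patterns for their own secrets.
const RULES: { re: RegExp; replacement: string }[] = [
  // AWS access key IDs: "AKIA" followed by 16 upper-case letters or digits.
  { re: /\bAKIA[0-9A-Z]{16}\b/g, replacement: "[REDACTED:aws-key]" },
  // Quoted values assigned to password/secret/token; the name and quotes are kept.
  {
    re: /\b(password|secret|token)(\s*[:=]\s*)(["'])[^"'\n]*\3/gi,
    replacement: "$1$2$3[REDACTED]$3",
  },
];

// Apply every rule in order to the input text.
const redact: RedactFn = (text) =>
  RULES.reduce((out, rule) => out.replace(rule.re, rule.replacement), text);
```

For instance, `redact('const password = "hunter2";')` yields `const password = "[REDACTED]";`, so the surrounding code stays searchable while the value itself never reaches the embedder or the index.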
---
MIT