@neuralsea/workspace-indexer

A local-first, multi-repo workspace indexer for AI agents (e.g. your custom agent “Damocles”).

This package provides high-fidelity indexing, retrieval, and context expansion across entire workspaces, while remaining safe to run locally (including VS Code extension hosts).

---

Default index backends

- Catalogue / Indexing DB: SQLite via sql.js (WASM)
  Runs everywhere (Node, VS Code extension host, webview environments). No native binaries required.
- Vector backend: bruteforce (default)
  Zero‑config, in‑memory exact search.
- Graph backend: disabled by default
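
To make "in‑memory exact search" concrete, here is a minimal sketch of what a brute-force cosine search involves. It is an illustration only, not the package's implementation; `Entry`, `cosine`, and `bruteForceSearch` are hypothetical names.

```ts
// Hypothetical sketch of exact (brute-force) cosine search; not the package's internals.
type Entry = { id: string; vector: number[] };

// Cosine similarity of two equal-length vectors; returns 0 if either has zero norm.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Score every entry against the query and keep the k best.
function bruteForceSearch(entries: Entry[], query: number[], k: number): { id: string; score: number }[] {
  return entries
    .map(e => ({ id: e.id, score: cosine(e.vector, query) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Every query scans all stored vectors, so the cost grows linearly with the number of chunks. That is why a dedicated vector backend becomes attractive for large workspaces.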

> For enterprise‑scale persistence and performance, configure a remote vector backend such as Qdrant, and optionally a graph backend such as Neo4j.

---

What this package provides

- Whole‑workspace indexing
  Multiple Git repositories under a single workspace root.
- Meaningful chunking
  TypeScript/JavaScript AST‑aware chunking with robust fallbacks for other languages.
- Semantic embeddings
  Pluggable providers:
  - Ollama (local)
  - OpenAI
  - Deterministic offline hash embeddings
- Hybrid retrieval
  Vector similarity + lexical search (SQLite FTS5) with configurable weights.
- Pluggable vector backends
  bruteforce, hnswlib, qdrant, faiss, or a custom provider.
- Enterprise‑safe invalidation
  Repo indices are keyed by:
  (repo_id, head_commit, embedder_id, index_fingerprint)
  Any change forces a clean rebuild to avoid stale context.
- Incremental updates
  File watching + .git/HEAD detection.
- Security controls
  Git‑native ignore rules, additional ignore files, and redaction hooks.
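
The invalidation rule in the list above can be sketched as a key comparison. This is a minimal illustration under assumptions: the field names mirror the tuple `(repo_id, head_commit, embedder_id, index_fingerprint)`, while `IndexKey`, `indexKeyString`, `needsRebuild`, and the JSON encoding of the key are hypothetical and may differ from what the package does internally.

```ts
// Hypothetical sketch: field names mirror the documented tuple; the key encoding is an assumption.
type IndexKey = {
  repoId: string;
  headCommit: string;
  embedderId: string;
  indexFingerprint: string;
};

// Serialise the tuple so that any differing component yields a different key.
function indexKeyString(k: IndexKey): string {
  return JSON.stringify([k.repoId, k.headCommit, k.embedderId, k.indexFingerprint]);
}

// A stored index is reusable only if its key matches the current one exactly.
function needsRebuild(stored: IndexKey | undefined, current: IndexKey): boolean {
  return stored === undefined || indexKeyString(stored) !== indexKeyString(current);
}
```

For example, switching the embedder or checking out a different commit changes one component of the key, so `needsRebuild` returns `true` and the old index is not reused.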

This allows the same index to support multiple agent domains:

- Search
- Refactor
- Review
- Architecture understanding
- RCA (root cause analysis)

…by selecting different retrieval profiles.

---

Index backends (vector & graph)

Workspace‑Indexer separates index infrastructure from agent logic.

Index backends define where and how indexed knowledge is stored and queried:

- Catalogue DB (files, chunks, metadata, FTS)
- Vector backend (similarity search)
- Graph backend (optional dependency / symbol / architecture graph)

Backends are configured via profiles, allowing:

- Local or remote providers
- Safe backend switching (automatic rebuilds)
- Environment‑specific defaults

---

Install

```bash
npm i @neuralsea/workspace-indexer
```

Node 18+ required.

Docs: docs/README.md

---

Browser / VS Code webview

This package publishes a browser‑safe entrypoint:

```ts
import { chunkSource, OpenAIEmbeddingsProvider } from "@neuralsea/workspace-indexer/browser";
```

The full indexer (WorkspaceIndexer, file watching, git scanning, persistence) is Node‑only and should run in the VS Code extension host, communicating with webviews via postMessage.
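
One way to picture the split between extension host and webview is a small request/response protocol. The sketch below is an assumption, not part of the package: the message shapes, `Retriever`, and `handleWebviewMessage` are hypothetical, and the retrieval call is modelled on `ix.retrieve` from the quick start.

```ts
// Hypothetical message protocol between a webview and the extension host.
type RetrieveRequest = { type: "retrieve"; requestId: number; query: string; profile: string };
type RetrieveResponse = { type: "retrieveResult"; requestId: number; paths: string[] };

// Minimal shape of the retrieval call assumed here, modelled on ix.retrieve in the quick start.
type Retriever = (
  query: string,
  opts: { profile: string }
) => Promise<{ hits: { chunk: { path: string } }[] }>;

// Runs in the extension host: answer one webview request using the Node-only indexer.
async function handleWebviewMessage(msg: RetrieveRequest, retrieve: Retriever): Promise<RetrieveResponse> {
  const result = await retrieve(msg.query, { profile: msg.profile });
  return {
    type: "retrieveResult",
    requestId: msg.requestId,
    paths: result.hits.map(h => h.chunk.path),
  };
}
```

In an extension, such a handler would typically be registered with `panel.webview.onDidReceiveMessage` and its result sent back with `panel.webview.postMessage`. The `requestId` lets the webview match responses to requests.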

---

Quick start (library)

```ts
import { WorkspaceIndexer, OllamaEmbeddingsProvider, IndexerProgressObservable } from "@neuralsea/workspace-indexer";

// Embeddings via a local Ollama model.
const embedder = new OllamaEmbeddingsProvider({ model: "nomic-embed-text" });

// Log progress events as they arrive.
const progress = new IndexerProgressObservable();
progress.subscribe(e => console.log(e.type, e));

const ix = new WorkspaceIndexer("/path/to/workspace", embedder, { progress });

// Index every repository under the workspace root.
await ix.indexAll();

const search = await ix.retrieve("Where is authentication enforced?", { profile: "search" });
console.log(search.hits.map(h => h.chunk.path));

await ix.closeAsync();
```

---

Retrieval profiles

The same index can be queried differently depending on the task.

Built‑in profiles:

- search: tight top‑k, precise matches
- refactor: wider k, follows imports and adjacency
- review: biases to changed files, includes file synopsis
- architecture: aggressive expansion across imports
- rca: review + recency bias

Profiles control:

- k (primary hits)
- weights (vector / lexical / recency)
- expansion rules
- candidate pool sizes

Profiles can be overridden at runtime.
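
As an illustration of how weights and k might interact, here is a minimal sketch of weighted hybrid scoring. It assumes each signal is already normalised to [0, 1] and combined as a weighted sum; the package's actual scoring and override API are not documented here, and `Weights`, `Candidate`, `hybridScore`, `topK`, and `withOverrides` are hypothetical names.

```ts
// Hypothetical sketch of weighted hybrid scoring; names and normalisation are assumptions.
type Weights = { vector: number; lexical: number; recency: number };

// Each signal is assumed to be pre-normalised to the range [0, 1].
type Candidate = { id: string; vector: number; lexical: number; recency: number };

// Weighted sum of the three signals.
function hybridScore(c: Candidate, w: Weights): number {
  return w.vector * c.vector + w.lexical * c.lexical + w.recency * c.recency;
}

// Rank the candidate pool and keep the k primary hits.
function topK(pool: Candidate[], w: Weights, k: number): Candidate[] {
  return [...pool].sort((a, b) => hybridScore(b, w) - hybridScore(a, w)).slice(0, k);
}

// A runtime override can be pictured as a partial set of weights merged over the defaults.
function withOverrides(base: Weights, overrides: Partial<Weights>): Weights {
  return { ...base, ...overrides };
}
```

With a vector-only weighting, the chunk closest in embedding space ranks first. Raising the lexical or recency weight can promote a different chunk from the same candidate pool without re-indexing.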

---

Index backend configuration (profiles)

Index backends are configured using named profiles.

```json
{
  "indexBackends": {
    "vectorProfiles": {
      "local-default": {
        "kind": "local",
        "provider": "bruteforce",
        "metric": "cosine"
      },
      "qdrant-dev": {
        "kind": "qdrant",
        "url": "http://localhost:6333",
        "collectionPrefix": "petri"
      }
    },
    "graphProfiles": {
      "none": { "kind": "none" },
      "neo4j-local": {
        "kind": "neo4j",
        "uri": "neo4j://localhost:7687",
        "user": "neo4j",
        "passwordRef": "NEO4J_PASSWORD",
        "database": "neo4j",
        "labelPrefix": "Petri"
      }
    },
    "defaults": {
      "vectorProfile": "local-default",
      "graphProfile": "none"
    }
  }
}
```

The selected profiles are resolved internally into runtime configuration.
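
Conceptually, resolution amounts to looking up the names under `defaults` in the two profile maps. The sketch below illustrates that idea only; the types and `resolveBackends` are hypothetical, and the package's internal resolver may validate or transform profiles differently.

```ts
// Hypothetical sketch of resolving named profiles; not the package's actual resolver.
type VectorProfile = { kind: string; [key: string]: unknown };
type GraphProfile = { kind: string; [key: string]: unknown };

type IndexBackendsConfig = {
  vectorProfiles: Record<string, VectorProfile>;
  graphProfiles: Record<string, GraphProfile>;
  defaults: { vectorProfile: string; graphProfile: string };
};

// Look up the default profile names and fail loudly if one is not defined.
function resolveBackends(cfg: IndexBackendsConfig): { vector: VectorProfile; graph: GraphProfile } {
  const vector = cfg.vectorProfiles[cfg.defaults.vectorProfile];
  const graph = cfg.graphProfiles[cfg.defaults.graphProfile];
  if (!vector) throw new Error(`Unknown vector profile: ${cfg.defaults.vectorProfile}`);
  if (!graph) throw new Error(`Unknown graph profile: ${cfg.defaults.graphProfile}`);
  return { vector, graph };
}
```

Failing on an unknown profile name, rather than silently falling back, keeps a typo in `defaults` from quietly selecting the wrong backend.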

Earlier versions accepted Neo4j configuration under workspace.graph.

This version automatically migrates those settings into a graph profile on first run. After migration, legacy settings are ignored.

---

Persistence semantics

Disabling the graph backend does not disable index persistence.

Persistence of catalogue data, embeddings, and vector indices is controlled independently via storage settings.

---

Security model

- Git‑native ignore (git ls-files)
- Additional .petriignore / .augmentignore
- Redaction hooks before embedding and storage

For higher assurance:

- set `storage.ftsMode = "tokens"`
- review redaction patterns
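
To show what reviewing redaction patterns can look like, here is a minimal sketch of a regex-based redaction step. The package's real hook signature is not documented here, so `RedactionRule`, `rules`, and `redact` are hypothetical, and the two patterns are examples only.

```ts
// Hypothetical sketch of a redaction step; the package's real hook signature may differ.
type RedactionRule = { name: string; pattern: RegExp };

// Example patterns only; review and extend them for your own secrets.
const rules: RedactionRule[] = [
  { name: "aws-access-key", pattern: /AKIA[0-9A-Z]{16}/g },
  {
    name: "private-key",
    pattern: /-----BEGIN [A-Z ]*PRIVATE KEY-----[\s\S]*?-----END [A-Z ]*PRIVATE KEY-----/g,
  },
];

// Replace every match with a labelled placeholder before text is embedded or stored.
function redact(text: string, ruleSet: RedactionRule[] = rules): string {
  return ruleSet.reduce((out, r) => out.replace(r.pattern, `[REDACTED:${r.name}]`), text);
}
```

Labelled placeholders keep the surrounding code searchable while making it visible in retrieved chunks that something was removed and by which rule.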

---

Licence

MIT
