OpenCode plugin: semantic search with query decomposition, RRF merge, and context-efficient workspace (v4.5.0)
Semantic code search with graph-based context for OpenCode
Search code by meaning, not by text. Get related context automatically via code graph.
---
An OpenCode plugin that adds smart search to your project:
- Semantic search — finds code by meaning, even when words don't match
- Hybrid search — combines vector similarity + BM25 keyword matching
- Graph-based context — automatically attaches related code (imports, calls, type references) to search results
- Two-phase indexing — BM25 + graph search available immediately (Phase 1), vector search after embedding (Phase 2)
- Simplified API — 5 parameters, smart filter parsing, config-driven defaults
- Automatic indexing — files are indexed on change, zero effort
- Local vectorization — works offline, no API keys needed
- Three indexes — separate for code, docs, and configs
---
```bash
npm install @comfanion/usethis_search
```
Add to opencode.json:
```json
{
  "plugin": ["@comfanion/usethis_search"]
}
```
On OpenCode startup, the plugin automatically:
1. Creates indexes for code and documentation
2. Phase 1: chunks files, builds code graph (fast, parallel) — BM25 search available immediately
3. Phase 2: embeds chunks into vectors — hybrid search available after completion
Indexing time estimates:
- < 100 files — ~1 min
- < 500 files — ~3 min
- 500+ files — ~10 min
---
The search tool has 5 parameters:
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `query` | string | required | What you're looking for (semantic) |
| `index` | string | "code" | Which index: code, docs, config |
| `limit` | number | 10 | Number of results |
| `searchAll` | boolean | false | Search across all indexes |
| `filter` | string | (none) | Filter by path or language |
```javascript
// Basic semantic search
search({ query: "authentication logic" })
// Search documentation
search({ query: "how to deploy", index: "docs" })
// Search all indexes
search({ query: "database connection", searchAll: true })
// Filter by directory
search({ query: "tenant management", filter: "internal/domain/" })
// Filter by language
search({ query: "event handling", filter: "*.go" })
search({ query: "middleware", filter: "go" })
// Combined: directory + language
search({ query: "API routes", filter: "internal/**/*.go" })
// Substring match on file path
search({ query: "metrics", filter: "service" })
// More results
search({ query: "error handling", limit: 20 })
```
The filter parameter is smart — it auto-detects what you mean:
| Input | Parsed as |
|-------|-----------|
| "internal/domain/" | Path prefix |
| "*.go" or ".go" | Language filter (go) |
| "go" or "python" | Language filter |
| "internal/**/*.go" | Path prefix + language |
| "service" | Substring match on file path |
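The detection rules in the table can be sketched roughly as follows. This is an illustrative TypeScript sketch, not the plugin's actual parser: `parseFilter`, `ParsedFilter`, and the `LANGUAGES` map are hypothetical names, and the language list is deliberately tiny.

```typescript
// Hypothetical sketch of the filter auto-detection described in the table.
// All names and the language map are assumptions, not the plugin's real code.
type ParsedFilter = {
  pathPrefix?: string; // e.g. "internal/domain/"
  language?: string;   // e.g. "go"
  substring?: string;  // fallback: match anywhere in the file path
};

// Assumed mapping from extension or language name to a language id.
const LANGUAGES = new Map<string, string>([
  ["go", "go"],
  ["py", "python"],
  ["python", "python"],
  ["ts", "typescript"],
  ["typescript", "typescript"],
]);

function parseFilter(input: string): ParsedFilter {
  const raw = input.trim();

  // "*.go" or ".go" -> language filter
  const extOnly = raw.match(/^\*?\.([A-Za-z0-9]+)$/);
  if (extOnly) {
    const ext = (extOnly[1] ?? "").toLowerCase();
    return { language: LANGUAGES.get(ext) ?? ext };
  }

  // "go" or "python" -> language filter
  const known = LANGUAGES.get(raw.toLowerCase());
  if (known !== undefined) return { language: known };

  // A directory glob ending in an extension -> path prefix + language
  const globbed = raw.match(/^(.*?\/)\*\*?\/\*?\.([A-Za-z0-9]+)$/);
  if (globbed) {
    const ext = (globbed[2] ?? "").toLowerCase();
    return { pathPrefix: globbed[1] ?? "", language: LANGUAGES.get(ext) ?? ext };
  }

  // "internal/domain/" -> path prefix
  if (raw.endsWith("/")) return { pathPrefix: raw };

  // Anything else -> substring match on the file path
  return { substring: raw };
}
```

Checking the most specific shapes first (extension, known language name, glob) and falling back to a substring match keeps ambiguous inputs such as "service" from being misread as a language.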
Each result includes:
- Score breakdown: Score: 0.619 (vec: 0.47, bm25: +0.04, kw: +0.11 | matched: "event", "correlation")
- Rich metadata: language, function name, class name, heading context
- File grouping: best chunk per file + "N matching sections" count
- Related context: graph-expanded neighbors (imports, calls, type references)
- Confidence signal: warning when top score < 0.45
When vectors are not yet available (Phase 2 in progress), search automatically falls back to BM25-only mode with a banner notification.
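The score breakdown and confidence signal listed above could be produced along these lines. This is a sketch under the assumption that the final score is the vector score plus the BM25 and keyword bonuses (0.47 + 0.04 + 0.11 is about 0.62, close to the 0.619 in the example line). `ScoreParts`, `totalScore`, `formatScore`, and `isLowConfidence` are hypothetical names.

```typescript
// Hypothetical sketch: additive score composition is an assumption
// inferred from the example breakdown line, not confirmed plugin behavior.
type ScoreParts = {
  vec: number;       // vector similarity component
  bm25: number;      // BM25 bonus
  kw: number;        // keyword rerank bonus
  matched: string[]; // keywords that matched
};

// Threshold taken from the text: warn when the top score is below 0.45.
const LOW_CONFIDENCE_THRESHOLD = 0.45;

function totalScore(parts: ScoreParts): number {
  return parts.vec + parts.bm25 + parts.kw;
}

function formatScore(parts: ScoreParts): string {
  const matched = parts.matched.map((word) => `"${word}"`).join(", ");
  return (
    `Score: ${totalScore(parts).toFixed(3)} ` +
    `(vec: ${parts.vec.toFixed(2)}, bm25: +${parts.bm25.toFixed(2)}, ` +
    `kw: +${parts.kw.toFixed(2)} | matched: ${matched})`
  );
}

function isLowConfidence(topScore: number): boolean {
  return topScore < LOW_CONFIDENCE_THRESHOLD;
}
```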
---
```bash
# Reindex everything
bunx usethis_search reindex
```
The codeindex tool manages indexes from within OpenCode:
```javascript
// List all indexes with stats
codeindex({ action: "list" })

// Check specific index status
codeindex({ action: "status", index: "code" })
// Reindex
codeindex({ action: "reindex", index: "code" })
```
---
Architecture
```
Phase 1 (fast, parallel, 5 workers):
  file -> read -> chunk -> regex analyze -> graph edges -> ChunkStore (SQLite)
  Result: BM25 + graph search available immediately

Phase 2 (batch, sequential):
  ChunkStore chunks -> batch embed (32/batch) -> LanceDB
  Result: vector/hybrid search becomes available
```
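The two phases rely on two generic building blocks: running work with a bounded number of parallel workers (Phase 1) and splitting chunks into fixed-size batches (Phase 2). A minimal sketch of both follows; `mapWithConcurrency` and `toBatches` are hypothetical helper names, and the plugin's real pipeline may be organized differently.

```typescript
// Hypothetical helpers illustrating the two-phase pipeline; not the plugin's code.

// Phase 2 building block: split items into batches (32 per batch in the text).
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (batchSize <= 0) throw new Error("batchSize must be positive");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Phase 1 building block: process items with at most `limit` tasks in flight
// (5 workers in the text). Results keep the order of the input.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array<R>(items.length);
  let next = 0;

  // Each runner repeatedly claims the next unprocessed index.
  async function run(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i] as T);
    }
  }

  const runnerCount = Math.max(0, Math.min(limit, items.length));
  const runners: Promise<void>[] = [];
  for (let r = 0; r < runnerCount; r++) runners.push(run());
  await Promise.all(runners);
  return results;
}
```

With these helpers, Phase 1 would correspond to something like `mapWithConcurrency(files, 5, indexFile)` and Phase 2 to iterating over `toBatches(chunks, 32)` sequentially, where `indexFile` stands in for the read, chunk, and analyze steps.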
```
Has vectors?  -> hybrid search (vector + BM25 + graph + keyword rerank)
No vectors?   -> BM25-only search (from ChunkStore + graph + keyword rerank)
```
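The package description mentions an RRF merge. Reciprocal rank fusion is a standard way to combine several ranked result lists without having to normalize their scores. The sketch below shows the general technique only: which lists the plugin fuses (for example, results of decomposed sub-queries) is not specified here, `rrfMerge` is a hypothetical name, and `k = 60` is a commonly used default rather than a documented setting.

```typescript
// Generic reciprocal rank fusion; an illustration, not the plugin's implementation.
// Each ranking is a list of result ids ordered from best to worst.
function rrfMerge(
  rankings: string[][],
  k: number = 60, // assumed smoothing constant
): { id: string; score: number }[] {
  const scores = new Map<string, number>();

  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }

  // Highest fused score first.
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score,
  );
}
```

An id that appears near the top of several lists accumulates more score than one that ranks first in only a single list, which is why the method is robust to score scales that are not comparable across retrievers.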
```
.opencode/
  vectors/
    code/
      lancedb/          # Vector embeddings (LanceDB)
      chunks.db         # Chunk content + metadata (SQLite, ChunkStore)
      hashes.json       # File hashes for change detection
    docs/
      lancedb/
      chunks.db
      hashes.json
  graph/
    code_graph.db       # Code relationships (SQLite, GraphDB)
    doc_graph.db        # Doc relationships (SQLite, GraphDB)
  vectorizer.yaml       # Configuration
  indexer.log           # Indexing log
```
| Module | Purpose |
|--------|---------|
| Core | |
| vectorizer/index.ts | CodebaseIndexer, two-phase pipeline, search, singleton pool |
| vectorizer/chunk-store.ts | SQLite chunk storage (BM25 without vectors) |
| vectorizer/graph-db.ts | SQLite triple store for code relationships |
| vectorizer/graph-builder.ts | Builds graph edges from code analysis |
| vectorizer/bm25-index.ts | Inverted index for keyword search |
| Chunking | |
| vectorizer/chunkers/code-chunker.ts | Function/class-aware splitting |
| vectorizer/chunkers/markdown-chunker.ts | Heading-aware splitting with hierarchy |
| vectorizer/chunkers/chunker-factory.ts | Routes to correct chunker by file type |
| Analysis | |
| vectorizer/analyzers/regex-analyzer.ts | Regex-based code analysis (imports, calls, types) |
| vectorizer/analyzers/lsp-analyzer.ts | LSP-based code analysis (definitions, references) |
| vectorizer/analyzers/lsp-client.ts | Language Server Protocol client |
| Search | |
| vectorizer/hybrid-search.ts | Merge vector + BM25 scores |
| vectorizer/query-cache.ts | LRU cache for query embeddings |
| vectorizer/content-cleaner.ts | Remove noise (TOC, breadcrumbs, markers) |
| vectorizer/metadata-extractor.ts | Extract file_type, language, tags, dates |
| Tracking | |
| vectorizer/search-metrics.ts | Search quality metrics |
| vectorizer/usage-tracker.ts | Usage provenance tracking |
| Tools | |
| tools/search.ts | Search tool (5 params, smart filter, score breakdown) |
| tools/codeindex.ts | Index management tool |
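The table lists `vectorizer/query-cache.ts` as an LRU cache for query embeddings. For readers unfamiliar with the technique, here is a minimal LRU cache built on the insertion order of `Map`. The class name `LruCache` and its methods are hypothetical, and the real module may differ in capacity handling and key format.

```typescript
// Minimal LRU cache sketch; an illustration of the technique, not the plugin's module.
class LruCache<K, V> {
  private readonly entries = new Map<K, V>();
  private readonly capacity: number;

  constructor(capacity: number) {
    if (capacity <= 0) throw new Error("capacity must be positive");
    this.capacity = capacity;
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert so the key becomes the most recently used
    // (Map iterates in insertion order).
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    if (this.entries.has(key)) this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}
```

For query embeddings, the key would typically be the query string and the value its embedding vector, so repeated searches skip the embedding step.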
The code graph tracks relationships between chunks:
- imports — file A imports module B
- calls — function A calls function B
- references — code references a type/interface
- implements — class implements an interface
- extends — class extends another class
- belongs_to — chunk belongs to file (structural)
When you search, results are automatically expanded with 1-hop graph neighbors. Related context is scored by `edge_weight * cosine_similarity` (or `edge_weight * 0.7` in BM25-only mode) and filtered by `min_relevance`.
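The scoring rule for related context can be sketched as below, reading it as a product of edge weight and similarity. `Neighbor` and `scoreRelated` are hypothetical names; the defaults mirror `min_relevance: 0.5` and `max_related: 4` from the configuration, and treating the cutoff as inclusive is an assumption.

```typescript
// Hypothetical sketch of 1-hop neighbor scoring; not the plugin's actual code.
type Neighbor = {
  chunkId: string;
  edgeWeight: number;
  // Missing when vectors are not available yet (BM25-only mode).
  cosineSimilarity?: number;
};

// Fixed factor used in place of cosine similarity in BM25-only mode (from the text).
const BM25_ONLY_FACTOR = 0.7;

function scoreRelated(
  neighbors: Neighbor[],
  minRelevance: number = 0.5,
  maxRelated: number = 4,
): { chunkId: string; relevance: number }[] {
  return neighbors
    .map((n) => ({
      chunkId: n.chunkId,
      relevance: n.edgeWeight * (n.cosineSimilarity ?? BM25_ONLY_FACTOR),
    }))
    .filter((n) => n.relevance >= minRelevance) // assumed inclusive cutoff
    .sort((a, b) => b.relevance - a.relevance)
    .slice(0, maxRelated);
}
```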
Multiple parallel searches share one `CodebaseIndexer` instance per (project, index) pair, which avoids SQLite lock conflicts. Instances are managed via `getIndexer()` / `releaseIndexer()` / `destroyIndexer()`.
---
Configuration
```yaml
# .opencode/vectorizer.yaml
vectorizer:
  enabled: true
  auto_index: true
  model: "Xenova/all-MiniLM-L6-v2"
  debounce_ms: 1000

  cleaning:
    remove_toc: true
    remove_frontmatter_metadata: false
    remove_imports: false
    remove_comments: false
  chunking:
    strategy: "semantic" # fixed | semantic
    markdown:
      split_by_headings: true
      min_chunk_size: 200
      max_chunk_size: 2000
      preserve_heading_hierarchy: true
    code:
      split_by_functions: true
      include_function_signature: true
      min_chunk_size: 300
      max_chunk_size: 1500
    fixed:
      max_chars: 1500
  search:
    hybrid: true
    bm25_weight: 0.3
    freshen: false # Don't re-index on every search
    min_score: 0.35 # Minimum relevance cutoff
    include_archived: false
    default_limit: 10
  graph:
    enabled: true
    max_related: 4 # Max related chunks per result
    min_relevance: 0.5 # Min score for related context
    semantic_edges: false # O(n^2), enable only for small repos
    semantic_edges_max_chunks: 500
    lsp:
      enabled: true
      timeout_ms: 5000
    read_intercept: true
  quality:
    enable_metrics: false
    enable_cache: true
  indexes:
    code:
      enabled: true
      pattern: "**/*.{js,ts,jsx,tsx,mjs,cjs,py,go,rs,java,kt,swift,c,cpp,h,hpp,cs,rb,php,scala,clj}"
      ignore:
        - "**/node_modules/**"
        - "**/.git/**"
        - "**/dist/**"
        - "**/build/**"
        - "**/.opencode/**"
        - "**/vendor/**"
      hybrid: true
      bm25_weight: 0.3
    docs:
      enabled: true
      pattern: "docs/**/*.{md,mdx,txt,rst,adoc}"
      hybrid: false
      bm25_weight: 0.2
    config:
      enabled: false
      pattern: "**/*.{yaml,yml,json,toml,ini,env,xml}"
      hybrid: false
      bm25_weight: 0.3
  exclude:
    - node_modules
    - vendor
    - dist
    - build
    - out
    - __pycache__
```
To disable automatic indexing, set `auto_index` to `false`:
```yaml
vectorizer:
  auto_index: false
```
Automatic indexing can also be skipped with an environment variable:
```bash
export OPENCODE_SKIP_AUTO_INDEX=1
```
---
Debugging
```bash
export DEBUG=vectorizer
# or all logs
export DEBUG=*
```
Indexing activity is logged to `.opencode/indexer.log`.
---
Technical Details
- Vectorization: @xenova/transformers (ONNX Runtime)
- Vector DB: LanceDB (local, serverless)
- Chunk Store: bun:sqlite (WAL mode, concurrent reads)
- Graph DB: bun:sqlite (WAL mode, triple store)
- Model: `Xenova/all-MiniLM-L6-v2` (English-focused, 384 dimensions, ~23 MB)
---
MIT
---
Made by the Comfanion team