# @comfanion/usethis_search

Semantic code search with graph-based context for OpenCode

Search code by meaning, not by text. Get related context automatically via code graph.

---

## What is this?

An OpenCode plugin that adds smart search to your project:

- Semantic search — finds code by meaning, even when words don't match
- Hybrid search — combines vector similarity + BM25 keyword matching
- Graph-based context — automatically attaches related code (imports, calls, type references) to search results
- Two-phase indexing — BM25 + graph search available immediately (Phase 1), vector search after embedding (Phase 2)
- Simplified API — 5 parameters, smart filter parsing, config-driven defaults
- Automatic indexing — files are indexed on change, zero effort
- Local vectorization — works offline, no API keys needed
- Three indexes — separate for code, docs, and configs

---

## Quick Start

### Install

```bash
npm install @comfanion/usethis_search
```

### Enable the plugin

Add to opencode.json:

```json
{
  "plugin": ["@comfanion/usethis_search"]
}
```

### Automatic indexing

On OpenCode startup, the plugin automatically:

1. Creates indexes for code and documentation.
2. Phase 1: chunks files and builds the code graph (fast, parallel); BM25 search is available immediately.
3. Phase 2: embeds chunks into vectors; hybrid search is available once this completes.

Indexing time estimates:

- < 100 files: ~1 min
- < 500 files: ~3 min
- 500+ files: ~10 min

---

## Search API

The search tool has 5 parameters:

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `query` | string | required | What you're looking for (semantic) |
| `index` | string | `"code"` | Which index: `code`, `docs`, `config` |
| `limit` | number | `10` | Number of results |
| `searchAll` | boolean | `false` | Search across all indexes |
| `filter` | string | (none) | Filter by path or language |
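To make the defaults explicit, here is a minimal TypeScript sketch of the argument shape described in the table. The `SearchParams` type and the `withDefaults` helper are hypothetical names used for illustration only; they are not exported by the plugin, and the default values simply restate the table.

```typescript
// Hypothetical typing of the search tool's arguments, mirroring the table above.
type IndexName = "code" | "docs" | "config";

interface SearchParams {
  query: string;       // required: what you're looking for
  index?: IndexName;   // default "code"
  limit?: number;      // default 10
  searchAll?: boolean; // default false
  filter?: string;     // optional path/language filter
}

interface ResolvedParams {
  query: string;
  index: IndexName;
  limit: number;
  searchAll: boolean;
  filter?: string;
}

// Apply the documented defaults (illustrative helper, not the plugin's API).
function withDefaults(p: SearchParams): ResolvedParams {
  if (p.query.trim() === "") {
    throw new Error("query is required");
  }
  return {
    query: p.query,
    index: p.index ?? "code",
    limit: p.limit ?? 10,
    searchAll: p.searchAll ?? false,
    filter: p.filter,
  };
}

const resolved = withDefaults({ query: "authentication logic" });
console.log(resolved.index, resolved.limit, resolved.searchAll); // code 10 false
```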

### Examples

```javascript
// Basic semantic search
search({ query: "authentication logic" })

// Search documentation
search({ query: "how to deploy", index: "docs" })

// Search all indexes
search({ query: "database connection", searchAll: true })

// Filter by directory
search({ query: "tenant management", filter: "internal/domain/" })

// Filter by language
search({ query: "event handling", filter: "*.go" })
search({ query: "middleware", filter: "go" })

// Combined: directory + language
search({ query: "API routes", filter: "internal/**/*.go" })

// Substring match on file path
search({ query: "metrics", filter: "service" })

// More results
search({ query: "error handling", limit: 20 })
```

### Smart filter

The filter parameter is smart — it auto-detects what you mean:

| Input | Parsed as |
|-------|-----------|
| `"internal/domain/"` | Path prefix |
| `"*.go"` or `".go"` | Language filter (`go`) |
| `"go"` or `"python"` | Language filter |
| `"internal/**/*.go"` | Path prefix + language |
| `"service"` | Substring match on file path |
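The detection rules in the table could be approximated as below. This is a sketch under stated assumptions: `parseFilter`, `ParsedFilter`, and the small extension map are hypothetical, the real parser in `tools/search.ts` may check the rules in a different order, and only a handful of languages are mapped here.

```typescript
interface ParsedFilter {
  pathPrefix?: string; // e.g. "internal/domain/"
  language?: string;   // e.g. "go"
  substring?: string;  // e.g. "service"
}

// Illustrative subset of extension -> language mappings (assumed, not exhaustive).
const EXT_TO_LANG: Record<string, string> = {
  go: "go",
  py: "python",
  ts: "typescript",
  js: "javascript",
  rs: "rust",
  java: "java",
};
const LANG_NAMES = new Set(Object.values(EXT_TO_LANG));

function langFromExt(ext: string): string {
  const e = ext.toLowerCase();
  return Object.prototype.hasOwnProperty.call(EXT_TO_LANG, e) ? EXT_TO_LANG[e] : e;
}

function parseFilter(input: string): ParsedFilter {
  const f = input.trim();

  // "*.go" or ".go": extension only, so a pure language filter
  const extOnly = f.match(/^\*?\.(\w+)$/);
  if (extOnly) return { language: langFromExt(extOnly[1]) };

  // "go" or "python": a bare language name
  if (LANG_NAMES.has(f.toLowerCase())) return { language: f.toLowerCase() };

  // "internal/**/*.go": text before the first "*" is the prefix, the extension is the language
  const glob = f.match(/^([^*]*)\*.*\.(\w+)$/);
  if (glob) return { pathPrefix: glob[1], language: langFromExt(glob[2]) };

  // "internal/domain/": a trailing slash marks a directory prefix
  if (f.endsWith("/")) return { pathPrefix: f };

  // Anything else: substring match on the file path
  return { substring: f };
}
```

Checking the extension-only and language-name forms before the glob form keeps `"*.go"` from being read as a glob with an empty path prefix.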

### Search results

Each result includes:

- Score breakdown: `Score: 0.619 (vec: 0.47, bm25: +0.04, kw: +0.11 | matched: "event", "correlation")`
- Rich metadata: language, function name, class name, heading context
- File grouping: the best chunk per file plus an "N matching sections" count
- Related context: graph-expanded neighbors (imports, calls, type references)
- Confidence signal: a warning when the top score is below 0.45

When vectors are not yet available (Phase 2 in progress), search automatically falls back to BM25-only mode with a banner notification.
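File grouping and the low-confidence signal can be sketched like this. `ChunkHit`, `groupByFile`, and `confidenceWarning` are hypothetical names; only the behavior (best chunk per file, a count of matching sections, and a warning below 0.45) comes from the description above.

```typescript
interface ChunkHit {
  file: string;
  score: number;
  text: string;
}

interface FileResult {
  file: string;
  best: ChunkHit;          // highest-scoring chunk from this file
  matchingSections: number; // how many chunks from this file matched
}

// Keep the best chunk per file and count the rest, best files first.
function groupByFile(hits: ChunkHit[]): FileResult[] {
  const byFile = new Map<string, FileResult>();
  for (const hit of hits) {
    const existing = byFile.get(hit.file);
    if (existing === undefined) {
      byFile.set(hit.file, { file: hit.file, best: hit, matchingSections: 1 });
    } else {
      existing.matchingSections += 1;
      if (hit.score > existing.best.score) existing.best = hit;
    }
  }
  return Array.from(byFile.values()).sort((a, b) => b.best.score - a.best.score);
}

// Threshold from the text above: warn when the top score is below 0.45.
const LOW_CONFIDENCE = 0.45;

function confidenceWarning(results: FileResult[]): string | null {
  const top = results.length > 0 ? results[0].best.score : 0;
  return top < LOW_CONFIDENCE ? `Low confidence: top score ${top.toFixed(3)}` : null;
}
```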

---

## Index Management

### CLI

```bash
# Reindex everything
bunx usethis_search reindex

# Check status
bunx usethis_search status

# List indexes
bunx usethis_search list

# Clear index
bunx usethis_search clear
```

### Via the codeindex tool

```javascript
// List all indexes with stats
codeindex({ action: "list" })

// Check specific index status
codeindex({ action: "status", index: "code" })

// Reindex
codeindex({ action: "reindex", index: "code" })
```

---

## Architecture

### Two-phase indexing

```
Phase 1 (fast, parallel, 5 workers):
  file -> read -> chunk -> regex analyze -> graph edges -> ChunkStore (SQLite)
  Result: BM25 + graph search available immediately

Phase 2 (batch, sequential):
  ChunkStore chunks -> batch embed (32/batch) -> LanceDB
  Result: vector/hybrid search becomes available
```
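The two phases differ mainly in how work is scheduled: Phase 1 fans files out to a fixed number of concurrent workers, while Phase 2 walks the stored chunks in fixed-size batches. The sketch below shows both patterns with the numbers from the diagram (5 workers, 32 chunks per batch). `runWithWorkers` and `toBatches` are hypothetical helpers, not the plugin's code, and the actual per-file and embedding work is left to the caller.

```typescript
// Run `task` over `items` with at most `workers` tasks in flight (Phase 1 pattern).
async function runWithWorkers<T, R>(
  items: T[],
  workers: number,
  task: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++; // claim an index synchronously, before any await
      results[i] = await task(items[i]);
    }
  }
  const pool = Array.from({ length: Math.min(workers, items.length) }, () => worker());
  await Promise.all(pool);
  return results;
}

// Split items into consecutive batches of `size` (Phase 2 pattern, 32 per batch).
function toBatches<T>(items: T[], size = 32): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

Because JavaScript runs these workers on a single thread, `next++` is not a race: each worker claims its index before it yields at `await`. Phase 2 would then loop over `toBatches(chunks)` and embed one batch at a time.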

### Search modes

```
Has vectors? -> hybrid search (vector + BM25 + graph + keyword rerank)
No vectors?  -> BM25-only search (from ChunkStore + graph + keyword rerank)
```
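One common way to merge the two signals is a weighted linear blend controlled by `bm25_weight` from the configuration. The sketch below assumes min-max normalization of raw BM25 scores and a blend of `(1 - w) * vector + w * bm25`. This is an illustrative technique, not necessarily the formula in `vectorizer/hybrid-search.ts` (the score breakdown shown in results suggests additive bonuses on top of the vector score). `chooseMode`, `normalize`, and `hybridScores` are hypothetical names.

```typescript
type SearchMode = "hybrid" | "bm25-only";

// Phase 2 finished (vectors exist) -> hybrid; otherwise fall back to BM25-only.
function chooseMode(hasVectors: boolean): SearchMode {
  return hasVectors ? "hybrid" : "bm25-only";
}

// Min-max normalize raw BM25 scores into [0, 1] so they sit on a scale similar
// to cosine similarity (an assumption made for this sketch).
function normalize(scores: number[]): number[] {
  if (scores.length === 0) return [];
  const max = Math.max(...scores);
  const min = Math.min(...scores);
  if (max === min) return scores.map(() => (max > 0 ? 1 : 0));
  return scores.map((s) => (s - min) / (max - min));
}

// Weighted blend: (1 - w) * vectorScore + w * normalizedBm25, with w = bm25_weight.
function hybridScores(vectorScores: number[], bm25Raw: number[], bm25Weight = 0.3): number[] {
  if (vectorScores.length !== bm25Raw.length) {
    throw new Error("score arrays must be aligned by candidate");
  }
  const bm25 = normalize(bm25Raw);
  return vectorScores.map((v, i) => (1 - bm25Weight) * v + bm25Weight * bm25[i]);
}

console.log(chooseMode(false)); // bm25-only
```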

### Storage layout

```
.opencode/
  vectors/
    code/
      lancedb/        # Vector embeddings (LanceDB)
      chunks.db       # Chunk content + metadata (SQLite, ChunkStore)
      hashes.json     # File hashes for change detection
    docs/
      lancedb/
      chunks.db
      hashes.json
    graph/
      code_graph.db   # Code relationships (SQLite, GraphDB)
      doc_graph.db    # Doc relationships (SQLite, GraphDB)
  vectorizer.yaml     # Configuration
  indexer.log         # Indexing log
```
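`hashes.json` lets the indexer skip files whose content has not changed. A minimal sketch of that comparison follows. It assumes the file maps a path to a content hash, and it uses 32-bit FNV-1a purely as a stand-in because the hash function the plugin actually uses is not documented here. `fnv1a`, `diffHashes`, and `HashDiff` are hypothetical names.

```typescript
// 32-bit FNV-1a over UTF-16 code units: a stand-in content hash for this sketch.
function fnv1a(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // multiply by the FNV prime, keep 32 bits unsigned
  }
  return h.toString(16).padStart(8, "0");
}

type HashMap = Record<string, string>; // file path -> content hash

interface HashDiff {
  added: string[];
  changed: string[];
  removed: string[];
}

const hasKey = (obj: HashMap, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

// Compare stored hashes with freshly computed ones to decide what to reindex.
function diffHashes(stored: HashMap, current: HashMap): HashDiff {
  const diff: HashDiff = { added: [], changed: [], removed: [] };
  for (const path of Object.keys(current)) {
    if (!hasKey(stored, path)) diff.added.push(path);
    else if (stored[path] !== current[path]) diff.changed.push(path);
  }
  for (const path of Object.keys(stored)) {
    if (!hasKey(current, path)) diff.removed.push(path);
  }
  return diff;
}

const stored: HashMap = { "src/a.ts": fnv1a("old"), "src/b.ts": fnv1a("same") };
const current: HashMap = { "src/b.ts": fnv1a("same"), "src/c.ts": fnv1a("new") };
console.log(diffHashes(stored, current)); // added: ["src/c.ts"], changed: [], removed: ["src/a.ts"]
```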

### Modules

| Module | Purpose |
|--------|---------|
| **Core** | |
| `vectorizer/index.ts` | CodebaseIndexer, two-phase pipeline, search, singleton pool |
| `vectorizer/chunk-store.ts` | SQLite chunk storage (BM25 without vectors) |
| `vectorizer/graph-db.ts` | SQLite triple store for code relationships |
| `vectorizer/graph-builder.ts` | Builds graph edges from code analysis |
| `vectorizer/bm25-index.ts` | Inverted index for keyword search |
| **Chunking** | |
| `vectorizer/chunkers/code-chunker.ts` | Function/class-aware splitting |
| `vectorizer/chunkers/markdown-chunker.ts` | Heading-aware splitting with hierarchy |
| `vectorizer/chunkers/chunker-factory.ts` | Routes to the correct chunker by file type |
| **Analysis** | |
| `vectorizer/analyzers/regex-analyzer.ts` | Regex-based code analysis (imports, calls, types) |
| `vectorizer/analyzers/lsp-analyzer.ts` | LSP-based code analysis (definitions, references) |
| `vectorizer/analyzers/lsp-client.ts` | Language Server Protocol client |
| **Search** | |
| `vectorizer/hybrid-search.ts` | Merge vector + BM25 scores |
| `vectorizer/query-cache.ts` | LRU cache for query embeddings |
| `vectorizer/content-cleaner.ts` | Remove noise (TOC, breadcrumbs, markers) |
| `vectorizer/metadata-extractor.ts` | Extract file_type, language, tags, dates |
| **Tracking** | |
| `vectorizer/search-metrics.ts` | Search quality metrics |
| `vectorizer/usage-tracker.ts` | Usage provenance tracking |
| **Tools** | |
| `tools/search.ts` | Search tool (5 params, smart filter, score breakdown) |
| `tools/codeindex.ts` | Index management tool |
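For readers unfamiliar with the kind of inverted index behind `vectorizer/bm25-index.ts`, here is a compact Okapi BM25 sketch. It assumes a simple lowercase alphanumeric tokenizer and the common textbook parameters `k1 = 1.2` and `b = 0.75`; the plugin's tokenizer and parameters may differ, and `Bm25Index` is a hypothetical class name.

```typescript
// Okapi BM25 over an in-memory inverted index (sketch; k1/b are textbook defaults).
class Bm25Index {
  private readonly postings = new Map<string, Map<number, number>>(); // term -> (docId -> term frequency)
  private readonly docLengths: number[] = [];
  private readonly k1: number;
  private readonly b: number;

  constructor(k1 = 1.2, b = 0.75) {
    this.k1 = k1;
    this.b = b;
  }

  private static tokenize(text: string): string[] {
    return text.toLowerCase().split(/[^a-z0-9_]+/).filter((t) => t.length > 0);
  }

  // Index one document and return its numeric id.
  add(text: string): number {
    const id = this.docLengths.length;
    const tokens = Bm25Index.tokenize(text);
    this.docLengths.push(tokens.length);
    for (const term of tokens) {
      let docs = this.postings.get(term);
      if (docs === undefined) {
        docs = new Map<number, number>();
        this.postings.set(term, docs);
      }
      docs.set(id, (docs.get(id) ?? 0) + 1);
    }
    return id;
  }

  // Score every document containing at least one query term, best first.
  search(query: string): Array<{ id: number; score: number }> {
    const n = this.docLengths.length;
    if (n === 0) return [];
    const avgLen = this.docLengths.reduce((sum, len) => sum + len, 0) / n;
    const scores = new Map<number, number>();
    for (const term of Bm25Index.tokenize(query)) {
      const docs = this.postings.get(term);
      if (docs === undefined) continue;
      // IDF with the "+1" variant so it never goes negative
      const idf = Math.log(1 + (n - docs.size + 0.5) / (docs.size + 0.5));
      docs.forEach((tf, id) => {
        const lenNorm = 1 - this.b + this.b * (this.docLengths[id] / avgLen);
        const termScore = (idf * tf * (this.k1 + 1)) / (tf + this.k1 * lenNorm);
        scores.set(id, (scores.get(id) ?? 0) + termScore);
      });
    }
    const results: Array<{ id: number; score: number }> = [];
    scores.forEach((score, id) => results.push({ id, score }));
    return results.sort((x, y) => y.score - x.score);
  }
}
```

The length normalization (`b`) is what lets a short chunk that mentions a term once outrank a long chunk with the same mention.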

### Code graph

The code graph tracks relationships between chunks:

- `imports`: file A imports module B
- `calls`: function A calls function B
- `references`: code references a type/interface
- `implements`: class implements an interface
- `extends`: class extends another class
- `belongs_to`: chunk belongs to file (structural)

When you search, results are automatically expanded with their 1-hop graph neighbors. Related context is scored as `edge_weight * cosine_similarity` (or `edge_weight * 0.7` in BM25-only mode) and filtered by `min_relevance`.
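Reading the scoring rule above as "edge weight times cosine similarity, or edge weight times 0.7 without vectors", a 1-hop expansion could look like the sketch below. The `Edge` triple shape, following edges in both directions, and keeping only the best edge per neighbor are assumptions; the `max_related` (4) and `min_relevance` (0.5) defaults come from the configuration section. `expandOneHop` is a hypothetical name.

```typescript
type Relation = "imports" | "calls" | "references" | "implements" | "extends" | "belongs_to";

// One triple in the graph: from --relation--> to, with an edge weight (assumed in [0, 1]).
interface Edge {
  from: string;
  to: string;
  relation: Relation;
  weight: number;
}

interface Related {
  chunkId: string;
  relation: Relation;
  score: number;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Score 1-hop neighbors of a hit, drop those below minRelevance, keep the best maxRelated.
function expandOneHop(
  hitId: string,
  edges: Edge[],
  queryVec: number[] | null, // null means BM25-only mode
  vectors: Map<string, number[]>,
  minRelevance = 0.5,
  maxRelated = 4,
): Related[] {
  const best = new Map<string, Related>();
  for (const e of edges) {
    if (e.from !== hitId && e.to !== hitId) continue;
    const neighbor = e.from === hitId ? e.to : e.from;
    if (neighbor === hitId) continue; // ignore self-loops
    const vec = vectors.get(neighbor);
    const factor = queryVec === null ? 0.7 : vec ? cosine(queryVec, vec) : 0;
    const score = e.weight * factor;
    const prev = best.get(neighbor);
    if (score >= minRelevance && (prev === undefined || score > prev.score)) {
      best.set(neighbor, { chunkId: neighbor, relation: e.relation, score });
    }
  }
  return Array.from(best.values())
    .sort((x, y) => y.score - x.score)
    .slice(0, maxRelated);
}
```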

### Shared indexer pool

Parallel searches share a single `CodebaseIndexer` instance per (project, index) pair, which avoids SQLite lock conflicts. The pool is managed through `getIndexer()`, `releaseIndexer()`, and `destroyIndexer()`.
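A reference-counted pool is one straightforward way to get this sharing. In the sketch below the function names match the ones listed above, but their signatures and the release semantics (the instance stays cached when its count drops to zero) are assumptions, and `StubIndexer` is a hypothetical stand-in for the real `CodebaseIndexer`.

```typescript
// Minimal stand-in for CodebaseIndexer; the real class does the actual work.
class StubIndexer {
  readonly project: string;
  readonly index: string;
  closed = false;
  constructor(project: string, index: string) {
    this.project = project;
    this.index = index;
  }
  close(): void {
    this.closed = true; // a real indexer would close its database handles here
  }
}

interface PoolEntry {
  indexer: StubIndexer;
  refs: number;
}

const pool = new Map<string, PoolEntry>();
const poolKey = (project: string, index: string): string => `${project}::${index}`;

// Return the shared instance for (project, index), creating it on first use.
function getIndexer(project: string, index: string): StubIndexer {
  const key = poolKey(project, index);
  let entry = pool.get(key);
  if (entry === undefined) {
    entry = { indexer: new StubIndexer(project, index), refs: 0 };
    pool.set(key, entry);
  }
  entry.refs += 1;
  return entry.indexer;
}

// A caller is done with the instance; it stays cached for the next search.
function releaseIndexer(project: string, index: string): void {
  const entry = pool.get(poolKey(project, index));
  if (entry !== undefined) entry.refs = Math.max(0, entry.refs - 1);
}

// Close the instance and remove it from the pool.
function destroyIndexer(project: string, index: string): void {
  const key = poolKey(project, index);
  const entry = pool.get(key);
  if (entry !== undefined) {
    entry.indexer.close();
    pool.delete(key);
  }
}
```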

---

## Configuration

### Full example

```yaml
# .opencode/vectorizer.yaml
vectorizer:
  enabled: true
  auto_index: true
  model: "Xenova/all-MiniLM-L6-v2"
  debounce_ms: 1000
  cleaning:
    remove_toc: true
    remove_frontmatter_metadata: false
    remove_imports: false
    remove_comments: false
  chunking:
    strategy: "semantic"    # fixed | semantic
    markdown:
      split_by_headings: true
      min_chunk_size: 200
      max_chunk_size: 2000
      preserve_heading_hierarchy: true
    code:
      split_by_functions: true
      include_function_signature: true
      min_chunk_size: 300
      max_chunk_size: 1500
    fixed:
      max_chars: 1500
  search:
    hybrid: true
    bm25_weight: 0.3
    freshen: false              # Don't re-index on every search
    min_score: 0.35             # Minimum relevance cutoff
    include_archived: false
    default_limit: 10
  graph:
    enabled: true
    max_related: 4              # Max related chunks per result
    min_relevance: 0.5          # Min score for related context
    semantic_edges: false       # O(n^2): enable only for small repos
    semantic_edges_max_chunks: 500
    lsp:
      enabled: true
      timeout_ms: 5000
    read_intercept: true
  quality:
    enable_metrics: false
    enable_cache: true
  indexes:
    code:
      enabled: true
      pattern: "**/*.{js,ts,jsx,tsx,mjs,cjs,py,go,rs,java,kt,swift,c,cpp,h,hpp,cs,rb,php,scala,clj}"
      ignore:
        - "**/node_modules/**"
        - "**/.git/**"
        - "**/dist/**"
        - "**/build/**"
        - "**/.opencode/**"
        - "**/vendor/**"
      hybrid: true
      bm25_weight: 0.3
    docs:
      enabled: true
      pattern: "docs/**/*.{md,mdx,txt,rst,adoc}"
      hybrid: false
      bm25_weight: 0.2
    config:
      enabled: false
      pattern: "**/*.{yaml,yml,json,toml,ini,env,xml}"
      hybrid: false
      bm25_weight: 0.3

exclude:
  - node_modules
  - vendor
  - dist
  - build
  - out
  - __pycache__
```
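The `quality.enable_cache` option presumably controls the LRU cache for query embeddings listed under `vectorizer/query-cache.ts`. A minimal LRU built on `Map` insertion order is sketched below; `LruCache` is a hypothetical class, and the capacity is an arbitrary example value since the real cache size is not documented.

```typescript
// Least-recently-used cache built on Map's insertion order.
class LruCache<K, V> {
  private readonly map = new Map<K, V>();
  private readonly capacity: number;

  constructor(capacity: number) {
    if (capacity < 1) throw new Error("capacity must be at least 1");
    this.capacity = capacity;
  }

  get(key: K): V | undefined {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key) as V;
    // Re-insert so this key becomes the most recently used entry.
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    if (this.map.has(key)) {
      this.map.delete(key);
    } else if (this.map.size >= this.capacity) {
      // The first key in iteration order is the least recently used one.
      const oldest = this.map.keys().next().value as K;
      this.map.delete(oldest);
    }
    this.map.set(key, value);
  }

  get size(): number {
    return this.map.size;
  }
}

// Example: cache embeddings by query text (the vectors are fake placeholders).
const cache = new LruCache<string, number[]>(2);
cache.set("auth logic", [0.1, 0.2]);
cache.set("db connection", [0.3, 0.4]);
cache.get("auth logic");                 // touch: now most recently used
cache.set("error handling", [0.5, 0.6]); // evicts "db connection"
console.log(cache.get("db connection")); // undefined
```

Caching by query text means repeated searches skip the embedding model entirely, which is the slowest step per query.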

### Disable auto-indexing

```yaml
vectorizer:
  auto_index: false
```

### Disable via environment variable

```bash
export OPENCODE_SKIP_AUTO_INDEX=1
```

---

## Debugging

### Debug logging

```bash
export DEBUG=vectorizer

# or all logs
export DEBUG=*
```

Indexing activity is logged to `.opencode/indexer.log`.

---

## Technical Details

- Vectorization: `@xenova/transformers` (ONNX Runtime)
- Vector DB: LanceDB (local, serverless)
- Chunk Store: `bun:sqlite` (WAL mode, concurrent reads)
- Graph DB: `bun:sqlite` (WAL mode, triple store)
- Model: `Xenova/all-MiniLM-L6-v2` (384 dimensions, ~23 MB; trained primarily on English text)
- Embedding speed: ~0.5 sec/file
- Phase 1 speed: ~0.05 sec/file (no embedding)
- Supported languages: JavaScript, TypeScript, Python, Go, Rust, Java, Kotlin, Swift, C/C++, C#, Ruby, PHP, Scala, Clojure
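A back-of-the-envelope check using the per-file figures above shows why BM25 search becomes usable long before vectors are ready. `estimateSeconds` is a hypothetical helper and the constants are the approximate numbers quoted above, so treat the results as rough.

```typescript
// Approximate per-file timings quoted above (seconds); real numbers depend on hardware.
const PHASE1_SEC_PER_FILE = 0.05;
const EMBED_SEC_PER_FILE = 0.5;

function estimateSeconds(files: number): { phase1: number; phase2: number } {
  return {
    phase1: files * PHASE1_SEC_PER_FILE, // chunking + graph, no embedding
    phase2: files * EMBED_SEC_PER_FILE,  // embedding
  };
}

const est = estimateSeconds(200);
console.log(`Phase 1 ~${est.phase1.toFixed(0)} s, Phase 2 ~${est.phase2.toFixed(0)} s`);
// Phase 1 ~10 s, Phase 2 ~100 s
```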

---

## License

MIT

---

Made by the Comfanion team