# Magector

Technology-aware MCP server for Magento 2 and Adobe Commerce with intelligent indexing and search.

Magector is a Model Context Protocol (MCP) server that deeply understands Magento 2 and Adobe Commerce. It builds a semantic vector index of your entire codebase — 18,000+ files across hundreds of modules — and exposes 21 tools that let AI assistants search, navigate, and understand the code with domain-specific intelligence. Instead of grepping for keywords, your AI asks "how are checkout totals calculated?" and gets ranked, relevant results in under 50ms, enriched with Magento pattern detection (plugins, observers, controllers, DI preferences, layout XML, and 20+ more).

[Rust](https://www.rust-lang.org) | [Node.js](https://nodejs.org) | [Magento](https://magento.com) | [Adobe Commerce](https://business.adobe.com/products/magento/magento-commerce.html) | [Accuracy](#validation) | [License: MIT](LICENSE)

---

## Why Magector

Magento 2 and Adobe Commerce have 18,000+ PHP, XML, JS, PHTML, and GraphQL files spread across hundreds of modules. The codebase relies heavily on indirection — plugins intercept methods defined in other modules, observers react to events dispatched elsewhere, di.xml rewires interfaces to concrete classes, and layout XML stitches blocks and templates together. No single file tells the full story.

Generic search tools — grep, IDE search, or the keyword matching built into AI assistants — can't bridge this gap. They find literal strings but can't connect "how does checkout calculate totals?" to TotalsCollector.php when the word "totals" appears in hundreds of unrelated files.

Magector solves this with three layers of intelligence:

1. Semantic vector index: every file is embedded into a 384-dimensional space (ONNX, all-MiniLM-L6-v2) where meaning matters more than keywords. A search for "payment capture" returns CaptureOperation.php because its embedding is close to the query's, not merely because the file contains the word "capture".

2. Magento technology awareness: 20+ pattern detectors identify plugins, observers, controllers, blocks, cron jobs, GraphQL resolvers, DI preferences, layout XML, and more. Every search result is enriched with what kind of Magento component it is, so the AI client understands the code's role in the system.

3. Adaptive learning (SONA): Magector tracks which results you actually use and adjusts future rankings with MicroLoRA feedback, getting smarter over time without any API calls.

The result: your AI assistant calls one MCP tool and gets ranked, pattern-enriched results in 10-45ms — instead of burning tokens grepping through dozens of wrong files. High relevance accuracy means the AI reads fewer, more targeted files, which optimizes context window usage, reduces API costs, and accelerates development cycles.

| Approach | Semantic matches | Magento-aware | Speed (18K files) |
|----------|:---------------------:|:---------------------------:|:-----------------:|
| grep / ripgrep | No | No | 100-500ms |
| IDE search | No | No | 200-1000ms |
| GitHub search | Partial | No | 500-2000ms |
| Magector | Yes | Yes | 10-45ms |

---

## Features

- Semantic search -- find code by meaning, not exact keywords
- 99.2% accuracy -- validated with 101 E2E test queries across 16 tool categories, plus 557 Rust-level test cases
- Hybrid search -- combines semantic vector similarity with keyword re-ranking for best-of-both-worlds results
- Structured JSON output -- results include file path, class name, methods list, role badges, and content snippets for minimal round-trips
- Persistent serve mode -- keeps ONNX model and HNSW index resident in memory, eliminating cold-start latency
- Incremental re-indexing -- background file watcher detects changes and updates the index without restart (tombstone + compact strategy)
- ONNX embeddings -- native 384-dim transformer embeddings via ONNX Runtime
- 36K+ vectors -- indexes the complete Magento 2 / Adobe Commerce codebase including framework internals
- Magento-aware -- understands controllers, plugins, observers, blocks, resolvers, repositories, and 20+ Magento patterns
- Adobe Commerce compatible -- works with both Magento Open Source and Adobe Commerce (B2B, Staging, and all Commerce-specific modules)
- AST-powered -- tree-sitter parsing for PHP and JavaScript extracts classes, methods, namespaces, and inheritance
- Cross-tool discovery -- tool descriptions include keywords and "See also" references so AI clients find the right tool on the first try
- SONA feedback learning -- self-adjusting search that learns from MCP tool call patterns (e.g., search → find_plugin refines future rankings for similar queries)
- SONA v2 with MicroLoRA + EWC++ -- rank-2 low-rank adapter (1536 params, ~6KB) adjusts query embeddings based on learned patterns; Elastic Weight Consolidation prevents catastrophic forgetting during online learning
- Diff analysis -- risk scoring and change classification for git commits and staged changes
- Complexity analysis -- cyclomatic complexity, function count, and hotspot detection across modules
- Fast -- 10-45ms queries via persistent serve process, batched ONNX embedding with adaptive thread scaling
- LLM description enrichment -- generate natural-language descriptions of di.xml files using Claude, stored in SQLite, and prepend them to embedding text so descriptions influence vector search ranking (not just post-retrieval display)
- MCP server -- 21 tools integrating with Claude Code, Cursor, and any MCP-compatible AI tool
- Clean architecture -- Rust core handles all indexing/search, Node.js MCP server delegates to it

---

## Architecture

```mermaid
flowchart TD
    subgraph rust ["Rust Core"]
        A["AST Parser · PHP + JS"]
        B["Pattern Detection · 20+"]
        B2["Description Enrichment"]
        C["ONNX Embedder · 384d"]
        D["HNSW + Reranking"]
        A --> B --> B2 --> C --> D
    end
    subgraph node ["Node.js Layer"]
        E["MCP Server · 21 tools"]
        F["Persistent Serve"]
        G["CLI · init/index/search/describe"]
        E --> F
        G --> F
    end
    node -->|stdin/stdout JSON| rust

    style rust fill:#f4a460,color:#000
    style node fill:#68b684,color:#000
```

### Indexing Pipeline

```mermaid
flowchart TD
    A[Source File] --> B[AST Parser]
    B --> C[Pattern Detection]
    C --> D[Text Enrichment]
    D --> D2{Description DB?}
    D2 -->|Yes| D3["Prepend Description"]
    D2 -->|No| E[ONNX Embedding]
    D3 --> E
    E --> F[(HNSW Index)]
    A --> G[Metadata]
    G --> F
```

### Search Pipeline

```mermaid
flowchart TD
    Q[Query] --> E1[Synonym Enrichment]
    E1 --> E2[ONNX Embedding]
    E2 --> H[HNSW Search]
    H --> R[Hybrid Reranking]
    R --> SA[SONA Adjustment + MicroLoRA]
    SA --> J[Structured JSON]
```

### Components

| Component | Technology | Purpose |
|-----------|-----------|---------|
| Embeddings | `ort` (ONNX Runtime) | all-MiniLM-L6-v2, 384 dimensions |
| Vector search | `hnsw_rs` + hybrid reranking | Approximate nearest neighbor + keyword boosting |
| PHP parsing | `tree-sitter-php` | Class, method, namespace extraction |
| JS parsing | `tree-sitter-javascript` | AMD/ES6 module detection |
| Pattern detection | Custom Rust | 20+ Magento-specific patterns |
| CLI | `clap` | Command-line interface (index, search, serve, validate) |
| Descriptions | `rusqlite` (bundled SQLite) | LLM-generated di.xml descriptions stored in SQLite, prepended to embeddings |
| SONA | Custom Rust | Feedback learning with MicroLoRA + EWC++ |
| MCP server | `@modelcontextprotocol/sdk` | AI tool integration with structured JSON output |

---

## Quick Start

### Prerequisites

- Node.js 18+

### Initialize

```bash
cd /path/to/your/magento2  # or Adobe Commerce project
npx magector init
```

This single command handles the entire setup:

```mermaid
flowchart TD
    A["npx magector init"] --> B[Verify Project]
    B --> C[Download Model]
    C --> D[Index Codebase]
    D --> E[Detect IDE]
    E --> F[Write Config]
    F --> G[Update .gitignore]
```

### Search

```bash
npx magector search "product price calculation"
npx magector search "checkout totals collector" -l 20
```

### Re-index

```bash
npx magector index
```

### IDE Setup

```bash
npx magector setup
```

---

## CLI Reference

### Rust Core CLI

```
magector-core

Commands:
  index      Index a Magento codebase
  search     Search the index semantically
  serve      Start persistent server mode (stdin/stdout JSON protocol)
  describe   Generate LLM descriptions for di.xml files (requires ANTHROPIC_API_KEY)
  validate   Run validation suite (downloads Magento if needed)
  download   Download Magento 2 Open Source
  stats      Show index statistics
  embed      Generate embedding for text
```

#### index

```bash
magector-core index [OPTIONS]

Options:
  -m, --magento-root      Path to Magento root directory
  -d, --database          Index database path [default: ./.magector/index.db]
  -c, --model-cache       Model cache directory [default: ./models]
      --descriptions-db   Path to descriptions SQLite DB (descriptions are prepended to embeddings)
  -v, --verbose           Enable verbose output
```

When --descriptions-db is provided (or auto-detected as sqlite.db next to the index), descriptions are prepended to the embedding text as "Description: {text}\n\n" before the raw file content. This places semantic terms within the 256-token ONNX window, significantly improving retrieval of di.xml files for natural-language queries.

#### search

```bash
magector-core search [OPTIONS]

Options:
  -d, --database   Index database path [default: ./.magector/index.db]
  -l, --limit      Number of results [default: 10]
  -f, --format     Output format: text, json [default: text]
```

#### describe

```bash
magector-core describe [OPTIONS]

Options:
  -m, --magento-root   Path to Magento root directory
  -o, --output         Output SQLite database [default: ./.magector/sqlite.db]
      --force          Re-describe all files (ignore cache)
```

Generates natural-language descriptions of di.xml files using the Anthropic API (Claude Sonnet). Requires ANTHROPIC_API_KEY environment variable. Descriptions are stored in a SQLite database and used during indexing to enrich embeddings. Only files with changed content hashes are re-described (incremental by default).

#### serve

```bash
magector-core serve [OPTIONS]

Options:
  -d, --database          Index database path [default: ./.magector/index.db]
  -c, --model-cache       Model cache directory [default: ./models]
  -m, --magento-root      Magento root (enables file watcher)
      --descriptions-db   Path to descriptions SQLite DB
      --watch-interval    File watcher poll interval [default: 60]
```

Starts a persistent process that reads JSON queries from stdin and writes JSON responses to stdout. Keeps the ONNX model and HNSW index resident in memory for fast repeated queries.

When --magento-root is provided, a background file watcher polls for changed files every --watch-interval seconds and incrementally re-indexes them without restart. Modified and deleted files are soft-deleted (tombstoned) in the HNSW index; new vectors are appended. When tombstoned entries exceed 20% of total vectors, the index is automatically compacted by rebuilding the HNSW graph.

Protocol (one JSON object per line):

```json
// Request:
{"command":"search","query":"product price","limit":10}

// Response:
{"ok":true,"data":[{"id":123,"score":0.85,"metadata":{...}}]}

// Stats request:
{"command":"stats"}

// Watcher status:
{"command":"watcher_status"}
// Response:
{"ok":true,"data":{"running":true,"tracked_files":18234,"last_scan_changes":3,"interval_secs":60}}

// Descriptions (all LLM descriptions from SQLite DB):
{"command":"descriptions"}
// Response:
{"ok":true,"data":{"app/code/Magento/Catalog/etc/di.xml":{"hash":"...","description":"...","model":"claude-sonnet-4-5-20250929","timestamp":1769875137},...}}

// Describe (generate descriptions + auto-reindex affected files):
{"command":"describe"}
// Response:
{"ok":true,"data":{"files_found":371,"described":5,"skipped":366,"errors":0,"described_paths":["app/code/..."]}}

// SONA feedback:
{"command":"feedback","signals":[{"type":"refinement_to_plugin","query":"checkout totals","timestamp":1700000000000}]}
// Response:
{"ok":true,"data":{"learned":1}}

// SONA status:
{"command":"sona_status"}
// Response:
{"ok":true,"data":{"learned_patterns":5,"total_observations":12}}
```

### Node.js CLI

```bash
npx magector init [path]       # Full setup: index + IDE config
npx magector index [path]      # Index (or re-index) Magento codebase
npx magector search <query>    # Search indexed code
npx magector describe [path]   # Generate LLM descriptions for di.xml files
npx magector stats             # Show indexer statistics
npx magector setup [path]      # IDE setup only (no indexing)
npx magector mcp               # Start MCP server
npx magector help              # Show help
```

The describe command requires ANTHROPIC_API_KEY. After running describe, the next index automatically picks up the descriptions DB and embeds them into the vectors.

### Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `MAGENTO_ROOT` | Path to Magento installation | Current directory |
| `MAGECTOR_DB` | Path to index database | `./.magector/index.db` |
| `MAGECTOR_BIN` | Path to magector-core binary | Auto-detected |
| `MAGECTOR_MODELS` | Path to ONNX model directory | `~/.magector/models/` |
| `ANTHROPIC_API_KEY` | API key for description generation (`describe` command) | (none) |

---

## MCP Server Tools

The MCP server exposes 21 tools for AI-assisted Magento 2 and Adobe Commerce development. All search tools return structured JSON with file paths, class names, methods, role badges, and content snippets -- enabling AI clients to parse results programmatically and minimize file-read round-trips.

### Output Format

All search tools return structured JSON:

```json
{
  "results": [
    {
      "rank": 1,
      "score": 0.892,
      "path": "vendor/magento/module-catalog/Model/ProductRepository.php",
      "module": "Magento_Catalog",
      "className": "ProductRepository",
      "namespace": "Magento\\Catalog\\Model",
      "methods": ["save", "getById", "getList", "delete", "deleteById"],
      "magentoType": "repository",
      "fileType": "php",
      "badges": ["repository"],
      "snippet": "class ProductRepository implements ProductRepositoryInterface..."
    }
  ],
  "count": 1
}
```

Key fields:

- `methods` -- list of method names in the class (avoids needing to read the file)
- `badges` -- role indicators: plugin, controller, observer, repository, graphql-resolver, model, block
- `snippet` -- first 300 characters of indexed content for quick assessment

### Search Tools

| Tool | Description |
|------|-------------|
| `magento_search` | Semantic search -- find any PHP class, method, XML config, template, or GraphQL schema by natural language |
| `magento_find_class` | Find PHP class, interface, abstract class, or trait by name |
| `magento_find_method` | Find method implementations across the codebase |

### Magento-Specific Finders

| Tool | Description |
|------|-------------|
| `magento_find_config` | Find XML configuration (di.xml, events.xml, routes.xml, system.xml, webapi.xml, module.xml, layout) |
| `magento_find_template` | Find PHTML template files for frontend or admin rendering |
| `magento_find_plugin` | Find interceptor plugins (before/after/around methods) and di.xml declarations |
| `magento_find_observer` | Find event observers and events.xml declarations |
| `magento_find_preference` | Find DI preference overrides -- which class implements an interface |
| `magento_find_controller` | Find MVC controllers by frontend or admin route path |
| `magento_find_block` | Find Block classes for view rendering |
| `magento_find_graphql` | Find GraphQL schema definitions, resolvers, types, queries, and mutations |
| `magento_find_api` | Find REST/SOAP API endpoints in webapi.xml |
| `magento_find_cron` | Find cron job definitions in crontab.xml |
| `magento_find_db_schema` | Find database table definitions in db_schema.xml (declarative schema) |

### Flow Tracing

| Tool | Description |
|------|-------------|
| `magento_trace_flow` | Trace execution flow from an entry point (route, API, GraphQL, event, cron) -- maps controller → plugins → observers → templates in one call |

Auto-detects entry type from pattern (/V1/... → API, snake_case → event, camelCase → GraphQL, path/segments → route), or override with entryType. Use depth: "shallow" (entry + config + plugins) or depth: "deep" (adds observers, layout, templates, DI preferences).
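The auto-detection rules above can be sketched as a small classifier. This is a minimal illustration, not Magector's actual code: the function name `detectEntryType`, the regular expressions, and the returned type names other than `"graphql"` are assumptions made for the example.

```javascript
// Hypothetical sketch of entry-type auto-detection for magento_trace_flow.
// The regexes and the "api" / "route" / "event" labels are assumptions.
function detectEntryType(entryPoint, entryType) {
  if (entryType) return entryType;                          // explicit override wins
  if (/^\/V\d+\//.test(entryPoint)) return "api";           // /V1/... -> REST API
  if (entryPoint.includes("/")) return "route";             // path/segments -> route
  if (/^[a-z0-9]+(_[a-z0-9]+)+$/.test(entryPoint)) return "event";      // snake_case
  if (/^[a-z]+([A-Z][a-z0-9]*)+$/.test(entryPoint)) return "graphql";   // camelCase
  return "unknown";                                          // caller should pass entryType
}

console.log(detectEntryType("checkout/cart/add"));       // route
console.log(detectEntryType("sales_order_place_after")); // event
```

Checking the `/V1/` prefix before the generic slash test matters, because API paths would otherwise be classified as routes.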

### Analysis Tools

| Tool | Description |
|------|-------------|
| `magento_analyze_diff` | Analyze git diffs for risk scoring and change classification |
| `magento_complexity` | Analyze cyclomatic complexity, function count, and line count |

### Utility Tools

| Tool | Description |
|------|-------------|
| `magento_module_structure` | Show complete module structure -- controllers, models, blocks, plugins, observers, configs |
| `magento_index` | Trigger re-indexing of the codebase |
| `magento_describe` | Generate LLM descriptions for di.xml files (requires `ANTHROPIC_API_KEY`), stored in SQLite, auto-reindexes affected files |
| `magento_stats` | View index statistics |

### Tool Cross-References

Each tool description includes "See also" hints to help AI clients chain tools effectively:

```mermaid
graph TD
    cls["find_class"] --> plg["find_plugin"]
    cls --> prf["find_preference"]
    cls --> mtd["find_method"]
    cfg["find_config"] --> obs["find_observer"]
    cfg --> prf
    cfg --> api["find_api"]
    plg --> cls
    plg --> mtd
    tpl["find_template"] --> blk["find_block"]
    blk --> tpl
    blk --> cfg
    dbs["find_db_schema"] --> cls
    gql["find_graphql"] --> cls
    gql --> mtd
    ctl["find_controller"] --> cfg
    trc["trace_flow"] -.-> ctl
    trc -.-> plg
    trc -.-> obs
    trc -.-> tpl
    trc -.-> api
    trc -.-> gql

    style cls fill:#4a90d9,color:#fff
    style mtd fill:#4a90d9,color:#fff
    style cfg fill:#e8a838,color:#000
    style plg fill:#d94a4a,color:#fff
    style obs fill:#d94a4a,color:#fff
    style prf fill:#e8a838,color:#000
    style api fill:#e8a838,color:#000
    style tpl fill:#68b684,color:#000
    style blk fill:#68b684,color:#000
    style dbs fill:#9b59b6,color:#fff
    style gql fill:#9b59b6,color:#fff
    style ctl fill:#4a90d9,color:#fff
    style trc fill:#2ecc71,color:#000
```

### Example Queries

```
magento_search("how are checkout totals calculated")
magento_search("product price with tier pricing and catalog rules")
magento_find_class("ProductRepositoryInterface")
magento_find_method("getById")
magento_find_config("di.xml plugin for ProductRepository")
magento_find_plugin({ targetClass: "Topmenu" })
magento_find_observer("sales_order_place_after")
magento_find_preference("StoreManagerInterface")
magento_find_api("/V1/orders")
magento_find_controller("catalog/product/view")
magento_find_graphql("placeOrder")
magento_find_db_schema("sales_order")
magento_find_cron("indexer")
magento_find_block("cart totals")
magento_find_template("minicart")
magento_analyze_diff({ commitHash: "abc123" })
magento_complexity({ module: "Magento_Catalog", threshold: 10 })
magento_describe()
magento_trace_flow({ entryPoint: "checkout/cart/add", depth: "deep" })
magento_trace_flow({ entryPoint: "/V1/products" })
magento_trace_flow({ entryPoint: "placeOrder", entryType: "graphql" })
magento_trace_flow({ entryPoint: "sales_order_place_after" })
```

---

## Supported Platforms

Pre-built binaries are provided for the following platforms:

| Platform | Architecture | Package |
|----------|-------------|---------|
| macOS | ARM64 (Apple Silicon) | `@magector/cli-darwin-arm64` |
| Linux | x86_64 | `@magector/cli-linux-x64` |
| Linux | ARM64 | `@magector/cli-linux-arm64` |
| Windows | x86_64 | `@magector/cli-win32-x64` |

> Note: macOS Intel (x86_64) is not supported as a pre-built binary. Intel Mac users can build from source.

---

## Validation

Magector is validated at two levels:

1. E2E MCP accuracy tests -- 101 queries across 16 tool categories via stdio JSON-RPC
2. Rust-level validation -- 557 test cases across 50+ categories against Magento 2.4.7

### Accuracy

```mermaid
---
config:
  themeVariables:
    pie1: "#4caf50"
    pie2: "#f44336"
---
pie title Test Pass Rate (101 queries)
    "Passed (101)" : 101
    "Failed (0)" : 0
```

| Metric | Value |
|--------|-------|
| Grade | A+ (99.2/100) |
| Pass rate | 101/101 (100%) |
| Precision | 98.7% |
| MRR | 99.3% |
| NDCG@10 | 98.7% |
| Index size | 35,795 vectors |
| Query time | 10-45ms |
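For readers unfamiliar with the ranking metrics in the table, the sketch below shows the textbook definitions of MRR and NDCG@k. It is an illustration only: the function names are hypothetical, the ideal ordering is computed from the returned list alone (a simplification), and the exact formulas used by Magector's accuracy calculator are not stated in this README.

```javascript
// Textbook ranking metrics; not taken from Magector's validation code.

// firstRelevantRanks: per query, the 1-based rank of the first relevant
// result, or null when no relevant result was returned.
function meanReciprocalRank(firstRelevantRanks) {
  const sum = firstRelevantRanks.reduce((s, r) => s + (r ? 1 / r : 0), 0);
  return sum / firstRelevantRanks.length;
}

// relevances: graded relevance of each returned result, in ranked order.
function ndcgAtK(relevances, k = 10) {
  const dcg = (rels) =>
    rels.slice(0, k).reduce((s, rel, i) => s + rel / Math.log2(i + 2), 0);
  const ideal = dcg([...relevances].sort((a, b) => b - a)); // best possible order
  return ideal === 0 ? 0 : dcg(relevances) / ideal;
}

console.log(meanReciprocalRank([1, 2, null])); // 0.5
```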

### Integration Tests

66 integration tests covering MCP protocol compliance, tool schemas, tool calls (including magento_describe), analysis tools, and stdout JSON integrity.

### Running Tests

```bash
# E2E accuracy tests (101 queries, requires indexed codebase)
npm run test:accuracy
npm run test:accuracy:verbose

# Integration tests (66 tests)
npm test

# SONA/MicroLoRA benefit evaluation (180 queries, baseline vs post-training)
npm run test:sona-eval
npm run test:sona-eval:verbose

# Rust validation (557 test cases)
cd rust-core && cargo run --release -- validate -m ./magento2 --skip-index
```
---
## Project Structure

```
magector/
├── src/                               # Node.js source
│   ├── cli.js                         # CLI entry point (npx magector)
│   ├── mcp-server.js                  # MCP server (21 tools, structured JSON output)
│   ├── binary.js                      # Platform binary resolver
│   ├── model.js                       # ONNX model resolver/downloader
│   ├── init.js                        # Full init command (index + IDE config)
│   ├── magento-patterns.js            # Magento pattern detection (JS)
│   ├── templates/                     # IDE rules templates
│   │   ├── cursorrules.js             # .cursorrules content
│   │   └── claude-md.js               # CLAUDE.md content
│   └── validation/                    # JS validation suite
│       ├── validator.js
│       ├── benchmark.js
│       ├── test-queries.js
│       ├── test-data-generator.js
│       └── accuracy-calculator.js
├── tests/                             # Automated tests
│   ├── mcp-server.test.js             # Integration tests (66 tests)
│   ├── mcp-accuracy.test.js           # E2E accuracy tests (101 queries)
│   ├── mcp-sona.test.js               # SONA feedback integration tests (8 tests)
│   ├── mcp-sona-eval.test.js          # SONA/MicroLoRA benefit evaluation (180 queries)
│   ├── describe-benefit-eval.test.js  # Description enrichment benefit evaluation
│   └── results/                       # Test result artifacts
│       ├── accuracy-report.json
│       └── sona-eval-report.json
├── platforms/                         # Platform-specific binary packages
│   ├── darwin-arm64/                  # macOS ARM (Apple Silicon)
│   ├── linux-x64/                     # Linux x64
│   ├── linux-arm64/                   # Linux ARM64
│   └── win32-x64/                     # Windows x64
├── rust-core/                         # Rust high-performance core
│   ├── Cargo.toml
│   ├── src/
│   │   ├── main.rs                    # Rust CLI (index, search, serve, validate)
│   │   ├── lib.rs                     # Library exports
│   │   ├── indexer.rs                 # Core indexing with progress output
│   │   ├── embedder.rs                # ONNX embedding (MiniLM-L6-v2)
│   │   ├── vectordb.rs                # HNSW vector database + hybrid search + tombstones
│   │   ├── watcher.rs                 # File watcher for incremental re-indexing
│   │   ├── ast.rs                     # Tree-sitter AST (PHP + JS)
│   │   ├── magento.rs                 # Magento pattern detection (Rust)
│   │   ├── describe.rs                # LLM description generation + SQLite storage
│   │   ├── sona.rs                    # SONA feedback learning + MicroLoRA + EWC++
│   │   └── validation.rs              # 557 test cases, validation framework
│   └── models/                        # ONNX model files (auto-downloaded)
│       ├── all-MiniLM-L6-v2.onnx
│       └── tokenizer.json
├── .github/
│   └── workflows/
│       └── release.yml                # Cross-compile + publish CI
├── scripts/
│   └── setup.sh                       # Claude Code MCP setup script
├── config/
│   └── mcp-config.json                # MCP server configuration template
├── package.json
├── .gitignore
├── LICENSE
└── README.md
```

---

## How It Works

### Indexing

Magector scans every `.php`, `.js`, `.xml`, `.phtml`, and `.graphqls` file in a Magento 2 or Adobe Commerce codebase:

1. AST parsing -- Tree-sitter extracts class names, namespaces, methods, inheritance, and interface implementations from PHP and JavaScript files
2. Pattern detection -- Identifies Magento-specific patterns: controllers, models, repositories, plugins, observers, blocks, GraphQL resolvers, admin grids, cron jobs, and more
3. Search text enrichment -- Combines AST metadata with Magento pattern keywords to create semantically rich text representations
4. Description enrichment -- If a descriptions SQLite DB is present, LLM-generated natural-language descriptions are prepended to the embedding text as `"Description: {text}\n\n"`, placing semantic DI concepts (preferences, plugins, virtual types, subsystem names) within the 256-token ONNX window
5. Embedding -- ONNX Runtime generates 384-dimensional vectors using all-MiniLM-L6-v2
6. Indexing -- Vectors are stored in an HNSW index for sub-millisecond approximate nearest neighbor search

### Search

1. Query text is enriched with pattern synonyms (e.g., "controller" adds "action execute http request dispatch")
2. The enriched query is embedded into the same 384-dimensional vector space
3. HNSW finds the nearest neighbors by cosine similarity
4. Hybrid reranking boosts results with keyword matches in path and search text
5. SONA adjustment -- MicroLoRA adapts the query embedding based on learned patterns; EWC++ prevents forgetting earlier learning
6. Results are returned as structured JSON with file path, class name, methods, role badges, and content snippet
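Steps 3 and 4 can be illustrated with a small sketch that scores candidates by cosine similarity and then adds a keyword boost for query terms found in the path or search text. The function names, the candidate shape, and the boost of 0.05 per matching term are assumptions for illustration; Magector's actual reranking weights are not documented here.

```javascript
// Illustrative hybrid reranking: cosine similarity plus a keyword boost.
// boostPerMatch = 0.05 is an assumed value, not Magector's real weight.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function hybridRerank(query, queryVec, candidates, boostPerMatch = 0.05) {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  return candidates
    .map((c) => {
      // Keyword matches are looked up in both the path and the search text.
      const haystack = `${c.path} ${c.searchText}`.toLowerCase();
      const matches = terms.filter((t) => haystack.includes(t)).length;
      return { ...c, score: cosine(queryVec, c.vector) + boostPerMatch * matches };
    })
    .sort((x, y) => y.score - x.score); // highest combined score first
}
```

With this scheme a candidate that is slightly further away in vector space can still rank first when its path and text contain the query terms.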

### Persistent Serve Mode

The MCP server spawns a persistent Rust process (`magector-core serve`) that keeps the ONNX model and HNSW index loaded in memory. Queries are sent as JSON over stdin and responses returned via stdout -- eliminating the ~2.6s cold-start overhead of loading the model per query. Falls back to single-shot `execFileSync` if the serve process is unavailable.

```mermaid
flowchart TD
    subgraph startup ["Startup (once)"]
        S1[Load Model] --> S2[Load Index]
        S2 --> S3[Ready Signal]
    end
    subgraph query ["Per Query (10-45ms)"]
        Q1[stdin JSON] --> Q2[Embed]
        Q2 --> Q3[HNSW Search]
        Q3 --> Q4[Rerank]
        Q4 --> Q5[stdout JSON]
    end
    startup --> query
    subgraph fallback ["Fallback"]
        F1[execFileSync ~2.6s]
    end

    style startup fill:#e8f4e8,color:#000
    style query fill:#e8e8f4,color:#000
    style fallback fill:#f4e8e8,color:#000
```
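A client for the line-delimited JSON protocol needs two pieces: a request encoder and a parser that reassembles complete lines from stdout chunks. The sketch below shows one way to write them; `encodeRequest` and `createLineParser` are hypothetical names and this is not the code in `src/mcp-server.js`.

```javascript
// Hypothetical helpers for a one-JSON-object-per-line protocol.

// Encode a command and its parameters as a single newline-terminated JSON line.
function encodeRequest(command, params = {}) {
  return JSON.stringify({ command, ...params }) + "\n";
}

// Returns a function that accepts stdout chunks and calls onMessage once per
// complete line. A partial trailing line is buffered until its newline arrives.
function createLineParser(onMessage) {
  let buffer = "";
  return (chunk) => {
    buffer += chunk;
    let idx;
    while ((idx = buffer.indexOf("\n")) >= 0) {
      const line = buffer.slice(0, idx).trim();
      buffer = buffer.slice(idx + 1);
      if (line) onMessage(JSON.parse(line));
    }
  };
}

process.stdout.write(encodeRequest("search", { query: "product price", limit: 10 }));
```

In a real client these would be wired to a process started with Node's `child_process.spawn`, writing encoded requests to its stdin and passing each stdout `data` chunk (as a string) to the parser.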

### File Watcher (Incremental Re-indexing)

When the serve process is started with `--magento-root`, a background thread polls the filesystem for changes every 60 seconds (configurable via `--watch-interval`). Changed files are incrementally re-indexed without restarting the server.

Since `hnsw_rs` does not support point deletion, Magector uses a tombstone strategy: old vectors for modified/deleted files are marked as tombstoned and filtered out of search results. New vectors are appended. When tombstoned entries exceed 20% of total vectors, the HNSW graph is automatically rebuilt (compacted) to reclaim memory and restore search performance.

```mermaid
flowchart TD
    W1[Sleep 60s] --> W2[Scan Filesystem]
    W2 --> W3{Changes?}
    W3 -->|No| W1
    W3 -->|Yes| W4[Tombstone Old Vectors]
    W4 --> W5[Parse + Embed New Files]
    W5 --> W6[Append to HNSW]
    W6 --> W7{Tombstone > 20%?}
    W7 -->|Yes| W8[Compact / Rebuild HNSW]
    W7 -->|No| W9[Save to Disk]
    W8 --> W9
    W9 --> W1

    style W4 fill:#f4e8e8,color:#000
    style W5 fill:#e8f4e8,color:#000
    style W8 fill:#e8e8f4,color:#000
```
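The tombstone-and-compact idea can be sketched with a plain array standing in for the HNSW graph. The class name `TombstoneIndex` and its methods are hypothetical; only the soft-delete behavior and the 20% threshold come from the description above, and a real compaction would rebuild the graph rather than filter an array.

```javascript
// Minimal sketch of soft deletion with threshold-based compaction.
// An array stands in for the HNSW graph; names are illustrative only.
class TombstoneIndex {
  constructor(compactThreshold = 0.2) {
    this.entries = [];            // { id, path, vector }
    this.tombstoned = new Set();  // ids of soft-deleted entries
    this.nextId = 0;
    this.compactThreshold = compactThreshold;
  }

  // Re-indexing a file: tombstone its old vectors, append the new ones.
  upsert(path, vectors) {
    this.remove(path);
    for (const vector of vectors) {
      this.entries.push({ id: this.nextId++, path, vector });
    }
    this.maybeCompact();
  }

  // Soft delete: mark every vector of the file, but keep it in storage.
  remove(path) {
    for (const e of this.entries) {
      if (e.path === path) this.tombstoned.add(e.id);
    }
  }

  // Entries visible to search (tombstoned ones are filtered out).
  live() {
    return this.entries.filter((e) => !this.tombstoned.has(e.id));
  }

  tombstoneRatio() {
    return this.entries.length ? this.tombstoned.size / this.entries.length : 0;
  }

  // Rebuild from live entries once tombstones exceed the threshold.
  maybeCompact() {
    if (this.tombstoneRatio() <= this.compactThreshold) return false;
    this.entries = this.live();
    this.tombstoned.clear();
    return true;
  }
}
```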

### MCP Integration

The MCP server delegates all search/index operations to the Rust core binary. Analysis tools (diff, complexity) use ruvector JS modules directly.

```mermaid
sequenceDiagram
    participant Dev
    participant AI
    participant MCP
    participant Rust
    participant HNSW

    Dev->>AI: "checkout totals?"
    AI->>MCP: magento_search(...)
    MCP->>Rust: JSON query
    Rust->>HNSW: embed + search
    HNSW-->>Rust: candidates
    Rust-->>MCP: JSON results
    MCP-->>AI: paths, methods, badges
    AI-->>Dev: TotalsCollector.php
```

### SONA Feedback Learning

The MCP server tracks sequences of tool calls and sends feedback signals to the Rust process. Over time, this adjusts search result rankings based on observed usage patterns.

How it works: The Node.js `SessionTracker` watches for follow-up tool calls after `magento_search`. If a user searches and then immediately calls `magento_find_plugin`, SONA learns that similar queries should boost plugin results. The learned weights are persisted to a `.sona` file alongside the index.

| MCP Call Sequence | Signal | Effect on Future Searches |
|---|---|---|
| `magento_search` → `magento_find_plugin` (within 30s) | `refinement_to_plugin` | Boosts plugin results |
| `magento_search` → `magento_find_class` (within 30s) | `refinement_to_class` | Boosts class matches |
| `magento_search` → `magento_find_config` (within 30s) | `refinement_to_config` | Boosts config/XML results |
| `magento_search` → `magento_find_observer` (within 30s) | `refinement_to_observer` | Boosts observer results |
| `magento_search` → `magento_find_controller` (within 30s) | `refinement_to_controller` | Boosts controller results |
| `magento_search` → `magento_find_block` (within 30s) | `refinement_to_block` | Boosts block results |
| `magento_search` → `magento_trace_flow` (within 30s) | `trace_after_search` | Boosts controller results |
| `magento_search`(Q1) → `magento_search`(Q2) (within 60s) | `query_refinement` | Tracked for analysis |

Characteristics:

- Score adjustments are capped at ±0.15 to avoid overwhelming semantic similarity
- Learning rate decays with repeated observations (diminishing returns)
- Learned weights are keyed by normalized, order-independent query term hashes
- Always active -- no feature flags or build-time opt-in required
- Persisted via bincode to a `.sona` file (e.g., `.magector/index.db.sona`)
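The signal table and the first and third characteristics can be sketched as three small functions. All names here (`deriveSignal`, `queryKey`, `clampAdjustment`) are hypothetical, the call-record shape is assumed, and `queryKey` returns a canonical string where the real implementation is described as using a hash.

```javascript
// Hypothetical sketch of SONA signal derivation; not the SessionTracker source.
const FOLLOW_UP_SIGNALS = {
  magento_find_plugin: "refinement_to_plugin",
  magento_find_class: "refinement_to_class",
  magento_find_config: "refinement_to_config",
  magento_find_observer: "refinement_to_observer",
  magento_find_controller: "refinement_to_controller",
  magento_find_block: "refinement_to_block",
  magento_trace_flow: "trace_after_search",
};

// prev/next: { tool, query?, timestamp } with timestamps in milliseconds.
function deriveSignal(prev, next) {
  if (!prev || prev.tool !== "magento_search") return null;
  const elapsed = next.timestamp - prev.timestamp;
  const isRefinement = next.tool === "magento_search";
  const type = isRefinement ? "query_refinement" : FOLLOW_UP_SIGNALS[next.tool];
  const windowMs = isRefinement ? 60000 : 30000; // windows from the table above
  if (!type || elapsed > windowMs) return null;
  return { type, query: prev.query, timestamp: next.timestamp };
}

// Normalized, order-independent key: lowercase, de-duplicated, sorted terms.
function queryKey(query) {
  const terms = query.toLowerCase().split(/\W+/).filter(Boolean);
  return [...new Set(terms)].sort().join(" ");
}

// Keep learned score adjustments within the documented ±0.15 cap.
function clampAdjustment(delta, cap = 0.15) {
  return Math.max(-cap, Math.min(cap, delta));
}
```

The object returned by `deriveSignal` has the same fields as the entries in the `signals` array of the `feedback` command.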

#### SONA v2: MicroLoRA + EWC++

SONA v2 adds embedding-level adaptation via a MicroLoRA adapter and Elastic Weight Consolidation:

| Component | Parameters | Purpose |
|-----------|-----------|---------|
| MicroLoRA | 1536 (rank-2, 2×384×2) | Adjusts query embeddings before HNSW search |
| EWC++ | Fisher matrix (384 values) | Prevents catastrophic forgetting during online learning |

- `adjust_query_embedding()` applies the LoRA transform + L2 normalization before vector search; cosine similarity guard (≥0.90) skips destructive adjustments
- `learn_with_embeddings()` updates LoRA weights from feedback signals with EWC regularization (λ=2000) and decaying learning rate
- 3-tier scoring with negative learning: positive signals boost the followed feature type, mild negative learning (0.1×) demotes unrelated types
- V1→V2 persistence format is backward-compatible (auto-upgrades on load)
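To make the adapter concrete, here is a sketch of a rank-r low-rank adjustment with the cosine guard. It assumes the common LoRA form e' = normalize(e + U(D e)), where D is a rank × dim "down" matrix and U a dim × rank "up" matrix (2 × 384 plus 384 × 2 values gives the 1536 parameters in the table). The exact transform inside `adjust_query_embedding()` is not documented here, so treat this as an illustration rather than a port of the Rust code.

```javascript
// Illustrative low-rank query adjustment with a cosine-similarity guard.
// The formula e' = normalize(e + U(D e)) is an assumption, not Magector's code.
const dot = (a, b) => a.reduce((s, x, i) => s + x * b[i], 0);

const l2normalize = (v) => {
  const norm = Math.sqrt(dot(v, v)) || 1; // avoid dividing by zero
  return v.map((x) => x / norm);
};

// down: rank x dim matrix, up: dim x rank matrix.
function adjustQueryEmbedding(embedding, down, up, minCosine = 0.9) {
  const base = l2normalize(embedding);
  const hidden = down.map((row) => dot(row, base)); // project to rank-sized space
  const adjusted = l2normalize(base.map((x, i) => x + dot(up[i], hidden)));
  // Guard: both vectors are unit length, so the dot product is the cosine.
  return dot(base, adjusted) >= minCosine ? adjusted : base;
}
```

A zero `up` matrix leaves the query unchanged, which is the natural starting state before any feedback has been learned.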

```bash
cd rust-core && cargo build --release
```

### LLM Description Enrichment

Magector can generate natural-language descriptions of di.xml files using the Anthropic API and embed them directly into the vector index. This improves search ranking for semantic queries about dependency injection: in the A/B experiment reported below, MRR rose from 0.031 to 0.330.

Workflow:

```bash
# 1. Generate descriptions (one-time, incremental: only re-describes changed files)
ANTHROPIC_API_KEY=sk-... npx magector describe /path/to/magento

# 2. Re-index with descriptions embedded into vectors
npx magector index /path/to/magento
```

Or via the MCP tool: `magento_describe()` generates descriptions and auto-reindexes affected files in one step.

How it works: Each di.xml file is sent to Claude Sonnet with a prompt optimized for semantic search retrieval. The resulting description (~70 words) is stored in a SQLite database (.magector/sqlite.db). During indexing, descriptions are prepended to the embedding text as "Description: {text}\n\n" before the raw file content, placing semantic terms (preferences, plugins, virtual types, subsystem names) within the ONNX model's 256-token window.
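The prepend step and the incremental check can be sketched as follows. Only the `"Description: {text}\n\n"` prefix and the idea of comparing content hashes come from the text above; the helper names, the stored-record shape, and the choice of SHA-256 are assumptions made for the example.

```javascript
// Hypothetical sketch of description enrichment; SHA-256 is an assumed hash.
const nodeCrypto = require("node:crypto");

function contentHash(content) {
  return nodeCrypto.createHash("sha256").update(content).digest("hex");
}

// A file needs a new description when none is stored or its content changed.
function needsDescription(stored, content) {
  return !stored || stored.hash !== contentHash(content);
}

// Put the description first so it falls inside the model's token window.
function buildEmbeddingText(content, description) {
  return description ? `Description: ${description}\n\n${content}` : content;
}

console.log(buildEmbeddingText("<config/>", "Wires catalog preferences."));
```

Placing the description ahead of the raw XML matters because a long di.xml file would otherwise fill the 256-token window with markup before any descriptive terms appear.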

Measured impact (A/B experiment, 25 queries, Magento 2.4.7, 17,891 vectors, 371 described files):

| Metric | Without Descriptions | With Descriptions | Delta |
|--------|---------------------|-------------------|-------|
| Precision@K | 1.6% | 20.3% | +18.7 pp |
| MRR | 0.031 | 0.330 | +0.30 |
| NDCG@10 | 0.037 | 0.369 | +0.33 |
| di.xml results/query | 0.2 | 3.0 | +2.8 |
| Query win rate | n/a | n/a | 76% |

---

## Magento Patterns Detected

```mermaid
mindmap
  root((Patterns))
    PHP
      Controller
      Model
      Repository
      Block
      Helper
      ViewModel
    Interception
      Plugin
      Observer
      Preference
    XML
      di.xml
      events.xml
      webapi.xml
      routes.xml
      crontab.xml
      db_schema.xml
    Frontend
      Template
      JavaScript
      GraphQL
```

Magector understands these Magento 2 architectural patterns:

| Pattern | Detection Method | Example |
|---------|-----------------|---------|
| Controller | Path + `execute()` method | `Controller/Adminhtml/Order/View.php` |
| Model | Path + extends `AbstractModel` | `Model/Product.php` |
| Repository | Path + implements `RepositoryInterface` | `Model/ProductRepository.php` |
| Block | Path + extends `AbstractBlock` | `Block/Product/View.php` |
| Plugin | Path + `before`/`after`/`around` methods | `Plugin/Product/SavePlugin.php` |
| Observer | Path + implements `ObserverInterface` | `Observer/ProductSaveObserver.php` |
| GraphQL Resolver | Path + implements `ResolverInterface` | `Model/Resolver/Products.php` |
| Helper | Path under `Helper/` | `Helper/Data.php` |
| Cron | Path under `Cron/` | `Cron/CleanExpiredQuotes.php` |
| Console Command | Path + extends `Command` | `Console/Command/IndexerReindex.php` |
| Data Provider | Path + `DataProvider` | `Ui/DataProvider/Product/Listing.php` |
| ViewModel | Path + implements `ArgumentInterface` | `ViewModel/Product/Breadcrumbs.php` |
| Setup Patch | Path + `Patch/Data` or `Patch/Schema` | `Setup/Patch/Data/AddAttribute.php` |
| di.xml | Path matching | `etc/di.xml`, `etc/frontend/di.xml` |
| events.xml | Path matching | `etc/events.xml` |
| webapi.xml | Path matching | `etc/webapi.xml` |
| layout XML | Path under `layout/` | `view/frontend/layout/catalog_product_view.xml` |
| Template | `.phtml` extension | `view/frontend/templates/product/view.phtml` |
| JavaScript | `.js` with AMD/ES6 detection | `view/frontend/web/js/view/minicart.js` |
| GraphQL Schema | `.graphqls` extension | `etc/schema.graphqls` |
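A few rows of the table can be sketched as a path-plus-content classifier. This is an illustration, not a port of `rust-core/src/magento.rs` or `src/magento-patterns.js`: the function name, the returned labels, and the regular expressions are assumptions, and only a subset of patterns is covered.

```javascript
// Hypothetical classifier covering a subset of the patterns in the table.
function detectPattern(path, content = "") {
  // Extension and file-name based patterns need no content inspection.
  if (path.endsWith(".phtml")) return "template";
  if (path.endsWith(".graphqls")) return "graphql_schema";
  if (/(^|\/)etc\/([a-z_]+\/)?di\.xml$/.test(path)) return "di.xml";
  if (/(^|\/)etc\/([a-z_]+\/)?events\.xml$/.test(path)) return "events.xml";

  // PHP patterns combine a directory hint with a content check.
  if (/(^|\/)Plugin\//.test(path) &&
      /function\s+(before|after|around)[A-Z]\w*\s*\(/.test(content)) return "plugin";
  if (/(^|\/)Observer\//.test(path) &&
      /implements\s+[^{]*ObserverInterface/.test(content)) return "observer";
  if (/(^|\/)Controller\//.test(path) &&
      /function\s+execute\s*\(/.test(content)) return "controller";

  // Directory-only patterns.
  if (/(^|\/)Helper\//.test(path)) return "helper";
  if (/(^|\/)Cron\//.test(path)) return "cron";
  return "unknown";
}

console.log(detectPattern("etc/frontend/di.xml")); // di.xml
```

Requiring both the directory and the content check for plugins, observers, and controllers avoids labeling unrelated helper classes that merely live under those directories.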

---

Configuration

Copy .cursorrules to your Magento project root for optimized AI-assisted development. The rules instruct the AI to:

1. Use Magector MCP tools before reading files manually
2. Write effective semantic queries
3. Follow Magento development patterns
4. Interpret search results correctly

The ONNX model (all-MiniLM-L6-v2) is automatically downloaded on first run to ~/.magector/models/. To use a different location:

```bash
magector-core index -m /path/to/magento -c /custom/model/path
```

---

Development

Set up a local checkout and build the Rust core:

```bash
git clone https://github.com/krejcif/magector.git
cd magector

# Install Node.js dependencies
npm install

# Build the Rust core
cd rust-core
cargo build --release
cd ..

# The CLI will automatically find the dev binary at rust-core/target/release/magector-core
node src/cli.js help
```

Build and validate the Rust core on its own:

```bash
# Rust core
cd rust-core
cargo build --release

# Run unit tests
cargo test

# Run validation
cargo run --release -- validate
```

Run the test suites from the repository root:

```bash
# Integration tests (66 tests, requires indexed codebase)
npm test

# E2E accuracy tests (101 queries)
npm run test:accuracy
npm run test:accuracy:verbose

# Run without index (unit + schema tests only)
npm run test:no-index

# Rust unit tests (37 tests including SONA + descriptions)
(cd rust-core && cargo test)

# SONA integration tests (8 tests)
node tests/mcp-sona.test.js

# SONA/MicroLoRA benefit evaluation (180 queries)
npm run test:sona-eval

# Rust validation (557 test cases)
(cd rust-core && cargo run --release -- validate -m ./magento2 --skip-index)
```

To add a new Magento pattern:

1. Add pattern detection in `rust-core/src/magento.rs`
2. Add search text enrichment in `rust-core/src/indexer.rs`
3. Add validation test cases in `rust-core/src/validation.rs`
4. Add E2E accuracy test cases in `tests/mcp-accuracy.test.js`
5. Rebuild and run validation to verify:

```bash
cargo build --release
./target/release/magector-core validate -m ./magento2 --skip-index
npm run test:accuracy
```

To add a new MCP tool:

1. Define the tool schema in `src/mcp-server.js` (`ListToolsRequestSchema` handler)
2. Include keyword-rich descriptions and cross-tool "See also" references
3. Implement the handler in the `CallToolRequestSchema` handler
4. Return structured JSON via `formatSearchResults()`
5. Add E2E test cases in `tests/mcp-accuracy.test.js`
6. Test with Claude Code or the MCP inspector
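
As a rough illustration of steps 1 to 4, the sketch below shows the general shape of an MCP tool definition and a matching handler. The tool name `magento_find_example`, its fields, and the `formatResultsStub` helper are hypothetical; the stub only stands in for the project's `formatSearchResults()`, whose real signature is not shown in this document. The SDK wiring is omitted so the snippet runs without dependencies.

```javascript
// Hypothetical tool definition; the name and properties are made up.
const exampleTool = {
  name: "magento_find_example",
  description:
    "Find example Magento components by semantic query (plugins, observers, di.xml). " +
    "See also: magento_describe for generating di.xml descriptions.",
  inputSchema: {
    type: "object",
    properties: {
      query: { type: "string", description: "Natural-language search query" },
      limit: { type: "number", description: "Maximum number of results" },
    },
    required: ["query"],
  },
};

// Stand-in for formatSearchResults(); assumed shape, not the real helper.
function formatResultsStub(results) {
  return { count: results.length, results };
}

// Handler shape: validate arguments, run the search, return JSON as text content.
// `search` is injected so the sketch can run without the Rust core.
async function handleExampleTool(args, search) {
  if (typeof args.query !== "string" || args.query.trim() === "") {
    throw new Error("query is required");
  }
  const results = await search(args.query, args.limit ?? 10);
  return {
    content: [{ type: "text", text: JSON.stringify(formatResultsStub(results)) }],
  };
}

// Usage with a fake search function.
handleExampleTool({ query: "checkout totals" }, async () => [
  { path: "Model/Quote/TotalsCollector.php", score: 0.85 },
]).then((res) => console.log(exampleTool.name, res.content[0].text));
```

Injecting the search function keeps the handler testable in isolation, which fits the requirement to add E2E test cases alongside each new tool.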

---

Technical Details

Embedding model:

- Model: all-MiniLM-L6-v2
- Dimensions: 384
- Pooling: Mean pooling with attention mask
- Normalization: L2 normalized
- Runtime: ONNX Runtime (via the `ort` crate)
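
The pooling and normalization steps listed above can be illustrated with a small, dependency-free sketch. It uses toy 2-dimensional vectors instead of 384 dimensions, and the function names are assumptions; the actual implementation runs in Rust on ONNX Runtime output.

```javascript
// Mean pooling with an attention mask, followed by L2 normalisation.
// tokenEmbeddings: array of token vectors (seqLen x dim)
// attentionMask:   array of 0/1 flags (seqLen); 0 marks padding tokens
function meanPool(tokenEmbeddings, attentionMask) {
  const dim = tokenEmbeddings[0].length;
  const sum = new Array(dim).fill(0);
  let count = 0;
  tokenEmbeddings.forEach((vec, t) => {
    if (attentionMask[t] === 0) return; // padding does not contribute
    for (let d = 0; d < dim; d++) sum[d] += vec[d];
    count += 1;
  });
  return sum.map((v) => v / Math.max(count, 1));
}

// Scale a vector to unit length (returns a copy for the zero vector).
function l2Normalize(vec) {
  const norm = Math.sqrt(vec.reduce((acc, v) => acc + v * v, 0));
  return norm === 0 ? vec.slice() : vec.map((v) => v / norm);
}

// Toy example: the third token is padding and is ignored by the mean.
const tokens = [[1, 0], [3, 4], [100, 100]];
const mask = [1, 1, 0];
const sentence = l2Normalize(meanPool(tokens, mask));
console.log(sentence); // both components are about 0.7071
```

Because the output has unit length, cosine similarity between two such vectors reduces to a dot product.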

Vector index:

- Algorithm: HNSW (Hierarchical Navigable Small World)
- Library: `hnsw_rs`
- Parameters: M=32, max_layers=16, ef_construction=200
- Distance metric: Cosine similarity
- Hybrid search: Semantic nearest-neighbor + keyword reranking in path and search text + SONA/MicroLoRA feedback adjustments
- Incremental updates: Tombstone soft-delete + periodic HNSW rebuild (compact)
- Persistence: Bincode V2 binary serialization (backward-compatible with V1)
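
To make the hybrid search idea concrete, here is a minimal sketch of reranking nearest-neighbor candidates with a keyword bonus computed over the path and search text. The additive formula, the `KEYWORD_WEIGHT` value, and all names are assumptions for illustration; Magector's actual scoring (including SONA adjustments) is implemented in the Rust core and may differ.

```javascript
// Hypothetical hybrid reranking: semantic score plus a keyword bonus.
// KEYWORD_WEIGHT and the additive formula are illustrative assumptions.
const KEYWORD_WEIGHT = 0.15;

function cosineSimilarity(a, b) {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// candidates: [{ path, searchText, vector }], e.g. the HNSW nearest neighbours.
function hybridRerank(query, queryVec, candidates) {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  return candidates
    .map((c) => {
      const haystack = `${c.path} ${c.searchText}`.toLowerCase();
      const matched = terms.filter((t) => haystack.includes(t)).length;
      const semantic = cosineSimilarity(queryVec, c.vector);
      return { ...c, score: semantic + KEYWORD_WEIGHT * matched };
    })
    .sort((a, b) => b.score - a.score);
}

// The second candidate is semantically further away but matches both terms.
const ranked = hybridRerank("checkout totals", [1, 0], [
  { path: "Model/Cart.php", searchText: "cart items", vector: [0.9, 0.1] },
  {
    path: "Model/Quote/TotalsCollector.php",
    searchText: "collect checkout totals",
    vector: [0.8, 0.6],
  },
]);
console.log(ranked.map((r) => r.path));
```

In this toy run the keyword bonus lifts `TotalsCollector.php` above a candidate with a higher raw cosine score, which is the effect keyword reranking is meant to have on precision.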

Each indexed file produces a vector entry with metadata:

```rust
struct IndexMetadata {
    path: String,
    file_type: String,          // php, xml, js, template, graphql
    magento_type: String,       // controller, model, block, plugin, ...
    class_name: Option<String>,
    namespace: Option<String>,
    methods: Vec<String>,       // extracted method names
    search_text: String,        // enriched searchable text
    is_controller: bool,
    is_plugin: bool,
    is_observer: bool,
    is_model: bool,
    is_block: bool,
    is_repository: bool,
    is_resolver: bool,
    // ... 20+ pattern flags
}
```

Performance:

| Operation | Time | Notes |
|-----------|------|-------|
| Full index (36K vectors) | ~1 min | Parallel parsing + batched ONNX embedding |
| Single query (warm) | 10-45ms | Persistent serve process, HNSW + rerank |
| Single query (cold) | ~2.6s | Includes ONNX model + index load |
| Embedding generation | ~2ms | ONNX Runtime with CoreML/CUDA |
| Batch embedding (32) | ~30ms | Batched ONNX inference |
| Model load | ~500ms | One-time at startup |
| Index save/load | <1s | Bincode binary serialization |

Optimizations:

- Persistent serve mode -- Rust process keeps ONNX model + HNSW index in memory via stdin/stdout JSON protocol
- Query cache -- LRU cache (200 entries) avoids re-embedding identical queries
- Hybrid reranking -- combines semantic similarity with keyword matching for better precision
- Batched ONNX embedding -- 32 texts per inference call (vs. 1-at-a-time), 3-5x faster embedding
- Dynamic thread scaling -- ONNX intra-op threads scale to CPU core count
- Thread-local AST parsers -- each rayon thread gets its own tree-sitter parser (no mutex contention)
- Bincode persistence -- binary serialization replaces JSON (3-5x faster save/load, ~5x smaller files)
- Adaptive HNSW capacity -- pre-sized to actual vector count
- Parallel HNSW insert -- batch insert uses hnsw_rs parallel insertion on load and index
- Tuned ef_search -- optimized search parameters for 36K vector index (ef_search=50 for search, 64 for hybrid)
- SONA feedback learning -- learns from MCP tool call patterns to adjust search rankings; MicroLoRA adapts query embeddings, EWC++ prevents forgetting
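
The query cache item can be illustrated with a minimal LRU cache. This sketch relies on the insertion order of JavaScript's `Map`; the 200-entry capacity comes from the list above, while the class, the wrapper function, and the fake embedder are assumptions and not Magector's actual code.

```javascript
// Minimal LRU cache built on Map's insertion order.
// Capacity 200 matches the figure above; everything else is illustrative.
class LruCache {
  constructor(capacity = 200) {
    this.capacity = capacity;
    this.map = new Map();
  }

  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    this.map.delete(key); // re-insert to mark as most recently used
    this.map.set(key, value);
    return value;
  }

  set(key, value) {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.capacity) {
      // The first key in iteration order is the least recently used.
      const oldest = this.map.keys().next().value;
      this.map.delete(oldest);
    }
  }
}

// Wrap an embedding function so identical queries are embedded only once.
function cachedEmbedder(embed, cache = new LruCache()) {
  return (query) => {
    const hit = cache.get(query);
    if (hit !== undefined) return hit;
    const vec = embed(query);
    cache.set(query, vec);
    return vec;
  };
}

// Fake embedder that counts how often it is invoked.
let calls = 0;
const embedQuery = cachedEmbedder((q) => {
  calls += 1;
  return [q.length];
});
embedQuery("product price");
embedQuery("product price");
console.log(calls); // 1
```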

---

Roadmap

```mermaid
gantt
    title Roadmap
    dateFormat YYYY-MM
    axisFormat %b
    section Done
    Hybrid search      :done, 2025-01, 30d
    Serve mode         :done, 2025-02, 30d
    JSON output        :done, 2025-03, 15d
    Cross-tool hints   :done, 2025-03, 15d
    E2E tests          :done, 2025-03, 15d
    Adobe Commerce     :done, 2025-03, 15d
    section Next
    SONA feedback      :done, 2025-04, 30d
    Incremental index  :done, 2025-04, 30d
    SONA v2 MicroLoRA  :done, 2025-05, 15d
    LLM descriptions   :done, 2025-06, 30d
    Method chunking    :active, 2025-07, 30d
    Intent detection   :2025-08, 30d
    Type filtering     :2025-09, 30d
    section Future
    VSCode extension   :2025-10, 60d
    Web UI             :2025-12, 60d
```

- [x] Hybrid search (semantic + keyword re-ranking)
- [x] Persistent serve mode (eliminates cold-start latency)
- [x] Structured JSON output (methods, badges, snippets)
- [x] Cross-tool discovery hints for AI clients
- [x] E2E accuracy test suite (101 queries)
- [x] Adobe Commerce support (B2B, Staging, and all Commerce-specific modules)
- [x] SONA feedback learning (search rankings adapt to MCP tool call patterns)
- [x] SONA v2 with MicroLoRA + EWC++ (embedding-level adaptation, prevents catastrophic forgetting)
- [x] LLM description enrichment (generate di.xml descriptions via Claude, store in SQLite, embed into vectors for improved search ranking)
- [ ] Method-level chunking (per-method vectors for direct method search)
- [ ] Query intent classification (auto-detect "give me XML" vs "give me PHP")
- [ ] Filtered search by file type at the vector level
- [x] Incremental indexing (background file watcher with tombstone + compact strategy)
- [ ] VSCode extension
- [ ] Web UI for browsing results

---

License

MIT License. See LICENSE for details.

---

Contributing

Contributions are welcome. Please:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/improvement`)
3. Add tests for new functionality
4. Run validation to ensure accuracy doesn't regress: `npm run test:accuracy`
5. Submit a pull request

---

Built with Rust and Node.js for the Magento and Adobe Commerce community.
