Self-learning LLM orchestration with SONA adaptive learning, HNSW memory, RLM recursive retrieval, FastGRNN routing, and SIMD inference
`npm install @ruvector/ruvllm`

100% Routing Accuracy | Sub-Millisecond Inference | Self-Learning




Quick Start | RLM | Training | Models | API
---
@ruvector/ruvllm is a TypeScript/JavaScript SDK for intelligent LLM orchestration, specifically designed for Claude Code and multi-agent systems. It provides:
- RLM (Recursive Language Model) - breaks complex queries into sub-queries and synthesizes coherent answers
- 100% Routing Accuracy - hybrid keyword + embedding strategy for perfect agent selection
- SONA Self-Learning - the model improves with every successful interaction
- SIMD Acceleration - AVX2/NEON-optimized inference

| Challenge | Traditional Approach | @ruvector/ruvllm Solution |
|-----------|---------------------|---------------------------|
| Agent selection | Manual or keyword-based | Semantic + keyword hybrid = 100% |
| Complex queries | Single-shot RAG | Recursive decomposition + synthesis |
| Response latency | 2-5 seconds | <1ms cache, 50-200ms full |
| Learning | Static models | Self-improving (SONA) |
| Cost per route | $0.01+ (API call) | $0 (local inference) |
---
```bash
npm install @ruvector/ruvllm
```

```typescript
import { RuvLLM, RlmController } from '@ruvector/ruvllm';

// Simple LLM inference
const llm = new RuvLLM({
  modelPath: '~/.ruvllm/models/ruvltra-claude-code-0.5b-q4_k_m.gguf',
  sonaEnabled: true,
});
const response = await llm.query('Explain quantum computing');
console.log(response.text);

// Recursive Language Model for complex queries
const rlm = new RlmController({ maxDepth: 5 });
const answer = await rlm.query('What are the causes AND solutions for slow API responses?');
// Automatically decomposes into sub-queries, retrieves context, synthesizes answer
```
---
Built by Claude Code, for Claude Code. Routes tasks to 60+ agent types:
```typescript
import { RuvLLM } from '@ruvector/ruvllm';

const llm = new RuvLLM({ model: 'ruv/ruvltra' });

// Intelligent routing
const route = await llm.route('implement OAuth2 authentication');
console.log(route.agent);      // 'security-architect'
console.log(route.confidence); // 0.98
console.log(route.tier);       // 2 (Haiku-level complexity)

// Multi-agent teams for complex tasks
const team = await llm.routeComplex('build full-stack app with auth');
// Returns: [system-architect, backend-dev, coder, security-architect, tester]
```
```
┌─────────────────────────────────────────────────────────┐
│                      User Request                       │
└─────────────────────┬───────────────────────────────────┘
                      ↓
              [RuvLTRA Routing]
                      ↓
        ┌─────────────┼─────────────┐
        ↓             ↓             ↓
  ┌───────────┐ ┌───────────┐ ┌───────────┐
  │  Tier 1   │ │  Tier 2   │ │  Tier 3   │
  │  Booster  │ │   Haiku   │ │   Opus    │
  │   <1ms    │ │  ~500ms   │ │   2-5s    │
  │    $0     │ │  $0.0002  │ │  $0.015   │
  └───────────┘ └───────────┘ └───────────┘
```

Every successful interaction improves the model:

```
// First routing: Full inference
llm.route('implement OAuth2') → security-architect (97%)
// Later: Pattern hit in <25μs (learned from success)
llm.route('add OAuth2 flow') → security-architect (99%, cached pattern)
```
---
RLM provides recursive query decomposition: unlike traditional RAG, which retrieves once, RLM breaks complex questions into sub-queries and synthesizes a coherent answer.
```
Query: "What are the causes AND solutions for slow API responses?"
                            ↓
                     [Decomposition]
                     /             \
  "Causes of slow API?"        "Solutions for slow API?"
           ↓                              ↓
     [Sub-answers]                  [Sub-answers]
            \                            /
                    [Synthesis]
                         ↓
      Coherent combined answer with sources
```

```typescript
import { RlmController } from '@ruvector/ruvllm';

const rlm = new RlmController({
  maxDepth: 5,
  retrievalTopK: 10,
  enableCache: true,
});

// Add knowledge to memory
await rlm.addMemory('TypeScript adds static typing to JavaScript.');
await rlm.addMemory('React is a library for building user interfaces.');

// Query with recursive retrieval
const answer = await rlm.query('What are causes and solutions for type errors in React?');
console.log(answer.text);         // Comprehensive synthesized answer
console.log(answer.sources);      // Source attributions
console.log(answer.qualityScore); // 0.0-1.0
console.log(answer.confidence);   // Routing confidence
```

Stream tokens as they are generated:

```typescript
for await (const event of rlm.queryStream('Explain machine learning')) {
  if (event.type === 'token') {
    process.stdout.write(event.text);
  } else {
    console.log('\n\nQuality:', event.answer.qualityScore);
  }
}
```

Enable self-reflection to iteratively refine answers:

```typescript
const rlm = new RlmController({
  enableReflection: true,
  maxReflectionIterations: 2,
  minQualityScore: 0.8,
});

// Answers are iteratively refined until quality >= 0.8
const answer = await rlm.query('Complex multi-part technical question...');
```

Full configuration reference:

```typescript
interface RlmConfig {
  maxDepth?: number;                 // Max recursion depth (default: 3)
  maxSubQueries?: number;            // Max sub-queries per level (default: 5)
  tokenBudget?: number;              // Token budget (default: 4096)
  enableCache?: boolean;             // Enable caching (default: true)
  cacheTtl?: number;                 // Cache TTL in ms (default: 300000)
  retrievalTopK?: number;            // Memory spans to retrieve (default: 10)
  minQualityScore?: number;          // Min quality threshold (default: 0.7)
  enableReflection?: boolean;        // Enable self-reflection (default: false)
  maxReflectionIterations?: number;  // Max reflection loops (default: 2)
}
```
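The `enableCache` and `cacheTtl` options describe time-bounded answer caching. Below is a minimal sketch of that semantics, assuming nothing beyond what the config documents; the `TtlCache` name is invented here and is not the library's internal cache.

```typescript
// Minimal TTL cache sketch matching the documented cacheTtl semantics
// (default 300000 ms). Illustrative only -- not the library's implementation.
class TtlCache<V> {
  private store = new Map<string, { value: V; expiresAt: number }>();
  constructor(private ttlMs: number = 300_000) {}

  set(key: string, value: V, now: number = Date.now()): void {
    this.store.set(key, { value, expiresAt: now + this.ttlMs });
  }

  get(key: string, now: number = Date.now()): V | undefined {
    const entry = this.store.get(key);
    if (entry === undefined) return undefined;
    if (now > entry.expiresAt) {
      // Expired: evict and report a miss.
      this.store.delete(key);
      return undefined;
    }
    return entry.value;
  }
}
```

With the default TTL, a repeated query within five minutes can be served from cache; after that it falls through to full recursive retrieval.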
---
Every successful routing is stored in HNSW-indexed memory for instant recall:
```
// First time: Full inference (~50ms)
route("implement OAuth2") → security-architect (97% confidence)
// Later: Memory hit (<25μs)
route("add OAuth2 flow") → security-architect (99% confidence, cached)
```

Low confidence automatically escalates:

```
Confidence > 0.9    → Use recommended agent
Confidence 0.7-0.9  → Use with human confirmation
Confidence < 0.7    → Escalate to higher tier
```
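The bands above can be written as a small policy function. This is a sketch of the documented thresholds only; `decideRoute` and `RouteDecision` are made-up names, not exports of the package.

```typescript
// Illustrative escalation policy matching the confidence bands above.
// Not part of the @ruvector/ruvllm API -- a sketch of the documented behavior.
type RouteDecision = "use" | "confirm" | "escalate";

function decideRoute(confidence: number): RouteDecision {
  if (confidence > 0.9) return "use";       // trust the recommended agent
  if (confidence >= 0.7) return "confirm";  // ask for human confirmation
  return "escalate";                        // hand off to a higher tier
}
```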
```typescript
import { simd } from '@ruvector/ruvllm/simd';

// 4x faster vector operations with AVX2/NEON
const similarity = simd.batchCosineSimilarity(query, targets);
const attended = simd.flashAttention(q, k, v, scale);
```
Arc-based string interning delivers 100-1000x faster cache hits on large responses.
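As a rough illustration of the interning idea: every distinct string is stored once, so repeated cache keys resolve to the same canonical instance. In Rust this is done with `Arc`-shared strings; the TypeScript sketch below only mimics the behavior with a `Map` (JS engines already compare strings by value, so this is conceptual, not a speedup recipe).

```typescript
// Interning pool: each distinct string is stored once, so repeated cache
// keys resolve to the same canonical instance. Conceptual sketch only.
const pool = new Map<string, string>();

function intern(s: string): string {
  const canonical = pool.get(s);
  if (canonical !== undefined) return canonical;
  pool.set(s, s);
  return s;
}
```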
---
| Operation | Latency | Throughput |
|-----------|---------|------------|
| Query decomposition | 340 ns | 2.9M/s |
| Cache lookup | 23.5 ns | 42.5M/s |
| Embedding (384d) | 293 ns | 3.4M/s |
| Memory search (10k) | 0.4 ms | 2.5K/s |
| End-to-end routing | <1 ms | 1K+/s |
| Full RLM query | 50-200 ms | 5-20/s |

Routing accuracy by strategy:

| Strategy | RuvLTRA | Qwen Base | OpenAI |
|----------|---------|-----------|--------|
| Embedding Only | 45% | 40% | 52% |
| Keyword Only | 78% | 78% | N/A |
| Hybrid | 100% | 95% | N/A |
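
A plausible shape for the hybrid strategy is a weighted blend of embedding similarity and keyword overlap. The sketch below is an assumption about how such a blend could look; the 0.5/0.5 weights and all function names are invented here, not the package's actual routing code.

```typescript
// Hybrid routing score sketch: blend embedding cosine similarity with
// keyword overlap. Weights and names are illustrative assumptions.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function keywordOverlap(task: string, agentKeywords: string[]): number {
  const words = new Set(task.toLowerCase().split(/\W+/));
  const hits = agentKeywords.filter((k) => words.has(k)).length;
  return agentKeywords.length ? hits / agentKeywords.length : 0;
}

function hybridScore(
  taskEmbedding: number[],
  agentEmbedding: number[],
  task: string,
  agentKeywords: string[],
): number {
  return 0.5 * cosine(taskEmbedding, agentEmbedding) +
         0.5 * keywordOverlap(task, agentKeywords);
}
```

The intuition matches the table: keywords alone catch explicit mentions, embeddings alone catch paraphrases, and the blend covers both.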
145 tests passing:
- RLM Controller: 24 tests
- Routing Accuracy: 18 tests
- Contrastive Training: 15 tests
- SIMD Operations: 22 tests
- SONA Learning: 19 tests
- Memory/HNSW: 21 tests
- Benchmarks: 26 tests
---
Models are published at https://huggingface.co/ruv/ruvltra:

| Model | Size | Purpose | Accuracy |
|-------|------|---------|----------|
| ruvltra-claude-code-0.5b-q4_k_m | 398 MB | Agent routing | 100% (hybrid) |
| ruvltra-small-0.5b-q4_k_m | ~400 MB | Embeddings | - |
| ruvltra-medium-1.1b-q4_k_m | ~1 GB | Full inference | - |
```typescript
// Programmatic download
import { downloadModel } from '@ruvector/ruvllm';
await downloadModel('ruv/ruvltra', { quantization: 'q4_k_m' });
```

```bash
# CLI download
ruvllm download ruv/ruvltra
```
Models are automatically downloaded on first use:
```typescript
const llm = new RuvLLM({ model: 'ruv/ruvltra' });
// Downloads to ~/.ruvllm/models/ if not present
```
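For reference, here is a sketch of how a model id might map to that cache directory. The filename scheme and the `modelCachePath` helper are assumptions for illustration, not part of the SDK.

```typescript
import { homedir } from "node:os";
import { join } from "node:path";

// Hypothetical mapping from a model id to the local cache path the docs
// mention (~/.ruvllm/models/). The filename scheme here is an assumption.
function modelCachePath(modelId: string, quantization = "q4_k_m"): string {
  const file = `${modelId.split("/").pop()}-${quantization}.gguf`;
  return join(homedir(), ".ruvllm", "models", file);
}
```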
---
Generate the routing dataset:

```bash
node scripts/training/routing-dataset.js
# Output: 381 examples, 793 contrastive pairs, 156 hard negatives
```
Fine-tune routing with LoRA on contrastive pairs:

```typescript
import { ContrastiveTrainer } from '@ruvector/ruvllm';

const trainer = new ContrastiveTrainer({
  modelPath: './models/base.gguf',
  loraRank: 8,
  loraAlpha: 16,
  learningRate: 1e-4,
});

const pairs = [
  { anchor: 'Fix auth bug', positive: 'coder', negative: 'researcher' },
  // ... more pairs
];

await trainer.train(pairs, { epochs: 10 });
await trainer.save('./adapters/routing-lora');
```
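Anchor/positive/negative triples like these are typically scored with a margin-based objective: pull the anchor's embedding toward the positive, push it away from the negative. Below is a generic sketch of standard triplet loss over embedding vectors, not the trainer's exact objective.

```typescript
// Euclidean distance between two embedding vectors.
function dist(a: number[], b: number[]): number {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += (a[i] - b[i]) ** 2;
  return Math.sqrt(s);
}

// Standard triplet margin loss: zero once the negative is at least
// `margin` farther from the anchor than the positive is.
function tripletLoss(
  anchor: number[],
  positive: number[],
  negative: number[],
  margin = 0.2,
): number {
  return Math.max(0, dist(anchor, positive) - dist(anchor, negative) + margin);
}
```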
| Script | Description |
|--------|-------------|
| routing-dataset.js | Generate 381 routing examples |
| claude-code-synth.js | Synthetic data generation |
| contrastive-finetune.js | LoRA fine-tuning pipeline |
| rlm-dataset.js | RLM training data (500 examples) |
---
```typescript
class RuvLLM {
  constructor(config?: RuvLLMConfig);
  // Generic return types were lost in rendering; the names below are indicative.
  query(prompt: string, params?: GenerateParams): Promise<QueryResponse>;
  stream(prompt: string, params?: GenerateParams): AsyncIterable<StreamToken>;
  route(task: string): Promise<RouteResult>;
  routeComplex(task: string): Promise<RouteResult[]>;
  loadModel(path: string): Promise<void>;
  addMemory(text: string, metadata?: object): number;
  searchMemory(query: string, topK?: number): MemoryResult[];
  sonaStats(): SonaStats | null;
  adapt(input: Float32Array, quality: number): void;
}
```
```typescript
class RlmController {
  constructor(config?: RlmConfig, engine?: RuvLLM);
  query(input: string): Promise<RlmAnswer>;
  queryStream(input: string): AsyncGenerator<StreamToken>;
  addMemory(text: string, metadata?: object): Promise<number>;
  searchMemory(query: string, topK?: number): Promise<MemorySpan[]>;
  clearCache(): void;
  getCacheStats(): { size: number; entries: number };
  updateConfig(config: Partial<RlmConfig>): void;
  getConfig(): Required<RlmConfig>;
}
```
```typescript
import {
  // Core
  RuvLLM, RuvLLMConfig,
  // RLM
  RlmController, RlmConfig, RlmAnswer, MemorySpan, StreamToken,
  // Training
  RlmTrainer, ContrastiveTrainer, createRlmTrainer,
  DEFAULT_RLM_CONFIG, FAST_RLM_CONFIG, THOROUGH_RLM_CONFIG,
  // SONA Learning
  SonaCoordinator, TrajectoryBuilder,
  // LoRA
  LoraAdapter, LoraManager,
  // Benchmarks
  ModelComparisonBenchmark, RoutingBenchmark, EmbeddingBenchmark,
} from '@ruvector/ruvllm';
```
---
```bash
# Route a task
ruvllm route "add unit tests for auth module"
# → Agent: tester | Confidence: 0.96 | Tier: 2
```
---
| Platform | Architecture | Status |
|----------|--------------|--------|
| macOS | arm64 (M1-M4) | Full support |
| macOS | x64 | Supported |
| Linux | x64 | Supported |
| Linux | arm64 | Supported |
| Windows | x64 | Supported |
---
| Resource | URL |
|----------|-----|
| npm | npmjs.com/package/@ruvector/ruvllm |
| HuggingFace | huggingface.co/ruv/ruvltra |
| Crate (Rust) | crates.io/crates/ruvllm |
| Documentation | docs.rs/ruvllm |
| GitHub | github.com/ruvnet/ruvector |
| Claude Flow | github.com/ruvnet/claude-flow |
---
MIT OR Apache-2.0
---
Built for Claude Code. Optimized for agents. Designed for speed.