ML semantic detection for cascadeflow TypeScript - Feature parity with Python
ML-based semantic detection for cascadeflow TypeScript.
Brings TypeScript to feature parity with Python's ML capabilities using Transformers.js.
- 🎯 84-87% domain detection confidence (matches Python)
- 🧠 Semantic validation using cosine similarity
- 🚀 Works everywhere - Node.js, browser, edge functions
- 📦 Same model as Python - BGE-small-en-v1.5
- 🔄 Automatic fallback to rule-based detection
- ⚡ Fast inference - ~20-50ms per embedding
- 🎨 Request-scoped caching - 50% latency reduction
```bash
npm install @cascadeflow/ml
```

The model (~40MB) will be downloaded automatically on first use.
ML-based semantic detection is automatically available when @cascadeflow/ml is installed. The CascadeAgent will use it for enhanced domain detection and routing.
```typescript
import { CascadeAgent } from '@cascadeflow/core';
const agent = new CascadeAgent({
  models: [
    { name: 'gpt-4o-mini', provider: 'openai', cost: 0.00015 },
    { name: 'gpt-4o', provider: 'openai', cost: 0.00625 },
  ],
});
const result = await agent.run('Calculate eigenvalues of [[1,2],[3,4]]');
// ML detection results are in metadata when available
console.log(result.metadata.domainDetected); // 'MATH'
console.log(result.metadata.detectionMethod); // 'semantic'
console.log(result.metadata.domainConfidence); // 0.87 (87%)
```
```typescript
import { UnifiedEmbeddingService, EmbeddingCache } from '@cascadeflow/ml';
// Create service (lazy loads model)
const embedder = new UnifiedEmbeddingService();
// Check availability
if (await embedder.isAvailable()) {
  // Generate embeddings
  const embedding = await embedder.embed('Hello world');
  console.log(embedding?.dimensions); // 384

  // Compute similarity
  const similarity = await embedder.similarity('cat', 'kitten');
  console.log(similarity); // ~0.85 (high similarity)

  // Use caching for better performance
  const cache = new EmbeddingCache(embedder);
  const emb1 = await cache.getOrEmbed('query'); // Computes
  const emb2 = await cache.getOrEmbed('query'); // Cached!
}
```
Uses Xenova/bge-small-en-v1.5 (the ONNX-converted BAAI/bge-small-en-v1.5; a loading sketch follows the list below):
- Size: ~40MB
- Dimensions: 384
- Inference: ~20-50ms per embedding
- MTEB Score: 91.8%
- Same as Python: Exact feature parity
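If you want to load the model directly with Transformers.js (outside of @cascadeflow/ml), a minimal sketch looks like this, assuming the @xenova/transformers package:

```typescript
import { pipeline } from '@xenova/transformers';

// Load the same ONNX model the package uses (~40MB, downloaded on first call)
const extractor = await pipeline('feature-extraction', 'Xenova/bge-small-en-v1.5');

// Mean pooling + normalization yields a single 384-dim sentence embedding
const output = await extractor('Calculate eigenvalues of [[1,2],[3,4]]', {
  pooling: 'mean',
  normalize: true,
});

console.log(output.dims);        // [1, 384]
console.log(output.data.length); // 384 (Float32Array)
```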
Computes semantic similarity between the query and domain exemplars (a sketch follows the steps below):
1. Embed user query → 384-dim vector
2. Compare to domain exemplars (8 per domain)
3. Find highest similarity score
4. Return domain with confidence
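A rough sketch of that loop using the public embed API; the exemplar texts and the detectDomain helper are illustrative, not the package's actual implementation:

```typescript
import { UnifiedEmbeddingService } from '@cascadeflow/ml';

// Illustrative exemplars only; the package ships its own per-domain sets
const exemplars: Record<string, string[]> = {
  MATH: ['Solve the integral of x^2 dx', 'Find the determinant of a 3x3 matrix'],
  CODE: ['Write a function that reverses a string', 'Fix this TypeScript type error'],
};

// Cosine similarity between two 384-dim vectors
function cosine(a: Float32Array, b: Float32Array): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

async function detectDomain(query: string) {
  const embedder = new UnifiedEmbeddingService();
  const queryEmb = await embedder.embed(query);
  if (!queryEmb) return null; // ML unavailable; caller falls back to rules

  let best = { domain: 'GENERAL', confidence: 0 };
  for (const [domain, texts] of Object.entries(exemplars)) {
    for (const text of texts) {
      const emb = await embedder.embed(text);
      if (!emb) continue;
      const score = cosine(queryEmb.data, emb.data);
      if (score > best.confidence) best = { domain, confidence: score };
    }
  }
  return best; // e.g. { domain: 'MATH', confidence: 0.87 }
}
```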
If ML is unavailable (model loading fails or the dependency is missing), detection degrades gracefully (a fallback sketch follows this list):
- ✅ Automatically falls back to rule-based detection
- ✅ All features continue to work
- ✅ No errors or crashes
- ⚠️ Slightly lower confidence (~60-75% vs 84-87%)
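The CascadeAgent handles this fallback internally. If you call the embedding service yourself, a minimal sketch of the same pattern looks like this; the scoreRelevance helper and its keyword rule are hypothetical:

```typescript
import { UnifiedEmbeddingService } from '@cascadeflow/ml';

// Hypothetical helper: prefer semantic similarity, fall back to keyword overlap
async function scoreRelevance(query: string, reference: string): Promise<number> {
  const embedder = new UnifiedEmbeddingService();
  if (await embedder.isAvailable()) {
    return embedder.similarity(query, reference); // semantic path
  }
  // Rule-based fallback: crude keyword overlap, so the feature keeps working
  const queryWords = new Set(query.toLowerCase().split(/\s+/));
  const refWords = reference.toLowerCase().split(/\s+/);
  return refWords.filter((w) => queryWords.has(w)).length / Math.max(refWords.length, 1);
}
```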
- Cold start: ~200-500ms (model loading)
- Warm: ~20-50ms per embedding
- Cached: <1ms (request-scoped cache)
- Batch: ~30% faster than individual calls (see the batching sketch below)
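A quick sketch of batching versus per-text calls, assuming embedBatch behaves as listed in the API reference below:

```typescript
import { UnifiedEmbeddingService } from '@cascadeflow/ml';

const embedder = new UnifiedEmbeddingService();
const texts = ['first query', 'second query', 'third query'];

// One batched call amortizes tokenization and inference overhead
const batched = await embedder.embedBatch(texts);

// Versus N separate calls, each paying the per-call overhead
const individual = await Promise.all(texts.map((t) => embedder.embed(t)));

console.log(batched.length, individual.length); // 3 3
```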
Domain detection confidence:
- ML semantic: 84-87% (complex domains)
- Rule-based fallback: 60-75%
- Improvement: 15-20% higher confidence
Tested on domains: MATH, CODE, DATA, STRUCTURED, REASONING
Works in modern browsers with:
- WebAssembly support
- Sufficient memory (~100MB for model)
- ES2020+ JavaScript support
Tested on:
- ✅ Chrome 90+
- ✅ Firefox 88+
- ✅ Safari 14+
- ✅ Edge 90+
Supported edge runtimes:
- ✅ Vercel Edge Functions
- ✅ Cloudflare Workers
- ✅ Netlify Edge Functions
- ⚠️ AWS Lambda@Edge (check memory limits)
```typescript
class UnifiedEmbeddingService {
  constructor(modelName?: string);
  isAvailable(): Promise<boolean>;
  embed(text: string): Promise<EmbeddingVector | null>;
  embedBatch(texts: string[]): Promise<(EmbeddingVector | null)[]>;
  similarity(text1: string, text2: string): Promise<number>;
}
```
```typescript
class EmbeddingCache {
  constructor(embedder: UnifiedEmbeddingService);
  getOrEmbed(text: string): Promise<EmbeddingVector | null>;
  similarity(text1: string, text2: string): Promise<number>;
  clear(): void;
  cacheSize(): number;
  cacheInfo(): { size: number; texts: string[] };
}
```
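A sketch of request-scoped caching: share one UnifiedEmbeddingService per process and create a fresh EmbeddingCache per request. The handleRequest handler below is hypothetical:

```typescript
import { UnifiedEmbeddingService, EmbeddingCache } from '@cascadeflow/ml';

const embedder = new UnifiedEmbeddingService(); // shared; loads the model once

// Hypothetical request handler: one cache per request, cleared when done
async function handleRequest(query: string, exemplars: string[]) {
  const cache = new EmbeddingCache(embedder);

  const queryEmb = await cache.getOrEmbed(query); // computed once per request
  for (const exemplar of exemplars) {
    // Repeated comparisons against the same query hit the cache, not the model
    const score = await cache.similarity(query, exemplar);
    console.log(exemplar, score);
  }

  cache.clear(); // keep memory bounded between requests
  return queryEmb;
}
```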
```typescript
interface EmbeddingVector {
  data: Float32Array;
  dimensions: number;
}
```

```typescript
// Check if ML is available
const embedder = new UnifiedEmbeddingService();
const available = await embedder.isAvailable();
if (!available) {
  console.log('ML not available, using rule-based detection');
  // App continues to work with fallback
}
```
The model requires ~100MB memory. For constrained environments:
- Use rule-based detection (no ML package)
- Implement model lazy loading
- Consider server-side ML service
Model download (~40MB) happens once on first use. To preload:
```typescript
const embedder = new UnifiedEmbeddingService();
await embedder.embed('warmup query'); // Triggers model download
```
| Feature | Python | TypeScript | Notes |
|---------|--------|------------|-------|
| Model | FastEmbed | Transformers.js | Same BGE-small-en-v1.5 |
| Confidence | 84-87% | 84-87% | ✅ Parity |
| Latency | ~20-30ms | ~20-50ms | Similar |
| Size | ~40MB | ~40MB | Same |
| Fallback | ✅ | ✅ | Both graceful |
Result: Feature parity achieved! 🎉
See `packages/core/examples/nodejs/production-patterns.ts` for a complete production example that demonstrates ML-based semantic detection and validation.
License: MIT
- Documentation: https://github.com/lemony-ai/cascadeflow
- Issues: https://github.com/lemony-ai/cascadeflow/issues
- Discussions: https://github.com/lemony-ai/cascadeflow/discussions