ML semantic detection for cascadeflow TypeScript - Feature parity with Python
ML-based semantic detection for cascadeflow TypeScript.
Brings TypeScript to feature parity with Python's ML capabilities using Transformers.js.
- 🎯 84-87% domain detection confidence (matches Python)
- 🧠 Semantic validation using cosine similarity
- 🚀 Works everywhere - Node.js, browser, edge functions
- 📦 Same model as Python - BGE-small-en-v1.5
- 🔄 Automatic fallback to rule-based detection
- ⚡ Fast inference - ~20-50ms per embedding
- 🎨 Request-scoped caching - 50% latency reduction
```bash
npm install @cascadeflow/ml
```

The model (~40MB) will be downloaded automatically on first use.
ML-based semantic detection is automatically available when @cascadeflow/ml is installed. The CascadeAgent will use it for enhanced domain detection and routing.
```typescript
import { CascadeAgent } from '@cascadeflow/core';
const agent = new CascadeAgent({
  models: [
    { name: 'gpt-4o-mini', provider: 'openai', cost: 0.00015 },
    { name: 'gpt-4o', provider: 'openai', cost: 0.00625 },
  ],
});
const result = await agent.run('Calculate eigenvalues of [[1,2],[3,4]]');
// ML detection results are in metadata when available
console.log(result.metadata.domainDetected); // 'MATH'
console.log(result.metadata.detectionMethod); // 'semantic'
console.log(result.metadata.domainConfidence); // 0.87 (87%)
```
```typescript
import { UnifiedEmbeddingService, EmbeddingCache } from '@cascadeflow/ml';
// Create service (lazy loads model)
const embedder = new UnifiedEmbeddingService();
// Check availability
if (await embedder.isAvailable()) {
  // Generate embeddings
  const embedding = await embedder.embed('Hello world');
  console.log(embedding?.dimensions); // 384

  // Compute similarity
  const similarity = await embedder.similarity('cat', 'kitten');
  console.log(similarity); // ~0.85 (high similarity)

  // Use caching for better performance
  const cache = new EmbeddingCache(embedder);
  const emb1 = await cache.getOrEmbed('query'); // Computes
  const emb2 = await cache.getOrEmbed('query'); // Cached!
}
```
Uses Xenova/bge-small-en-v1.5 (the ONNX-converted BAAI/bge-small-en-v1.5; a loading sketch follows the list below):
- Size: ~40MB
- Dimensions: 384
- Inference: ~20-50ms per embedding
- MTEB Score: 91.8%
- Same as Python: Exact feature parity
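If you want to load the model directly with Transformers.js (outside of @cascadeflow/ml), a minimal sketch looks like this, assuming the @xenova/transformers package:

```typescript
import { pipeline } from '@xenova/transformers';

// Load the same ONNX model the package uses (~40MB, downloaded on first call)
const extractor = await pipeline('feature-extraction', 'Xenova/bge-small-en-v1.5');

// Mean pooling + normalization yields a single 384-dim sentence embedding
const output = await extractor('Calculate eigenvalues of [[1,2],[3,4]]', {
  pooling: 'mean',
  normalize: true,
});

console.log(output.dims);        // [1, 384]
console.log(output.data.length); // 384 (Float32Array)
```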
Computes semantic similarity between the query and domain exemplars (a sketch follows the steps below):
1. Embed user query → 384-dim vector
2. Compare to domain exemplars (8 per domain)
3. Find highest similarity score
4. Return domain with confidence
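A rough sketch of that loop using the public embed API; the exemplar texts and the detectDomain helper are illustrative, not the package's actual implementation:

```typescript
import { UnifiedEmbeddingService } from '@cascadeflow/ml';

// Illustrative exemplars only; the package ships its own per-domain sets
const exemplars: Record<string, string[]> = {
  MATH: ['Solve the integral of x^2 dx', 'Find the determinant of a 3x3 matrix'],
  CODE: ['Write a function that reverses a string', 'Fix this TypeScript type error'],
};

// Cosine similarity between two 384-dim vectors
function cosine(a: Float32Array, b: Float32Array): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

async function detectDomain(query: string) {
  const embedder = new UnifiedEmbeddingService();
  const queryEmb = await embedder.embed(query);
  if (!queryEmb) return null; // ML unavailable; caller falls back to rules

  let best = { domain: 'GENERAL', confidence: 0 };
  for (const [domain, texts] of Object.entries(exemplars)) {
    for (const text of texts) {
      const emb = await embedder.embed(text);
      if (!emb) continue;
      const score = cosine(queryEmb.data, emb.data);
      if (score > best.confidence) best = { domain, confidence: score };
    }
  }
  return best; // e.g. { domain: 'MATH', confidence: 0.87 }
}
```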
If ML is unavailable (model loading fails or the dependency is missing), detection degrades gracefully (a fallback sketch follows this list):
- ✅ Automatically falls back to rule-based detection
- ✅ All features continue to work
- ✅ No errors or crashes
- ⚠️ Slightly lower confidence (~60-75% vs 84-87%)
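The CascadeAgent handles this fallback internally. If you call the embedding service yourself, a minimal sketch of the same pattern looks like this; the scoreRelevance helper and its keyword rule are hypothetical:

```typescript
import { UnifiedEmbeddingService } from '@cascadeflow/ml';

// Hypothetical helper: prefer semantic similarity, fall back to keyword overlap
async function scoreRelevance(query: string, reference: string): Promise<number> {
  const embedder = new UnifiedEmbeddingService();
  if (await embedder.isAvailable()) {
    return embedder.similarity(query, reference); // semantic path
  }
  // Rule-based fallback: crude keyword overlap, so the feature keeps working
  const queryWords = new Set(query.toLowerCase().split(/\s+/));
  const refWords = reference.toLowerCase().split(/\s+/);
  return refWords.filter((w) => queryWords.has(w)).length / Math.max(refWords.length, 1);
}
```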
- Cold start: ~200-500ms (model loading)
- Warm: ~20-50ms per embedding
- Cached: <1ms (request-scoped cache)
- Batch: ~30% faster than individual calls (see the batching sketch below)
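A quick sketch of batching versus per-text calls, assuming embedBatch behaves as listed in the API reference below:

```typescript
import { UnifiedEmbeddingService } from '@cascadeflow/ml';

const embedder = new UnifiedEmbeddingService();
const texts = ['first query', 'second query', 'third query'];

// One batched call amortizes tokenization and inference overhead
const batched = await embedder.embedBatch(texts);

// Versus N separate calls, each paying the per-call overhead
const individual = await Promise.all(texts.map((t) => embedder.embed(t)));

console.log(batched.length, individual.length); // 3 3
```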
Domain detection confidence:
- ML semantic: 84-87% (complex domains)
- Rule-based fallback: 60-75%
- Improvement: 15-20% higher confidence
Tested on domains: MATH, CODE, DATA, STRUCTURED, REASONING
Works in modern browsers with:
- WebAssembly support
- Sufficient memory (~100MB for model)
- ES2020+ JavaScript support
Tested on:
- ✅ Chrome 90+
- ✅ Firefox 88+
- ✅ Safari 14+
- ✅ Edge 90+
Supported edge runtimes:
- ✅ Vercel Edge Functions
- ✅ Cloudflare Workers
- ✅ Netlify Edge Functions
- ⚠️ AWS Lambda@Edge (check memory limits)
```typescript
class UnifiedEmbeddingService {
  constructor(modelName?: string);
  isAvailable(): Promise<boolean>;
  embed(text: string): Promise<EmbeddingVector | null>;
  embedBatch(texts: string[]): Promise<(EmbeddingVector | null)[]>;
  similarity(text1: string, text2: string): Promise<number>;
}
```
```typescript
class EmbeddingCache {
  constructor(embedder: UnifiedEmbeddingService);
  getOrEmbed(text: string): Promise<EmbeddingVector | null>;
  similarity(text1: string, text2: string): Promise<number>;
  clear(): void;
  cacheSize(): number;
  cacheInfo(): { size: number; texts: string[] };
}
```
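A sketch of request-scoped caching: share one UnifiedEmbeddingService per process and create a fresh EmbeddingCache per request. The handleRequest handler below is hypothetical:

```typescript
import { UnifiedEmbeddingService, EmbeddingCache } from '@cascadeflow/ml';

const embedder = new UnifiedEmbeddingService(); // shared; loads the model once

// Hypothetical request handler: one cache per request, cleared when done
async function handleRequest(query: string, exemplars: string[]) {
  const cache = new EmbeddingCache(embedder);

  const queryEmb = await cache.getOrEmbed(query); // computed once per request
  for (const exemplar of exemplars) {
    // Repeated comparisons against the same query hit the cache, not the model
    const score = await cache.similarity(query, exemplar);
    console.log(exemplar, score);
  }

  cache.clear(); // keep memory bounded between requests
  return queryEmb;
}
```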
```typescript
interface EmbeddingVector {
  data: Float32Array;
  dimensions: number;
}
```

```typescript
// Check if ML is available
const embedder = new UnifiedEmbeddingService();
const available = await embedder.isAvailable();
if (!available) {
  console.log('ML not available, using rule-based detection');
  // App continues to work with fallback
}
```
The model requires ~100MB memory. For constrained environments:
- Use rule-based detection (no ML package)
- Implement model lazy loading
- Consider server-side ML service
Model download (~40MB) happens once on first use. To preload:
```typescript
const embedder = new UnifiedEmbeddingService();
await embedder.embed('warmup query'); // Triggers model download
```
| Feature | Python | TypeScript | Notes |
|---------|--------|------------|-------|
| Model | FastEmbed | Transformers.js | Same BGE-small-en-v1.5 |
| Confidence | 84-87% | 84-87% | ✅ Parity |
| Latency | ~20-30ms | ~20-50ms | Similar |
| Size | ~40MB | ~40MB | Same |
| Fallback | ✅ | ✅ | Both graceful |
Result: Feature parity achieved! 🎉
See `packages/core/examples/nodejs/production-patterns.ts` for a complete production example that demonstrates ML-based semantic detection and validation.
License: MIT
- Documentation: https://github.com/lemony-ai/cascadeflow
- Issues: https://github.com/lemony-ai/cascadeflow/issues
- Discussions: https://github.com/lemony-ai/cascadeflow/discussions