# Kado RLM

Recursive Language Model library for handling arbitrarily long contexts.

```bash
npm install kado-rlm
```

A production-ready Node.js/TypeScript library implementing Recursive Language Models (RLMs) for handling arbitrarily long contexts. Based on the RLM research paper, this library enables LLMs to process inputs up to two orders of magnitude beyond their native context windows.

## Features
- RLM Orchestration: Treats long prompts as external environment data, allowing LLMs to programmatically examine, decompose, and recursively call themselves over context snippets
- Pluggable Tool System: Register any RAG, knowledge base, database, or API as callable functions
- Multi-Provider Support: OpenAI, Anthropic, and Google AI out of the box
- Secure Sandbox: V8 isolates for safe execution of LLM-generated code
- Full Observability: Prometheus metrics, Loki logging, and Tempo tracing via Grafana stack
- Built-in Benchmarking: Compare RLM performance against base LLM calls
- Production Ready: Circuit breakers, retry logic, rate limiting, and health checks
## Installation

```bash
npm install kado-rlm
# or
pnpm add kado-rlm
# or
yarn add kado-rlm
```

### From Source

```bash
git clone https://github.com/your-org/kado-rlm.git
cd kado-rlm
pnpm install
pnpm build
```

## Quick Start

```typescript
import {
  RLMOrchestrator,
  ContextManager,
  createLLMClient,
  defineTools
} from 'kado-rlm';

// 1. Create an LLM client
const llmClient = createLLMClient('openai', { model: 'gpt-4o' });

// 2. Create a context manager with your long content
const contextManager = new ContextManager(yourLongDocument);

// 3. (Optional) Define custom tools for RAG, databases, etc.
const tools = defineTools([
  {
    name: 'search_docs',
    description: 'Search the knowledge base for relevant information',
    parameters: [
      { name: 'query', type: 'string', description: 'Search query', required: true },
      { name: 'limit', type: 'number', description: 'Max results', default: 5 },
    ],
    handler: async (query: string, limit = 5) => {
      return await yourVectorDB.search(query, { topK: limit });
    },
  },
]);

// 4. Create the orchestrator
const orchestrator = new RLMOrchestrator({
  llmClient,
  contextManager,
  customTools: tools,
  maxIterations: 20,
  maxDepth: 2,
});

// 5. Run!
const result = await orchestrator.run('What are the key findings in this document?');
console.log(result.answer);
console.log(`Completed in ${result.usage.iterations} iterations`);
```

### Running the Server

```bash
# Set up environment
cp env.example .env
# Edit .env with your API keys, then start the server
pnpm dev
```

Then make HTTP requests:
```bash
curl -X POST http://localhost:3000/v1/completion \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "What is the secret code mentioned in the text?",
    "context": "... your long context here ...",
    "provider": "openai",
    "model": "gpt-4o"
  }'
```
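The same request can be made programmatically; a sketch assuming Node 18+ (built-in `fetch`) and a `longDocument` string you supply, with body fields mirroring the curl example above:

```typescript
// A sketch of the same /v1/completion request from TypeScript (Node 18+).
// `longDocument` is a placeholder for your own long context string.
const response = await fetch('http://localhost:3000/v1/completion', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    prompt: 'What is the secret code mentioned in the text?',
    context: longDocument,
    provider: 'openai',
    model: 'gpt-4o',
  }),
});
console.log(await response.json());
```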
## Custom Tools

The pluggable tool system lets you register any external service as a function the LLM can call during reasoning.

### Defining Tools
```typescript
import { defineTools } from 'kado-rlm';

const tools = defineTools([
  // RAG / Vector Search
  {
    name: 'rag_search',
    description: 'Search the vector database for semantically similar documents',
    parameters: [
      { name: 'query', type: 'string', description: 'Natural language search query', required: true },
      { name: 'topK', type: 'number', description: 'Number of results to return', default: 10 },
    ],
    returns: 'Array of { content, score, metadata }',
    handler: async (query: string, topK = 10) => {
      const embedding = await embeddings.embed(query);
      return await pinecone.query({ vector: embedding, topK });
    },
  },

  // Database Lookup
  {
    name: 'get_customer',
    description: 'Fetch customer details from the database',
    parameters: [
      { name: 'customerId', type: 'string', description: 'Customer ID', required: true },
    ],
    handler: async (customerId: string) => {
      return await db.customers.findById(customerId);
    },
  },

  // External API
  {
    name: 'check_weather',
    description: 'Get current weather for a location',
    parameters: [
      { name: 'city', type: 'string', description: 'City name', required: true },
    ],
    handler: async (city: string) => {
      const response = await fetch(`https://api.weather.com/v1/current?city=${city}`);
      return response.json();
    },
  },
]);
```

The LLM can then use these tools in its generated code:
```javascript
// LLM-generated sandbox code
const docs = await rag_search("authentication flow", 5);
const customer = await get_customer("cust_12345");

for (const doc of docs) {
  print(`Found: ${doc.content.slice(0, 100)}...`);
}

giveFinalAnswer({
  message: "Based on the documentation and customer data...",
  data: { sources: docs.map(d => d.metadata.source) }
});
```

See the Tools Guide for detailed documentation on registering tools, and the RAG Integration Guide for RAG-specific patterns.
## API Endpoints
| Method | Path | Description |
|--------|------|-------------|
| POST | /v1/completion | Run RLM completion with context |
| POST | /v1/chat | Direct LLM call (baseline comparison) |
| POST | /v1/benchmark | Start benchmark run |
| GET | /v1/benchmark/:id | Get benchmark results |
| GET | /v1/models | List available models |
| GET | /health | Liveness probe |
| GET | /ready | Readiness probe |
| GET | /metrics | Prometheus metrics |
| GET | /docs | Swagger UI documentation |
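For a quick smoke test from TypeScript, the read-only endpoints can be queried directly; a minimal sketch (the endpoint paths come from the table above, response shapes are not assumed):

```typescript
// A sketch: hit the read-only endpoints from Node 18+ using the built-in fetch.
const base = 'http://localhost:3000';

const ready = await fetch(`${base}/ready`);
console.log('ready:', ready.ok);

const models = await fetch(`${base}/v1/models`).then((r) => r.json());
console.log(models);
```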
## Benchmarking

The built-in benchmark system compares RLM performance against direct LLM calls.
### Via the API

```bash
curl -X POST http://localhost:3000/v1/benchmark \
  -H "Content-Type: application/json" \
  -d '{
    "tasks": ["sniah", "multi-niah", "aggregation"],
    "sizes": [8000, 16000, 32000, 64000],
    "provider": "openai",
    "model": "gpt-4o",
    "runs": 3
  }'
```

### Via the CLI

```bash
# Run benchmark suite
pnpm benchmark --tasks sniah,aggregation --sizes 8000,16000,32000 --provider openai --model gpt-4o

# Output as JSON
pnpm benchmark --output json > results.json
```

### Benchmark Tasks
| Task | Description | Complexity |
|------|-------------|------------|
| sniah | Single needle-in-haystack | Constant |
| multi-niah | Multiple needles | Linear |
| aggregation | Count/sum across context | Linear |
| pairwise | Find matching pairs | Quadratic |
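To make the task definitions concrete, here is a sketch (not part of the library's benchmark generator) of how a single needle-in-a-haystack case of a given size could be constructed and checked:

```typescript
// Illustrative only: build a sniah case by burying one fact ("the needle")
// in roughly sizeChars of filler text, then check the returned answer.
function makeSniahCase(sizeChars: number) {
  const needle = 'The secret code is 7431.';
  const filler = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit. ';
  const half = filler.repeat(Math.ceil(sizeChars / (2 * filler.length)));
  return {
    context: half + needle + half,
    prompt: 'What is the secret code mentioned in the text?',
    check: (answer: string) => answer.includes('7431'),
  };
}

// Example: an 8,000-character haystack, matching the smallest size above.
const testCase = makeSniahCase(8000);
```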
## Observability

### Grafana Stack

```bash
cd docker
docker-compose up -d
```

Access services:
- Kado RLM: http://localhost:3000
- Grafana: http://localhost:3001 (admin/admin)
- Prometheus: http://localhost:9090
### Metrics
Key metrics exposed at `/metrics`:

- `rlm_request_duration_seconds` - Request latency histogram
- `rlm_iterations_total` - RLM iteration count
- `rlm_recursion_depth` - Recursion depth distribution
- `rlm_tokens_total` - Token usage by type
- `rlm_errors_total` - Error counts by type
- `rlm_circuit_breaker_state` - Circuit breaker status

### Logging
Structured JSON logs with correlation IDs. In development, pretty-printed via pino-pretty.
### Tracing
OpenTelemetry traces exported to Tempo:
- Span per API request
- Child spans for LLM calls, sandbox executions, recursive sub-calls
- Automatic trace ID propagation
## Configuration
Configure via environment variables (see `env.example`):

```bash
# Server
PORT=3000
NODE_ENV=development

# LLM Providers
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=...
DEFAULT_PROVIDER=openai
DEFAULT_MODEL=gpt-4o

# RLM Settings
MAX_ITERATIONS=20
MAX_RECURSION_DEPTH=3
SANDBOX_TIMEOUT_MS=5000
SANDBOX_MEMORY_MB=128

# Observability
METRICS_ENABLED=true
LOKI_ENABLED=false
TRACING_ENABLED=false
```
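The RLM settings above map onto the orchestrator options shown in the Quick Start; a sketch of wiring them up programmatically (the option names follow that example, everything else here is illustrative):

```typescript
import { RLMOrchestrator, ContextManager, createLLMClient } from 'kado-rlm';

// A sketch: feed env.example values into the options used in the Quick Start.
// `yourLongDocument` is a placeholder for your own long context string.
const llmClient = createLLMClient('openai', {
  model: process.env.DEFAULT_MODEL ?? 'gpt-4o',
});
const contextManager = new ContextManager(yourLongDocument);

const orchestrator = new RLMOrchestrator({
  llmClient,
  contextManager,
  maxIterations: Number(process.env.MAX_ITERATIONS ?? 20),
  maxDepth: Number(process.env.MAX_RECURSION_DEPTH ?? 3),
});
```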
## Architecture

```
┌─────────────────────────────────────────────┐
│                  API Layer                  │
│ (Fastify + Rate Limiting + Auth + Swagger)  │
└─────────────────────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────┐
│              RLM Orchestrator               │
│  ┌───────────┐ ┌─────────────┐ ┌─────────┐  │
│  │  Context  │ │   Sandbox   │ │ Custom  │  │
│  │  Manager  │ │ (V8 Isolate)│ │  Tools  │  │
│  └───────────┘ └─────────────┘ └─────────┘  │
└─────────────────────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────┐
│                LLM Providers                │
│     ┌────────┐ ┌───────────┐ ┌────────┐     │
│     │ OpenAI │ │ Anthropic │ │ Google │     │
│     └────────┘ └───────────┘ └────────┘     │
└─────────────────────────────────────────────┘
```
## Development

```bash
# Install dependencies
pnpm install

# Run in development mode
pnpm dev

# Type check
pnpm typecheck

# Run tests
pnpm test

# Run tests with coverage
pnpm test:coverage

# Build for production
pnpm build

# Start production server
pnpm start
```

## Publishing to npm
### One-Time Setup
1. Create an npm account at npmjs.com
2. Login to npm:
   ```bash
   npm login
   ```

3. Update `package.json`:
   - Change `name` if `kado-rlm` is taken (e.g., `@your-org/kado-rlm`)
   - Update `repository.url` to your actual repo
   - Set the `author` field

### Publishing
```bash
# 1. Make sure tests pass
pnpm test:run

# 2. Build the package
pnpm build

# 3. Verify what will be published
npm pack --dry-run

# 4. Publish (first time)
npm publish

# 5. For scoped packages (@your-org/kado-rlm)
npm publish --access public
```

### Versioning
```bash
# Patch release (bug fixes): 0.1.0 → 0.1.1
npm version patch

# Minor release (new features): 0.1.0 → 0.2.0
npm version minor

# Major release (breaking changes): 0.1.0 → 1.0.0
npm version major

# Then publish
npm publish
```

### Automated Publishing (GitHub Actions)
Create `.github/workflows/publish.yml`:

```yaml
name: Publish to npm

on:
  release:
    types: [created]

jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v2
        with:
          version: 8
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          registry-url: 'https://registry.npmjs.org'
      - run: pnpm install
      - run: pnpm test:run
      - run: pnpm build
      - run: npm publish
        env:
          NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
```

Add your npm token as a GitHub secret named `NPM_TOKEN`.

## Production Deployment
### Docker
```bash
# Build image
docker build -f docker/Dockerfile -t kado-rlm .

# Run container
docker run -p 3000:3000 \
  -e OPENAI_API_KEY=sk-... \
  -e NODE_ENV=production \
  kado-rlm
```

### Health Checks
- `/health` - Basic liveness (process running)
- `/ready` - Readiness (providers configured, memory OK)

Configure Kubernetes probes:

```yaml
livenessProbe:
  httpGet:
    path: /health
    port: 3000
  initialDelaySeconds: 5
  periodSeconds: 10

readinessProbe:
  httpGet:
    path: /ready
    port: 3000
  initialDelaySeconds: 10
  periodSeconds: 5
```

### Circuit Breaker

Automatic circuit breaking for LLM provider failures (sketched below):
- Opens after 5 consecutive failures
- Half-open after 30s cooldown
- Per-provider tracking
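As an illustration of that policy (not the library's internal implementation), a minimal breaker could look like this:

```typescript
// Illustrative only: the open/half-open policy described above,
// not kado-rlm's actual circuit breaker implementation.
type BreakerState = 'closed' | 'open' | 'half-open';

class SimpleBreaker {
  private state: BreakerState = 'closed';
  private failures = 0;
  private openedAt = 0;

  constructor(
    private readonly failureThreshold = 5, // opens after 5 consecutive failures
    private readonly cooldownMs = 30_000,  // half-open after a 30s cooldown
  ) {}

  canRequest(): boolean {
    if (this.state === 'open' && Date.now() - this.openedAt >= this.cooldownMs) {
      this.state = 'half-open'; // allow a probe request after the cooldown
    }
    return this.state !== 'open';
  }

  onSuccess(): void {
    this.failures = 0;
    this.state = 'closed';
  }

  onFailure(): void {
    this.failures += 1;
    if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
      this.state = 'open';
      this.openedAt = Date.now();
    }
  }
}
```

In the library this tracking is per provider, so a failing provider does not block the others.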
### Retry Logic

Exponential backoff with jitter for transient errors (sketched below):
- 3 retries by default
- Handles rate limits, timeouts, 5xx errors
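The retry policy can be pictured as follows; a sketch for illustration only, using full jitter and three retries by default:

```typescript
// Illustrative only, not kado-rlm's retry helper: exponential backoff with
// full jitter. In practice only transient errors (rate limits, timeouts,
// 5xx responses) would be retried.
async function withRetry<T>(
  fn: () => Promise<T>,
  retries = 3,
  baseDelayMs = 500,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries) throw err;
      const delay = Math.random() * baseDelayMs * 2 ** attempt; // full jitter
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```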
### Resource Limits

| Resource | Default | Configurable |
|----------|---------|--------------|
| Max iterations | 20 | Yes |
| Max recursion depth | 3 | Yes |
| Sandbox CPU time | 5s | Yes |
| Sandbox memory | 128MB | Yes |
| Request timeout | 300s | Yes |
| Max context size | 10MB | Yes |
## Documentation

- Recursive Language Models (RLM) Paper
- RLM Research Paper PDF
- Technical Design Document
- Tools Guide — Detailed guide on registering custom tools
- RAG Integration Guide — Patterns for RAG integration
## License

MIT