# Kado RLM

Recursive Language Model library for handling arbitrarily long contexts.

```bash
npm install kado-rlm
```

A production-ready Node.js/TypeScript library implementing Recursive Language Models (RLMs) for handling arbitrarily long contexts. Based on the RLM research paper, this library enables LLMs to process inputs up to two orders of magnitude beyond their native context windows.

## Features
- RLM Orchestration: Treats long prompts as external environment data, allowing LLMs to programmatically examine, decompose, and recursively call themselves over context snippets
- Pluggable Tool System: Register any RAG, knowledge base, database, or API as callable functions
- Multi-Provider Support: OpenAI, Anthropic, and Google AI out of the box
- Secure Sandbox: V8 isolates for safe execution of LLM-generated code
- Full Observability: Prometheus metrics, Loki logging, and Tempo tracing via Grafana stack
- Built-in Benchmarking: Compare RLM performance against base LLM calls
- Production Ready: Circuit breakers, retry logic, rate limiting, and health checks
## Installation

```bash
npm install kado-rlm
# or
pnpm add kado-rlm
# or
yarn add kado-rlm
```

### From Source

```bash
git clone https://github.com/your-org/kado-rlm.git
cd kado-rlm
pnpm install
pnpm build
```

## Quick Start

```typescript
import {
  RLMOrchestrator,
  ContextManager,
  createLLMClient,
  defineTools
} from 'kado-rlm';

// 1. Create an LLM client
const llmClient = createLLMClient('openai', { model: 'gpt-4o' });

// 2. Create a context manager with your long content
const contextManager = new ContextManager(yourLongDocument);

// 3. (Optional) Define custom tools for RAG, databases, etc.
const tools = defineTools([
  {
    name: 'search_docs',
    description: 'Search the knowledge base for relevant information',
    parameters: [
      { name: 'query', type: 'string', description: 'Search query', required: true },
      { name: 'limit', type: 'number', description: 'Max results', default: 5 },
    ],
    handler: async (query: string, limit = 5) => {
      return await yourVectorDB.search(query, { topK: limit });
    },
  },
]);

// 4. Create the orchestrator
const orchestrator = new RLMOrchestrator({
  llmClient,
  contextManager,
  customTools: tools,
  maxIterations: 20,
  maxDepth: 2,
});

// 5. Run!
const result = await orchestrator.run('What are the key findings in this document?');
console.log(result.answer);
console.log(`Completed in ${result.usage.iterations} iterations`);
```

### Running the Server

```bash
# Set up environment
cp env.example .env
# Edit .env with your API keys, then start the server
pnpm dev
```

Then make HTTP requests:
```bash
curl -X POST http://localhost:3000/v1/completion \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "What is the secret code mentioned in the text?",
    "context": "... your long context here ...",
    "provider": "openai",
    "model": "gpt-4o"
  }'
```
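The same request can be made programmatically; a sketch assuming Node 18+ (built-in `fetch`) and a `longDocument` string you supply, with body fields mirroring the curl example above:

```typescript
// A sketch of the same /v1/completion request from TypeScript (Node 18+).
// `longDocument` is a placeholder for your own long context string.
const response = await fetch('http://localhost:3000/v1/completion', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    prompt: 'What is the secret code mentioned in the text?',
    context: longDocument,
    provider: 'openai',
    model: 'gpt-4o',
  }),
});
console.log(await response.json());
```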
## Custom Tools

The pluggable tool system lets you register any external service as a function the LLM can call during reasoning.

### Defining Tools
```typescript
import { defineTools } from 'kado-rlm';

const tools = defineTools([
  // RAG / Vector Search
  {
    name: 'rag_search',
    description: 'Search the vector database for semantically similar documents',
    parameters: [
      { name: 'query', type: 'string', description: 'Natural language search query', required: true },
      { name: 'topK', type: 'number', description: 'Number of results to return', default: 10 },
    ],
    returns: 'Array of { content, score, metadata }',
    handler: async (query: string, topK = 10) => {
      const embedding = await embeddings.embed(query);
      return await pinecone.query({ vector: embedding, topK });
    },
  },

  // Database Lookup
  {
    name: 'get_customer',
    description: 'Fetch customer details from the database',
    parameters: [
      { name: 'customerId', type: 'string', description: 'Customer ID', required: true },
    ],
    handler: async (customerId: string) => {
      return await db.customers.findById(customerId);
    },
  },

  // External API
  {
    name: 'check_weather',
    description: 'Get current weather for a location',
    parameters: [
      { name: 'city', type: 'string', description: 'City name', required: true },
    ],
    handler: async (city: string) => {
      const response = await fetch(`https://api.weather.com/v1/current?city=${city}`);
      return response.json();
    },
  },
]);
```

The LLM can then use these tools in its generated code:
```javascript
// LLM-generated sandbox code
const docs = await rag_search("authentication flow", 5);
const customer = await get_customer("cust_12345");

for (const doc of docs) {
  print(`Found: ${doc.content.slice(0, 100)}...`);
}

giveFinalAnswer({
  message: "Based on the documentation and customer data...",
  data: { sources: docs.map(d => d.metadata.source) }
});
```

See the Tools Guide for detailed documentation on registering tools, and the RAG Integration Guide for RAG-specific patterns.
## API Endpoints
| Method | Path | Description |
|--------|------|-------------|
| POST | /v1/completion | Run RLM completion with context |
| POST | /v1/chat | Direct LLM call (baseline comparison) |
| POST | /v1/benchmark | Start benchmark run |
| GET | /v1/benchmark/:id | Get benchmark results |
| GET | /v1/models | List available models |
| GET | /health | Liveness probe |
| GET | /ready | Readiness probe |
| GET | /metrics | Prometheus metrics |
| GET | /docs | Swagger UI documentation |
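For a quick smoke test from TypeScript, the read-only endpoints can be queried directly; a minimal sketch (the endpoint paths come from the table above, response shapes are not assumed):

```typescript
// A sketch: hit the read-only endpoints from Node 18+ using the built-in fetch.
const base = 'http://localhost:3000';

const ready = await fetch(`${base}/ready`);
console.log('ready:', ready.ok);

const models = await fetch(`${base}/v1/models`).then((r) => r.json());
console.log(models);
```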
## Benchmarking

The built-in benchmark system compares RLM performance against direct LLM calls.
### Via the API

```bash
curl -X POST http://localhost:3000/v1/benchmark \
  -H "Content-Type: application/json" \
  -d '{
    "tasks": ["sniah", "multi-niah", "aggregation"],
    "sizes": [8000, 16000, 32000, 64000],
    "provider": "openai",
    "model": "gpt-4o",
    "runs": 3
  }'
```

### Via the CLI

```bash
# Run benchmark suite
pnpm benchmark --tasks sniah,aggregation --sizes 8000,16000,32000 --provider openai --model gpt-4o

# Output as JSON
pnpm benchmark --output json > results.json
```

### Benchmark Tasks
| Task | Description | Complexity |
|------|-------------|------------|
| sniah | Single needle-in-haystack | Constant |
| multi-niah | Multiple needles | Linear |
| aggregation | Count/sum across context | Linear |
| pairwise | Find matching pairs | Quadratic |
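To make the task definitions concrete, here is a sketch (not part of the library's benchmark generator) of how a single needle-in-a-haystack case of a given size could be constructed and checked:

```typescript
// Illustrative only: build a sniah case by burying one fact ("the needle")
// in roughly sizeChars of filler text, then check the returned answer.
function makeSniahCase(sizeChars: number) {
  const needle = 'The secret code is 7431.';
  const filler = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit. ';
  const half = filler.repeat(Math.ceil(sizeChars / (2 * filler.length)));
  return {
    context: half + needle + half,
    prompt: 'What is the secret code mentioned in the text?',
    check: (answer: string) => answer.includes('7431'),
  };
}

// Example: an 8,000-character haystack, matching the smallest size above.
const testCase = makeSniahCase(8000);
```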
## Observability

### Grafana Stack

```bash
cd docker
docker-compose up -d
```

Access services:
- Kado RLM: http://localhost:3000
- Grafana: http://localhost:3001 (admin/admin)
- Prometheus: http://localhost:9090
### Metrics
Key metrics exposed at `/metrics`:

- `rlm_request_duration_seconds` - Request latency histogram
- `rlm_iterations_total` - RLM iteration count
- `rlm_recursion_depth` - Recursion depth distribution
- `rlm_tokens_total` - Token usage by type
- `rlm_errors_total` - Error counts by type
- `rlm_circuit_breaker_state` - Circuit breaker status

### Logging
Structured JSON logs with correlation IDs. In development, pretty-printed via pino-pretty.
### Tracing
OpenTelemetry traces exported to Tempo:
- Span per API request
- Child spans for LLM calls, sandbox executions, recursive sub-calls
- Automatic trace ID propagation
## Configuration
Configure via environment variables (see `env.example`):

```bash
# Server
PORT=3000
NODE_ENV=development

# LLM Providers
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=...
DEFAULT_PROVIDER=openai
DEFAULT_MODEL=gpt-4o

# RLM Settings
MAX_ITERATIONS=20
MAX_RECURSION_DEPTH=3
SANDBOX_TIMEOUT_MS=5000
SANDBOX_MEMORY_MB=128

# Observability
METRICS_ENABLED=true
LOKI_ENABLED=false
TRACING_ENABLED=false
```
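The RLM settings above map onto the orchestrator options shown in the Quick Start; a sketch of wiring them up programmatically (the option names follow that example, everything else here is illustrative):

```typescript
import { RLMOrchestrator, ContextManager, createLLMClient } from 'kado-rlm';

// A sketch: feed env.example values into the options used in the Quick Start.
// `yourLongDocument` is a placeholder for your own long context string.
const llmClient = createLLMClient('openai', {
  model: process.env.DEFAULT_MODEL ?? 'gpt-4o',
});
const contextManager = new ContextManager(yourLongDocument);

const orchestrator = new RLMOrchestrator({
  llmClient,
  contextManager,
  maxIterations: Number(process.env.MAX_ITERATIONS ?? 20),
  maxDepth: Number(process.env.MAX_RECURSION_DEPTH ?? 3),
});
```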
## Architecture

```
┌─────────────────────────────────────────────┐
│                  API Layer                  │
│ (Fastify + Rate Limiting + Auth + Swagger)  │
└─────────────────────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────┐
│              RLM Orchestrator               │
│  ┌───────────┐ ┌─────────────┐ ┌─────────┐  │
│  │  Context  │ │   Sandbox   │ │ Custom  │  │
│  │  Manager  │ │ (V8 Isolate)│ │  Tools  │  │
│  └───────────┘ └─────────────┘ └─────────┘  │
└─────────────────────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────┐
│                LLM Providers                │
│     ┌────────┐ ┌───────────┐ ┌────────┐     │
│     │ OpenAI │ │ Anthropic │ │ Google │     │
│     └────────┘ └───────────┘ └────────┘     │
└─────────────────────────────────────────────┘
```
## Development

```bash
# Install dependencies
pnpm install

# Run in development mode
pnpm dev

# Type check
pnpm typecheck

# Run tests
pnpm test

# Run tests with coverage
pnpm test:coverage

# Build for production
pnpm build

# Start production server
pnpm start
```

## Publishing to npm
### One-Time Setup
1. Create an npm account at npmjs.com
2. Login to npm:
   ```bash
   npm login
   ```

3. Update `package.json`:
   - Change `name` if `kado-rlm` is taken (e.g., `@your-org/kado-rlm`)
   - Update `repository.url` to your actual repo
   - Set the `author` field

### Publishing
```bash
# 1. Make sure tests pass
pnpm test:run

# 2. Build the package
pnpm build

# 3. Verify what will be published
npm pack --dry-run

# 4. Publish (first time)
npm publish

# 5. For scoped packages (@your-org/kado-rlm)
npm publish --access public
```

### Versioning
```bash
# Patch release (bug fixes): 0.1.0 → 0.1.1
npm version patch

# Minor release (new features): 0.1.0 → 0.2.0
npm version minor

# Major release (breaking changes): 0.1.0 → 1.0.0
npm version major

# Then publish
npm publish
```

### Automated Publishing (GitHub Actions)
Create `.github/workflows/publish.yml`:

```yaml
name: Publish to npm

on:
  release:
    types: [created]

jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v2
        with:
          version: 8
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          registry-url: 'https://registry.npmjs.org'
      - run: pnpm install
      - run: pnpm test:run
      - run: pnpm build
      - run: npm publish
        env:
          NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
```

Add your npm token as a GitHub secret named `NPM_TOKEN`.

## Production Deployment
### Docker
```bash
# Build image
docker build -f docker/Dockerfile -t kado-rlm .

# Run container
docker run -p 3000:3000 \
  -e OPENAI_API_KEY=sk-... \
  -e NODE_ENV=production \
  kado-rlm
```

### Health Checks
- `/health` - Basic liveness (process running)
- `/ready` - Readiness (providers configured, memory OK)

Configure Kubernetes probes:

```yaml
livenessProbe:
  httpGet:
    path: /health
    port: 3000
  initialDelaySeconds: 5
  periodSeconds: 10

readinessProbe:
  httpGet:
    path: /ready
    port: 3000
  initialDelaySeconds: 10
  periodSeconds: 5
```

### Circuit Breaker

Automatic circuit breaking for LLM provider failures (sketched below):
- Opens after 5 consecutive failures
- Half-open after 30s cooldown
- Per-provider tracking
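As an illustration of that policy (not the library's internal implementation), a minimal breaker could look like this:

```typescript
// Illustrative only: the open/half-open policy described above,
// not kado-rlm's actual circuit breaker implementation.
type BreakerState = 'closed' | 'open' | 'half-open';

class SimpleBreaker {
  private state: BreakerState = 'closed';
  private failures = 0;
  private openedAt = 0;

  constructor(
    private readonly failureThreshold = 5, // opens after 5 consecutive failures
    private readonly cooldownMs = 30_000,  // half-open after a 30s cooldown
  ) {}

  canRequest(): boolean {
    if (this.state === 'open' && Date.now() - this.openedAt >= this.cooldownMs) {
      this.state = 'half-open'; // allow a probe request after the cooldown
    }
    return this.state !== 'open';
  }

  onSuccess(): void {
    this.failures = 0;
    this.state = 'closed';
  }

  onFailure(): void {
    this.failures += 1;
    if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
      this.state = 'open';
      this.openedAt = Date.now();
    }
  }
}
```

In the library this tracking is per provider, so a failing provider does not block the others.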
### Retry Logic

Exponential backoff with jitter for transient errors (sketched below):
- 3 retries by default
- Handles rate limits, timeouts, 5xx errors
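The retry policy can be pictured as follows; a sketch for illustration only, using full jitter and three retries by default:

```typescript
// Illustrative only, not kado-rlm's retry helper: exponential backoff with
// full jitter. In practice only transient errors (rate limits, timeouts,
// 5xx responses) would be retried.
async function withRetry<T>(
  fn: () => Promise<T>,
  retries = 3,
  baseDelayMs = 500,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries) throw err;
      const delay = Math.random() * baseDelayMs * 2 ** attempt; // full jitter
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```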
### Resource Limits

| Resource | Default | Configurable |
|----------|---------|--------------|
| Max iterations | 20 | Yes |
| Max recursion depth | 3 | Yes |
| Sandbox CPU time | 5s | Yes |
| Sandbox memory | 128MB | Yes |
| Request timeout | 300s | Yes |
| Max context size | 10MB | Yes |
## Documentation

- Recursive Language Models (RLM) Paper
- RLM Research Paper PDF
- Technical Design Document
- Tools Guide — Detailed guide on registering custom tools
- RAG Integration Guide — Patterns for RAG integration
## License

MIT