Infinite context windows for Claude via OpenMemory semantic retrieval
Drop-in replacement for Anthropic's AI SDK provider that automatically manages infinite conversation context using OpenMemory for semantic storage and retrieval.
- 🎯 Truly infinite context - Never lose conversation history, no matter how long
- 🧠 Smart retrieval - Semantic search finds relevant context from thousands of messages
- 🔄 Transparent operation - Drop-in replacement for @ai-sdk/anthropic
- ⚡ Token-aware - Automatically fits context under model limits (200k for Sonnet 4)
- 💾 Automatic storage - Messages stored in OpenMemory with zero configuration
- 🛡️ Resilient - Falls back to recent messages if OpenMemory is unavailable
- 🔧 Zero config - Just provide conversationId and userId
```bash
npm install infinite-memory
```

You need an OpenMemory server running. See the OpenMemory Quick Start for setup.
```typescript
import { createInfiniteMemory } from 'infinite-memory';
import { streamText } from 'ai';
// Create the infinite memory provider
const memory = createInfiniteMemory({
openMemoryUrl: 'http://localhost:8080',
openMemoryApiKey: process.env.OPENMEMORY_API_KEY!,
anthropicApiKey: process.env.ANTHROPIC_API_KEY!,
});
// Create a model with conversation context
const model = memory('claude-sonnet-4', {
conversationId: 'conv_123',
userId: 'user_456'
});
// Use it like any AI SDK model - infinite memory happens automatically
const result = await streamText({
model,
messages: [
{ role: 'user', content: 'What did we discuss 100 messages ago?' }
],
});
// Stream the response
for await (const chunk of result.textStream) {
process.stdout.write(chunk);
}
```
That's it! The model will:
1. Query OpenMemory for relevant historical context
2. Combine with recent messages
3. Stay under token budget
4. Store the conversation automatically
For each request, Infinite Memory (see the sketch after this list):
1. Always includes the last 3-5 messages (chronological context)
2. Queries OpenMemory for semantically relevant older messages
3. Scores and ranks by relevance + recency
4. Fills token budget (50% of model limit, e.g., 100k for Sonnet 4)
5. Deduplicates to avoid sending messages twice
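The selection step can be pictured as a budget-filling loop. A minimal sketch, where `Message`, `searchMemory`, and `estimateTokens` are illustrative stand-ins for the package's internals, not its real API:

```typescript
interface Message {
  id: string;
  role: 'user' | 'assistant';
  content: string;
}

// Hypothetical helpers, assumed for illustration only.
declare function searchMemory(query: string): Promise<Message[]>; // ranked by relevance + recency
declare function estimateTokens(msg: Message): number;            // rough token count

async function buildContext(
  recent: Message[], // last 3-5 messages, always included
  query: string,     // latest user message, drives the semantic search
  budget: number,    // e.g. 100_000 tokens for Sonnet 4
): Promise<Message[]> {
  let used = recent.reduce((sum, m) => sum + estimateTokens(m), 0);
  const seen = new Set(recent.map((m) => m.id));
  const retrieved: Message[] = [];

  for (const msg of await searchMemory(query)) {
    if (seen.has(msg.id)) continue;   // deduplicate against recent messages
    const cost = estimateTokens(msg);
    if (used + cost > budget) break;  // results are ranked, so stop once full
    retrieved.push(msg);
    seen.add(msg.id);
    used += cost;
  }

  return [...retrieved, ...recent];   // older retrieved context first, recent last
}
```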
After each request:
- User message → Stored with full JSON structure
- Assistant response → Stored after completion (streaming supported)
- Metadata: conversationId, userId, role, timestamp (illustrated below)
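Conceptually, each stored record pairs the full message JSON with that metadata. An illustrative shape, not the exact OpenMemory schema:

```typescript
// Illustrative record shape; the actual OpenMemory schema may differ.
const record = {
  content: JSON.stringify({ role: 'assistant', content: '...' }), // full message JSON
  metadata: {
    conversationId: 'conv_123',
    userId: 'user_456',
    role: 'assistant',
    timestamp: new Date().toISOString(),
  },
};
```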
If OpenMemory is slow or unavailable:
- Falls back to recent messages only
- Ensures messages fit under context window
- Chat continues without interruption
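In code terms, the fallback is a timeout guard around the retrieval call. A sketch reusing `Message` and `buildContext` from the earlier sketch, with hypothetical `withTimeout` and `trimToBudget` helpers that the package does not actually export:

```typescript
// Hypothetical helpers, assumed for illustration.
declare function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T>; // rejects after ms
declare function trimToBudget(msgs: Message[], budget: number): Message[];    // drops oldest first

async function getContext(recent: Message[], query: string, budget: number) {
  try {
    // Bound the OpenMemory round-trip (2s by default).
    return await withTimeout(buildContext(recent, query, budget), 2_000);
  } catch {
    // OpenMemory slow or unavailable: recent messages only, trimmed to fit.
    return trimToBudget(recent, budget);
  }
}
```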
Creates an infinite memory provider.
```typescript
createInfiniteMemory(options: {
  openMemoryUrl: string;      // OpenMemory server URL
  openMemoryApiKey: string;   // OpenMemory API key
  anthropicApiKey: string;    // Anthropic API key
  openMemoryTimeout?: number; // Query timeout in ms (default: 2000)
})
```

Returns a model creator function: `(modelId, context) => LanguageModel`
```typescript
const model = memory(modelId, context);
```

Parameters:
- `modelId: string` - Claude model ID (e.g., `'claude-sonnet-4'`)
- `context: ModelContext` - Conversation scope
  - `conversationId: string` - Unique conversation identifier
  - `userId: string` - User identifier for scoping

Returns: `LanguageModel` - Compatible with all AI SDK functions

Supported models:
- `claude-sonnet-4` / `claude-sonnet-4-20250514` (200k context)
- `claude-opus-4` / `claude-opus-4-20250514` (200k context)
- `claude-haiku-3-5` / `claude-haiku-3-5-20250514` (100k context)
```typescript
import { streamText } from 'ai';
const model = memory('claude-sonnet-4', {
conversationId: 'conv_123',
userId: 'user_456'
});
const result = await streamText({
model,
messages: [{ role: 'user', content: 'Hello!' }],
});
for await (const chunk of result.textStream) {
console.log(chunk);
}
```
```typescript
import { generateText, tool } from 'ai';
import { z } from 'zod';
const result = await generateText({
model: memory('claude-sonnet-4', { conversationId, userId }),
messages,
tools: {
getWeather: tool({
description: 'Get weather for a location',
parameters: z.object({
location: z.string(),
}),
execute: async ({ location }) => {
return { temperature: 72, condition: 'sunny' };
},
}),
},
});
```
```typescript
import express from 'express';
import { createInfiniteMemory } from 'infinite-memory';
import { streamText } from 'ai';
const app = express();
const memory = createInfiniteMemory({ /* config */ });
app.post('/api/chat', async (req, res) => {
const { messages, conversationId, userId } = req.body;
const model = memory('claude-sonnet-4', {
conversationId,
userId
});
const result = await streamText({ model, messages });
// Stream response back to client
result.pipeDataStreamToResponse(res);
});
```
By default, Infinite Memory reserves 50% of the model's context window for input:
- Sonnet 4: 100k tokens for context
- Opus 4: 100k tokens for context
- Haiku 3.5: 50k tokens for context
This leaves room for output and system prompts.
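As a back-of-envelope check, the input budget is just half the context window. A sketch using the window sizes listed above:

```typescript
// 50% of each model's context window is reserved for input (values from above).
const CONTEXT_WINDOWS: Record<string, number> = {
  'claude-sonnet-4': 200_000,
  'claude-opus-4': 200_000,
  'claude-haiku-3-5': 100_000,
};

const inputBudget = (modelId: string) =>
  Math.floor((CONTEXT_WINDOWS[modelId] ?? 0) * 0.5);

inputBudget('claude-sonnet-4'); // 100,000
```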
OpenMemory queries time out after 2 seconds by default. Adjust if needed:
```typescript
const memory = createInfiniteMemory({
// ...
openMemoryTimeout: 5000, // 5 seconds
});
```
```
Client Request
      ↓
InfiniteMemoryModel.doStream()
      ↓
ContextManager.getRelevantContext()
  ├── Get last 3-5 messages (recent)
  ├── Query OpenMemory (semantic search)
  └── Merge + deduplicate (under token budget)
      ↓
Anthropic API (with augmented context)
      ↓
Stream Response
      ↓
Store in OpenMemory (after completion)
```
- OpenMemory queries: ~50-200ms (localhost)
- Fallback mode: Instant (recent messages only)
- Storage: Async, non-blocking
- Memory overhead: Minimal (~10MB per conversation)
Verbose logs are printed to the console, so you can follow what the provider is doing:
```
[InfiniteMemory] Provider initialized
[InfiniteMemory] Creating model: claude-sonnet-4 (conv: conv_123, user: user_456)
[InfiniteMemory] Context budget: 100,000 tokens (model: claude-sonnet-4)
[InfiniteMemory] Recent 5 messages: 1,234 tokens
[InfiniteMemory] Found 15 relevant messages
[InfiniteMemory] Context built: 12 retrieved (45,678 tokens) + 5 recent = 46,912 tokens
[InfiniteMemory] Stored message msg_xyz (assistant)
```
Contributions are welcome! Please open an issue or PR on GitHub.
Apache 2.0 © Dark Research
- Vercel AI SDK - AI framework
- Anthropic Claude - Language model
- OpenMemory - Semantic memory engine
---
Made with ❤️ by Dark Research