A lightweight, fast LLM context and memory layout renderer that enforces token budgets in long-running agents.
```bash
npm install @fastpaca/cria
```
TypeScript prompt architecture for fast-moving teams and engineers.
The LLM space moves fast. New models drop often, providers change APIs, better vector stores emerge, and new memory systems appear. Your prompts shouldn't break every time the stack evolves.
Cria is prompt architecture as code. Same prompt logic, swap the building blocks underneath when you need to upgrade.
```ts
import { cria } from "@fastpaca/cria";
import { createProvider } from "@fastpaca/cria/openai";
import OpenAI from "openai";

const client = new OpenAI();
const model = "gpt-5-nano";
const provider = createProvider(client, model);

// `memory` is a summary store and `store` a vector store adapter
// (Redis, SQLite, Postgres, Chroma, Qdrant; see the adapters below).
const summarizer = cria.summarizer({
  id: "history",
  store: memory,
  provider,
});
const vectors = cria.vectordb(store);

// `conversation`, `recentTurns`, and `query` come from your application.
const summary = summarizer.plugin({ history: conversation });
const retrieval = vectors.plugin({ query, limit: 8 });

const messages = await cria
  .prompt(provider)
  .system("You are a research assistant.")
  .use(summary)
  .use(cria.history({ history: recentTurns }))
  .use(retrieval)
  .user(query)
  .render({ budget: 128_000 });

const response = await client.chat.completions.create({ model, messages });
```
When you run LLM features in production, you need to:
1. Build prompts that last — Swap providers, models, memory, or retrieval without rewriting prompt logic. A/B test components as the stack evolves.
2. Test like code — Evaluate prompts with LLM-as-a-judge. Run tests in CI. Catch drift when you swap building blocks.
3. Inspect what runs — See exactly what gets sent to the model, debug token budgets, and spot when RAG input pollutes the context. (Local DevTools-style inspector: planned)
Cria gives you composable prompt blocks, explicit token budgets, and adapters you can customise and swap, so you can move fast without breaking prompts. The sketch below shows the same prompt logic rendered through two different provider adapters.
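For example, a minimal sketch of swapping the provider adapter under unchanged prompt logic, assuming the OpenAI and Anthropic adapters shown later in this README (`openaiClient` and `anthropicClient` are hypothetical, pre-configured SDK clients):

```ts
import { cria } from "@fastpaca/cria";
import { createProvider as openaiProvider } from "@fastpaca/cria/openai";
import { createProvider as anthropicProvider } from "@fastpaca/cria/anthropic";

// Same prompt logic, rendered through the OpenAI Chat Completions adapter...
const viaOpenAI = await cria
  .prompt(openaiProvider(openaiClient, "gpt-5-nano"))
  .system("You are a helpful assistant.")
  .user("What is the capital of France?")
  .render({ budget: 128_000 });

// ...and through the Anthropic adapter, with no changes to the prompt chain.
const viaAnthropic = await cria
  .prompt(anthropicProvider(anthropicClient, "claude-sonnet-4"))
  .system("You are a helpful assistant.")
  .user("What is the capital of France?")
  .render({ budget: 128_000 });
```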
| Capability | Status |
| --- | --- |
| Component swapping via adapters | ✅ |
| Memory + vector search adapters | ✅ |
| Token budgeting | ✅ |
| Fit & compaction controls | ✅ |
| Conversation summaries | ✅ |
| OpenTelemetry integration | ✅ |
| Prompt eval/test helpers | ✅ |
| Local prompt inspector (DevTools-style) | planned |
```bash
npm install @fastpaca/cria
```

```ts
import { cria } from "@fastpaca/cria";
import { createProvider } from "@fastpaca/cria/openai";
import OpenAI from "openai";

const client = new OpenAI();
const model = "gpt-5-nano";
const provider = createProvider(client, model);

const messages = await cria
  .prompt(provider)
  .system("You are a helpful assistant.")
  .user("What is the capital of France?")
  .render({ budget: 128_000 });

const response = await client.chat.completions.create({ model, messages });
```
RAG with vector search
```ts
// `qdrant` is a configured vector store adapter (see the QdrantStore example below).
const vectors = cria.vectordb(qdrant);
const retrieval = vectors.plugin({ query, limit: 10 });

const messages = await cria
  .prompt(provider)
  .system("You are a research assistant.")
  .use(retrieval)
  .user(query)
  .render({ budget: 128_000 });
```
Summarize long conversation history
```ts
// `redis` is a configured summary store adapter (see the RedisStore example below).
const summarizer = cria.summarizer({
  id: "conv",
  store: redis,
  priority: 2,
  provider,
});
const summary = summarizer.plugin({ history: conversation });

const messages = await cria
  .prompt(provider)
  .system("You are a helpful assistant.")
  .use(summary)
  .last(conversation, { n: 20 })
  .user(query)
  .render({ budget: 128_000 });
```
Token budgeting and compaction
```ts
const summarizer = cria.summarizer({
  id: "conv",
  store: redis,
  priority: 2,
  provider,
});
const summary = summarizer.plugin({ history: conversation });

const vectors = cria.vectordb(qdrant);
const retrieval = vectors.plugin({ query, limit: 10 });

const messages = await cria
  .prompt(provider)
  .system(SYSTEM_PROMPT)
  // Dropped first when the budget is tight
  .omit(examples, { priority: 3 })
  // Summaries run ad hoc once we hit budget limits
  .use(summary)
  // Must be retained, but limited to 10 entries
  .use(retrieval)
  .user(query)
  // 128k token budget; once we hit it, strategies run based on
  // priority & usage (e.g. summaries will trigger).
  .render({ budget: 128_000 });
```
Evaluate prompts like code
```ts
import { c, cria } from "@fastpaca/cria";
import { createProvider } from "@fastpaca/cria/ai-sdk";
import { createJudge } from "@fastpaca/cria/eval";
import { openai } from "@ai-sdk/openai";

const judge = createJudge({
  target: createProvider(openai("gpt-4o")),
  evaluator: createProvider(openai("gpt-4o-mini")),
});

const prompt = await cria
  .prompt()
  .system("You are a helpful customer support agent.")
  .user("How do I update my payment method?")
  .build();

await judge(prompt).toPass(c`Provides clear, actionable steps`);
```
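To run this in CI, one approach is to wrap the judge call in your test runner; a minimal sketch using Vitest (the runner choice and failure behaviour are assumptions, not requirements of Cria):

```ts
import { describe, it } from "vitest";
// `judge`, `cria`, and `c` are set up exactly as in the example above.

describe("support prompt", () => {
  it("gives clear, actionable steps", async () => {
    const prompt = await cria
      .prompt()
      .system("You are a helpful customer support agent.")
      .user("How do I update my payment method?")
      .build();

    // Assumes toPass rejects when the evaluator does not accept the
    // criterion, which fails the test and the CI run.
    await judge(prompt).toPass(c`Provides clear, actionable steps`);
  });
});
```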
OpenAI (Chat Completions)
```ts
import OpenAI from "openai";
import { createProvider } from "@fastpaca/cria/openai";
import { cria } from "@fastpaca/cria";

const client = new OpenAI();
const model = "gpt-5-nano";
const provider = createProvider(client, model);

const messages = await cria
  .prompt(provider)
  .system("You are helpful.")
  .user(userQuestion)
  .render({ budget: 128_000 });

const response = await client.chat.completions.create({ model, messages });
```
OpenAI (Responses)
```ts
import OpenAI from "openai";
import { createResponsesProvider } from "@fastpaca/cria/openai";
import { cria } from "@fastpaca/cria";

const client = new OpenAI();
const model = "gpt-5-nano";
const provider = createResponsesProvider(client, model);

const input = await cria
  .prompt(provider)
  .system("You are helpful.")
  .user(userQuestion)
  .render({ budget: 128_000 });

const response = await client.responses.create({ model, input });
```
Anthropic
```ts
import Anthropic from "@anthropic-ai/sdk";
import { createProvider } from "@fastpaca/cria/anthropic";
import { cria } from "@fastpaca/cria";

const client = new Anthropic();
const model = "claude-sonnet-4";
const provider = createProvider(client, model);

const { system, messages } = await cria
  .prompt(provider)
  .system("You are helpful.")
  .user(userQuestion)
  .render({ budget: 128_000 });

// max_tokens is required by the Anthropic Messages API.
const response = await client.messages.create({ model, max_tokens: 1024, system, messages });
```
Vercel AI SDK
```ts
import { openai } from "@ai-sdk/openai";
import { createProvider } from "@fastpaca/cria/ai-sdk";
import { cria } from "@fastpaca/cria";
import { generateText } from "ai";

const model = openai("gpt-4o");
const provider = createProvider(model);

const messages = await cria
  .prompt(provider)
  .system("You are helpful.")
  .user(userQuestion)
  .render({ budget: 128_000 });

const { text } = await generateText({ model, messages });
```
Redis (conversation summaries)
```ts
import { cria, type StoredSummary } from "@fastpaca/cria";
import { RedisStore } from "@fastpaca/cria/memory/redis";

const store = new RedisStore({
  host: "localhost",
  port: 6379,
});

const summarizer = cria.summarizer({
  id: "conv-123",
  store,
  priority: 2,
  provider,
});
const summary = summarizer.plugin({ history: conversation });

const messages = await cria
  .prompt(provider)
  .system("You are a helpful assistant.")
  .use(summary)
  .last(conversation, { n: 20 })
  .user(query)
  .render({ budget: 128_000 });
```
Postgres (conversation summaries)
```ts
import { cria, type StoredSummary } from "@fastpaca/cria";
import { PostgresStore } from "@fastpaca/cria/memory/postgres";

const store = new PostgresStore({
  connectionString: "postgres://user:pass@localhost/mydb",
});

const summarizer = cria.summarizer({
  id: "conv-123",
  store,
  priority: 2,
  provider,
});
const summary = summarizer.plugin({ history: conversation });

const messages = await cria
  .prompt(provider)
  .system("You are a helpful assistant.")
  .use(summary)
  .last(conversation, { n: 20 })
  .user(query)
  .render({ budget: 128_000 });
```
SQLite (conversation summaries)
```ts
import { cria, type StoredSummary } from "@fastpaca/cria";
import { SqliteStore } from "@fastpaca/cria/memory/sqlite";

const store = new SqliteStore({
  filename: "cria.sqlite",
});

const summarizer = cria.summarizer({
  id: "conv-123",
  store,
  priority: 2,
  provider,
});
const summary = summarizer.plugin({ history: conversation });

const messages = await cria
  .prompt(provider)
  .system("You are a helpful assistant.")
  .use(summary)
  .last(conversation, { n: 20 })
  .user(query)
  .render({ budget: 128_000 });
```
SQLite (vector search)
```ts
import { z } from "zod";
import { cria } from "@fastpaca/cria";
import { SqliteVectorStore } from "@fastpaca/cria/memory/sqlite-vector";

const store = new SqliteVectorStore({
  filename: "cria.sqlite",
  dimensions: 1536,
  embed: async (text) => await getEmbedding(text),
  schema: z.string(),
});

const vectors = cria.vectordb(store);
const retrieval = vectors.plugin({ query, limit: 10 });

const messages = await cria
  .prompt(provider)
  .system("You are a research assistant.")
  .use(retrieval)
  .user(query)
  .render({ budget: 128_000 });
```
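The `embed` callback is yours to supply; a minimal sketch of a `getEmbedding` helper using OpenAI's embeddings API (the helper name comes from the snippets above, the model choice is an assumption):

```ts
import OpenAI from "openai";

const openaiClient = new OpenAI();

// text-embedding-3-small returns 1536-dimensional vectors by default,
// matching the `dimensions` option passed to the vector store above.
async function getEmbedding(text: string): Promise<number[]> {
  const res = await openaiClient.embeddings.create({
    model: "text-embedding-3-small",
    input: text,
  });
  return res.data[0].embedding;
}
```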
Chroma (vector search)
```ts
import { ChromaClient } from "chromadb";
import { cria } from "@fastpaca/cria";
import { ChromaStore } from "@fastpaca/cria/memory/chroma";

const client = new ChromaClient({ path: "http://localhost:8000" });
const collection = await client.getOrCreateCollection({ name: "my-docs" });

const store = new ChromaStore({
  collection,
  embed: async (text) => await getEmbedding(text),
});

const vectors = cria.vectordb(store);
const retrieval = vectors.plugin({ query, limit: 10 });

const messages = await cria
  .prompt(provider)
  .system("You are a research assistant.")
  .use(retrieval)
  .user(query)
  .render({ budget: 128_000 });
```
Qdrant (vector search)
```ts
import { QdrantClient } from "@qdrant/js-client-rest";
import { cria } from "@fastpaca/cria";
import { QdrantStore } from "@fastpaca/cria/memory/qdrant";

const client = new QdrantClient({ url: "http://localhost:6333" });

const store = new QdrantStore({
  client,
  collectionName: "my-docs",
  embed: async (text) => await getEmbedding(text),
});

const vectors = cria.vectordb(store);
const retrieval = vectors.plugin({ query, limit: 10 });

const messages = await cria
  .prompt(provider)
  .system("You are a research assistant.")
  .use(retrieval)
  .user(query)
  .render({ budget: 128_000 });
```
- Quickstart
- RAG / vector search
- Use history plugin
- Summarize long history
- Fit & compaction
- Prompt evaluation
- Full documentation
What does Cria output?
Prompt structures/messages (via a provider adapter). You pass the rendered output into your existing LLM SDK call.
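For the OpenAI Chat Completions adapter, for instance, the rendered output is a plain messages array (the values below are illustrative):

```ts
const messages = await cria
  .prompt(provider)
  .system("You are helpful.")
  .user("What is the capital of France?")
  .render({ budget: 128_000 });

// messages is roughly:
// [
//   { role: "system", content: "You are helpful." },
//   { role: "user", content: "What is the capital of France?" },
// ]
```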
What works out of the box?
Provider adapters for OpenAI (Chat Completions + Responses), Anthropic, and Vercel AI SDK; store adapters for Redis, SQLite, Postgres, Chroma, and Qdrant.
How do I validate component swaps?
Swap via adapters, diff the rendered prompt output, and run prompt eval/tests to catch drift.
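One way to do that diffing in CI is a snapshot test over the rendered output; a minimal sketch using Vitest (the runner and `candidateProvider` are assumptions, not requirements of Cria):

```ts
import { expect, it } from "vitest";
import { cria } from "@fastpaca/cria";

it("rendered prompt stays stable after the swap", async () => {
  // `candidateProvider` is the adapter for the component you are trialling.
  const rendered = await cria
    .prompt(candidateProvider)
    .system("You are helpful.")
    .user("What is the capital of France?")
    .render({ budget: 128_000 });

  // The snapshot captures the previously rendered prompt; the test fails
  // (and shows a diff) when the new adapter changes what gets sent.
  expect(rendered).toMatchSnapshot();
});
```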
What's the API stability?
We use Cria in production, but the API may change before 2.0. Pin versions and follow the changelog.
Issues and PRs welcome. Keep changes small and focused.
MIT