> LLM evaluation framework with batch processing, data sources, and multi-backend export
Orchestrated is a comprehensive evaluation framework for LLM applications. It provides a simple API for running evaluations with support for custom scorers, data sources, batch processing, and multiple export backends.
- 🎯 Simple API - Evaluate LLM outputs with a single function call
- 📊 Built-in Scorers - Effectiveness, GuardrailAdherence, Execution, Factuality, and more
- 🔧 Custom Scorers - Easy-to-define custom evaluation logic
- 📡 Data Sources - Built-in data sources, including production interactions
- 🔄 Batch Processing - Efficient batch evaluation with resume support
- 📈 Progress Reporting - Beautiful terminal UI with progress bars
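Install with npm, pnpm, or yarn: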
```bash
npm install orchestrated
# or
pnpm add orchestrated
# or
yarn add orchestrated
```
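To get started, score pre-generated outputs against expected values using an off-the-shelf scorer from the autoevals package: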
```typescript
import { Levenshtein } from "autoevals";
import { Eval } from "orchestrated";

Eval("Simple Eval", {
data: [
{
input: "What is a good name for a child?",
expected: "John",
output: "Nurse",
},
{
input: "What is a good name for a child?",
expected: "John",
output: "John",
},
],
scores: [Levenshtein],
});
```
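Each row supplies an input, the expected answer, and the output to score; Levenshtein grades outputs by string edit distance against expected, so the second row scores 1 and the first scores near 0.

If outputs aren't precomputed, pass a task function that generates them at evaluation time: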
```typescript
import { Factuality } from "autoevals";
import { Eval } from "orchestrated";
import { generateAI } from "./my-ai-service";

Eval("LLM Evaluation", {
data: [
{ input: "Explain quantum computing" },
{ input: "What is machine learning?" },
],
task: async (input) => {
const output = await generateAI(input);
return output;
},
scores: [Factuality],
});
```
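Factuality is an LLM-as-a-judge scorer, so running it requires model credentials (for autoevals, an OpenAI API key by default).

To evaluate against production data instead of a hand-written dataset, pull rows from a built-in data source and reference built-in scorers by name: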
```typescript
import { Eval, interactions } from "orchestrated";

Eval("Production Eval", {
data: interactions(),
scores: ["Effectiveness"],
});
```
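Here interactions() supplies the dataset, and the built-in Effectiveness scorer is referenced by its name as a string rather than imported.

Custom scorers are defined with a zod parameter schema and an async handler that returns a score between 0 and 1: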
```typescript
import { Eval, projects } from "orchestrated";
import { z } from "zod";

const project = projects.create();
const ContentSafety = project.scorers.create({
name: "ContentSafety",
slug: "content-safety",
description: "Evaluates content for safety violations",
parameters: z.object({
input: z.string(),
output: z.string(),
}),
handler: async ({ input, output }) => {
// Your custom scoring logic
const isSafe = !output.includes("unsafe-content");
return {
name: "ContentSafety",
score: isSafe ? 1 : 0,
metadata: { flagged: !isSafe },
};
},
});

// Use in evaluation
Eval("Safety Check", {
data: [...],
scores: [ContentSafety],
});
```
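Custom scorers return a score (0 to 1) along with optional metadata, and they can sit alongside the other scorer styles shown above. The sketch below mixes the ContentSafety scorer with an autoevals scorer and a built-in scorer referenced by name; treat the combined scores array as an assumption, since the examples above only show each style separately.

```typescript
import { Factuality } from "autoevals";
import { Eval, interactions } from "orchestrated";

// Hypothetical combination (assumption, not confirmed by the examples above):
// an autoevals scorer, a built-in scorer referenced by name, and the custom
// ContentSafety scorer defined earlier, all in one evaluation.
Eval("Combined Eval", {
  data: interactions(),
  scores: [Factuality, "Effectiveness", ContentSafety],
});
```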