> LLM evaluation framework with batch processing, data sources, and multi-backend export
Orchestrated is a comprehensive evaluation framework for LLM applications. It provides a simple API for running evaluations with support for custom scorers, data sources, batch processing, and multiple export backends.
- 🎯 Simple API - Evaluate LLM outputs with a single function call
- 📊 Built-in Scorers - Effectiveness, GuardrailAdherence, Execution, Factuality, and more
- 🔧 Custom Scorers - Easy-to-define custom evaluation logic
- 📡 Data Sources - Built-in data sources, including production interactions
- 🔄 Batch Processing - Efficient batch evaluation with resume support
- 📈 Progress Reporting - Beautiful terminal UI with progress bars
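Install with npm, pnpm, or yarn: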
```bash
npm install orchestrated
# or
pnpm add orchestrated
# or
yarn add orchestrated
```
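To get started, score pre-generated outputs against expected values using an off-the-shelf scorer from the autoevals package: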
```typescript
import { Levenshtein } from "autoevals";
import { Eval } from "orchestrated";

Eval("Simple Eval", {
data: [
{
input: "What is a good name for a child?",
expected: "John",
output: "Nurse",
},
{
input: "What is a good name for a child?",
expected: "John",
output: "John",
},
],
scores: [Levenshtein],
});
```
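Each row supplies an input, the expected answer, and the output to score; Levenshtein grades outputs by string edit distance against expected, so the second row scores 1 and the first scores near 0.

If outputs aren't precomputed, pass a task function that generates them at evaluation time: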
```typescript
import { Factuality } from "autoevals";
import { Eval } from "orchestrated";
import { generateAI } from "./my-ai-service";

Eval("LLM Evaluation", {
data: [
{ input: "Explain quantum computing" },
{ input: "What is machine learning?" },
],
task: async (input) => {
const output = await generateAI(input);
return output;
},
scores: [Factuality],
});
```
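Factuality is an LLM-as-a-judge scorer, so running it requires model credentials (for autoevals, an OpenAI API key by default).

To evaluate against production data instead of a hand-written dataset, pull rows from a built-in data source and reference built-in scorers by name: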
```typescript
import { Eval, interactions } from "orchestrated";

Eval("Production Eval", {
data: interactions(),
scores: ["Effectiveness"],
});
```
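Here interactions() supplies the dataset, and the built-in Effectiveness scorer is referenced by its name as a string rather than imported.

Custom scorers are defined with a zod parameter schema and an async handler that returns a score between 0 and 1: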
```typescript
import { Eval, projects } from "orchestrated";
import { z } from "zod";

const project = projects.create();
const ContentSafety = project.scorers.create({
name: "ContentSafety",
slug: "content-safety",
description: "Evaluates content for safety violations",
parameters: z.object({
input: z.string(),
output: z.string(),
}),
handler: async ({ input, output }) => {
// Your custom scoring logic
const isSafe = !output.includes("unsafe-content");
return {
name: "ContentSafety",
score: isSafe ? 1 : 0,
metadata: { flagged: !isSafe },
};
},
});

// Use in evaluation
Eval("Safety Check", {
data: [...],
scores: [ContentSafety],
});
```
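Custom scorers return a score (0 to 1) along with optional metadata, and they can sit alongside the other scorer styles shown above. The sketch below mixes the ContentSafety scorer with an autoevals scorer and a built-in scorer referenced by name; treat the combined scores array as an assumption, since the examples above only show each style separately.

```typescript
import { Factuality } from "autoevals";
import { Eval, interactions } from "orchestrated";

// Hypothetical combination (assumption, not confirmed by the examples above):
// an autoevals scorer, a built-in scorer referenced by name, and the custom
// ContentSafety scorer defined earlier, all in one evaluation.
Eval("Combined Eval", {
  data: interactions(),
  scores: [Factuality, "Effectiveness", ContentSafety],
});
```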