TypeScript implementation of the verifiers framework for building RL environments and evaluations, with AI SDK integration.
verifiers-ts provides the same core functionality as the Python verifiers library, enabling you to:
- Define custom interaction protocols between models and environments
- Build agents, multi-turn conversations, tool-augmented reasoning, and interactive games
- Create reusable evaluation environments with multi-criteria reward functions
- Integrate with AI SDK for model inference and native tool calling

```bash
npm install verifiers-ts
```

Or if developing locally:
```bash
cd verifiers-ts
npm install
npm run build
```

To scaffold an environment from the minimal RL template:

```bash
pnpm dlx verifiers-ts vf-init weather-bot --minimal-rl
cd weather-bot
pnpm install
pnpm build
pnpm vf-eval -n 1 -r 1
```

The template scaffolds a tool-enabled agent, a tiny dataset, and a reward built with structuredOutputReward. Replace the prompt, tweak the agent defaults, and you're ready to evaluate. Remember to export OPENAI_API_KEY (or pass --api-key to vf-eval).

To scaffold a standard environment instead:

```bash
pnpm dlx verifiers-ts vf-init my-environment
cd my-environment
pnpm install
pnpm build
pnpm vf-eval -n 1 -r 1
```

Customize the generated src/index.ts, dataset, and reward functions to match your task.
> vf-eval automatically compiles your TypeScript, provisions a local .vf-eval/ virtualenv, and exposes the environment to Python tooling; no manual uv sync is required.
> Provide OPENAI_API_KEY (or another provider key) so the default agent can make model calls.

A complete weather-bot example built with createRLEnvironment:

```typescript
import { generateText, tool } from "ai";
import { z } from "zod";
import { openai } from "@ai-sdk/openai";
import { createRLEnvironment } from "verifiers-ts";
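// A mock weather tool: returns a canned reading so the example runs without a real weather API.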
const getCurrentWeather = tool({
description: "Get the current weather for a specific location.",
parameters: z.object({
location: z
.string()
.describe("City and state, for example: Seattle, WA"),
unit: z
.enum(["celsius", "fahrenheit"])
.describe("Temperature unit to return.")
.optional(),
}),
execute: async ({ location, unit }) => {
const preferredUnit = unit ?? "celsius";
const temperature = preferredUnit === "celsius" ? 18 : 64;
return `It is ${temperature}°${preferredUnit === "celsius" ? "C" : "F"} and sunny in ${location}.`;
},
});
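// The agent wraps AI SDK's generateText: the environment supplies the conversation messages
// and may pass extra tools, which are merged with the agent's own getCurrentWeather tool.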
const weatherAgent = {
generateText: (messages: any, options: Record<string, unknown> = {}) => {
const { tools = {}, ...rest } = options as {
tools?: Record<string, any>;
};
return generateText({
model: openai("gpt-4o-mini") as any,
system:
"You are WeatherBot. When a user asks about the weather, call the getCurrentWeather tool and report the results clearly.",
temperature: 0,
tools: { getCurrentWeather, ...tools },
messages,
...rest,
});
},
tools: { getCurrentWeather },
};
const env = await createRLEnvironment({
agent: weatherAgent,
dataset: [
{
prompt: [
{
role: "user",
content: "What's the weather like in Seattle right now?",
},
],
answer: "seattle",
},
],
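// Reward: 1 if the assistant's reply mentions the expected answer ("seattle") and the word "weather", else 0.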
rewardFunction: (completion, answer) => {
const text = Array.isArray(completion)
? completion
.filter(
(msg) =>
typeof msg === "object" &&
msg !== null &&
"role" in msg &&
msg.role === "assistant"
)
.map((msg) => (msg as { content?: string }).content ?? "")
.join(" ")
: typeof completion === "string"
? completion
: "";
const normalized = text.toLowerCase();
return normalized.includes(answer) && normalized.includes("weather") ? 1 : 0;
},
});
```

A basic single-turn evaluation environment:

```typescript
import { SingleTurnEnv, Rubric, Parser } from "verifiers-ts";
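// Note: extractText is assumed to be a user-provided helper that pulls the assistant's
// text out of a completion (string or message array); it is not part of this snippet.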
function correctAnswer(params: {
completion: any;
answer: string;
}): number {
const text = extractText(params.completion);
return text.trim() === params.answer.trim() ? 1.0 : 0.0;
}
const rubric = new Rubric({
funcs: [correctAnswer],
weights: [1.0],
});
const env = new SingleTurnEnv({
dataset: myDataset,
systemPrompt: "Solve step by step",
rubric,
});
const results = await env.evaluate(
"gpt-4",
{},
10, // numExamples
1, // rolloutsPerExample
true, // scoreRollouts
32, // maxConcurrent
undefined, // maxConcurrentGeneration
undefined, // maxConcurrentScoring
process.env.OPENAI_API_KEY
);
```
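
Rubrics can also combine several weighted criteria. Below is a minimal sketch built on the Rubric constructor shape shown above; the extractText helper and both reward functions are illustrative placeholders, not part of the library:

```typescript
import { Rubric } from "verifiers-ts";

// Minimal helper: pull assistant text out of a completion (string or message array).
function extractText(completion: any): string {
  if (typeof completion === "string") return completion;
  if (Array.isArray(completion)) {
    return completion
      .filter((msg) => msg && typeof msg === "object" && msg.role === "assistant")
      .map((msg) => msg.content ?? "")
      .join(" ");
  }
  return "";
}

// Criterion 1: exact-match correctness.
function exactMatch(params: { completion: any; answer: string }): number {
  const text = extractText(params.completion);
  return text.trim() === params.answer.trim() ? 1.0 : 0.0;
}

// Criterion 2: a soft check that the answer shows some visible reasoning.
function mentionsReasoning(params: { completion: any; answer: string }): number {
  const text = extractText(params.completion).toLowerCase();
  return text.includes("because") || text.includes("therefore") ? 1.0 : 0.0;
}

// Weighted combination: correctness dominates, the reasoning check adds a small bonus.
const weightedRubric = new Rubric({
  funcs: [exactMatch, mentionsReasoning],
  weights: [0.8, 0.2],
});
```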

A tool-calling environment:

```typescript
import { ToolEnv, defineTool } from "verifiers-ts";
import { z } from "zod";
const calculator = defineTool(
"calculate",
"Perform arithmetic",
z.object({
expression: z.string(),
}),
async (args) => {
return eval(args.expression); // Use proper parser in production
}
);
const env = new ToolEnv({
tools: [calculator],
maxTurns: 10,
});
// AI SDK automatically handles tool calling loop
const results = await env.evaluate("gpt-4", {}, 10);
```
The library mirrors the Python verifiers structure:
- Environments: Base Environment class with MultiTurnEnv, SingleTurnEnv, ToolEnv, StatefulToolEnv, and SandboxEnv variants
- Rubrics: Weighted reward functions for evaluation
- Parsers: Extract structured information (Parser, ThinkParser, XMLParser)
- Tools: Native AI SDK tool integration using the tool() function from the 'ai' package
- AI SDK Integration: Uses generateText for model calls and automatic tool calling

Key features and differences from the Python implementation:

- Native Tool Calling: Tools use AI SDK's tool() function with Zod schemas
- Automatic Loop Handling: AI SDK manages tool execution loops with stopWhen conditions
- Type-Safe Tools: Zod schemas provide runtime validation and TypeScript types
- Structured Outputs: Support for generateObject when needed (see the sketch after this list)
- Results Format: Saves results in JSONL format compatible with Python vf-tui
- Native TypeScript Evaluation: TypeScript projects use the native vf-eval CLI (no Python bridge needed)
- Native Sandbox Client: Direct HTTP API integration with Prime Intellect sandboxes (no Python dependencies)
- State Management: Same state structure as Python verifiers
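
For the structured-output path, here is a minimal sketch of an LLM-judge style check built directly on AI SDK's generateObject. This is plain AI SDK usage rather than the library's structuredOutputReward helper, and the question and completionText values are placeholders:

```typescript
import { generateObject } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

// Placeholder inputs; in a reward function these would come from the rollout.
const question = "What's the weather like in Seattle right now?";
const completionText = "It is 18°C and sunny in Seattle, WA.";

// Ask the model for a structured verdict instead of free-form text.
const { object } = await generateObject({
  model: openai("gpt-4o-mini"),
  schema: z.object({
    isCorrect: z.boolean().describe("Does the completion answer the question?"),
    reasoning: z.string().describe("One-sentence justification."),
  }),
  prompt: `Question: ${question}\nCompletion: ${completionText}\nJudge whether the completion answers the question.`,
});

// Convert the structured verdict into a scalar reward.
const reward = object.isCorrect ? 1 : 0;
console.log(reward, object.reasoning);
```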
Environment types:

- MultiTurnEnv: subclasses implement is_completed and env_response.
- ToolEnv: uses AI SDK's native tool calling; tools are defined with defineTool() and handled automatically.
- StatefulToolEnv: extends ToolEnv for tools requiring dynamic state (e.g., sandbox IDs).
- SandboxEnv: abstract base for Prime Intellect sandbox integration.

## Evaluation
TypeScript environments are evaluated natively using the TypeScript vf-eval CLI:

```bash
npx vf-eval hangman -n 5 -r 1
```

The CLI automatically:
- Detects TypeScript projects (those with a package.json containing verifiers.envId but no pyproject.toml)
- Uses native TypeScript evaluation implementation
- Saves results in a vf-tui-compatible JSONL format

For Python projects, vf-eval delegates to the Python verifiers CLI.

## Sandbox Support
Sandbox environments (using SandboxEnv) use a native TypeScript HTTP client to interact with Prime Intellect sandboxes. No Python dependencies are required.

Configuration:
- Set the PRIME_INTELLECT_API_KEY or PRIME_API_KEY environment variable
- Optional: Set PRIME_INTELLECT_API_URL (default: https://api.primeintellect.ai)
- Optional: Set PRIME_INTELLECT_TEAM_ID for team-scoped sandboxes

## Examples
See the environments/ directory for example implementations:
- example-single-turn: Basic Q&A environment
- example-tool-use: Tool calling with AI SDK

## Development
This workspace uses Turborepo for task orchestration and caching. Use turbo run commands to build all packages with automatic dependency resolution and caching.

```bash
# Install dependencies
pnpm install

# Build all packages (core + environments)
pnpm turbo run build

# Build a specific environment
pnpm turbo run build --filter hangman

# Run tests
pnpm turbo run test

# Lint all packages
pnpm turbo run lint

# Format code
pnpm turbo run format

# Watch mode (runs all dev tasks in parallel)
pnpm turbo run dev --parallel

# Watch a specific environment
pnpm turbo run dev --parallel --filter hangman
```

Turborepo features:
- Task Dependencies: Builds automatically respect workspace dependencies (dependsOn: ["^build"])
- Local Caching: Build outputs are cached locally for faster rebuilds
- Parallel Execution: Dev tasks run in parallel across packages
- Filtering: Use --filter to target specific packages

For remote caching (CI/CD), set the TURBO_TEAM and TURBO_TOKEN environment variables.

✅ Core Complete - All base classes and AI SDK integration implemented
🔄 In Progress - Python bridge refinement
📝 Pending - Comprehensive tests and examples
License: MIT