TypeScript implementation of the verifiers framework for building RL environments and evaluations, with AI SDK integration.
verifiers-ts provides the same core functionality as the Python verifiers library, enabling you to:
- Define custom interaction protocols between models and environments
- Build agents, multi-turn conversations, tool-augmented reasoning, and interactive games
- Create reusable evaluation environments with multi-criteria reward functions
- Integrate with AI SDK for model inference and native tool calling

```bash
npm install verifiers-ts
```

Or if developing locally:
```bash
cd verifiers-ts
npm install
npm run build
```

To scaffold an environment from the minimal RL template:

```bash
pnpm dlx verifiers-ts vf-init weather-bot --minimal-rl
cd weather-bot
pnpm install
pnpm build
pnpm vf-eval -n 1 -r 1
```

The template scaffolds a tool-enabled agent, a tiny dataset, and a reward built with structuredOutputReward. Replace the prompt, tweak the agent defaults, and you're ready to evaluate. Remember to export OPENAI_API_KEY (or pass --api-key to vf-eval).

To scaffold a standard environment instead:

```bash
pnpm dlx verifiers-ts vf-init my-environment
cd my-environment
pnpm install
pnpm build
pnpm vf-eval -n 1 -r 1
```

Customize the generated src/index.ts, dataset, and reward functions to match your task.
> vf-eval automatically compiles your TypeScript, provisions a local .vf-eval/ virtualenv, and exposes the environment to Python tooling; no manual uv sync is required.
> Provide OPENAI_API_KEY (or another provider key) so the default agent can make model calls.

A complete weather-bot example built with createRLEnvironment:

```typescript
import { generateText, tool } from "ai";
import { z } from "zod";
import { openai } from "@ai-sdk/openai";
import { createRLEnvironment } from "verifiers-ts";
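// A mock weather tool: returns a canned reading so the example runs without a real weather API.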
const getCurrentWeather = tool({
description: "Get the current weather for a specific location.",
parameters: z.object({
location: z
.string()
.describe("City and state, for example: Seattle, WA"),
unit: z
.enum(["celsius", "fahrenheit"])
.describe("Temperature unit to return.")
.optional(),
}),
execute: async ({ location, unit }) => {
const preferredUnit = unit ?? "celsius";
const temperature = preferredUnit === "celsius" ? 18 : 64;
return `It is ${temperature}°${preferredUnit === "celsius" ? "C" : "F"} and sunny in ${location}.`;
},
});
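// The agent wraps AI SDK's generateText: the environment supplies the conversation messages
// and may pass extra tools, which are merged with the agent's own getCurrentWeather tool.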
const weatherAgent = {
generateText: (messages: any, options: Record<string, unknown> = {}) => {
const { tools = {}, ...rest } = options as {
tools?: Record<string, any>;
};
return generateText({
model: openai("gpt-4o-mini") as any,
system:
"You are WeatherBot. When a user asks about the weather, call the getCurrentWeather tool and report the results clearly.",
temperature: 0,
tools: { getCurrentWeather, ...tools },
messages,
...rest,
});
},
tools: { getCurrentWeather },
};
const env = await createRLEnvironment({
agent: weatherAgent,
dataset: [
{
prompt: [
{
role: "user",
content: "What's the weather like in Seattle right now?",
},
],
answer: "seattle",
},
],
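// Reward: 1 if the assistant's reply mentions the expected answer ("seattle") and the word "weather", else 0.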
rewardFunction: (completion, answer) => {
const text = Array.isArray(completion)
? completion
.filter(
(msg) =>
typeof msg === "object" &&
msg !== null &&
"role" in msg &&
msg.role === "assistant"
)
.map((msg) => (msg as { content?: string }).content ?? "")
.join(" ")
: typeof completion === "string"
? completion
: "";
const normalized = text.toLowerCase();
return normalized.includes(answer) && normalized.includes("weather") ? 1 : 0;
},
});
```

A basic single-turn evaluation environment:

```typescript
import { SingleTurnEnv, Rubric, Parser } from "verifiers-ts";
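// Note: extractText is assumed to be a user-provided helper that pulls the assistant's
// text out of a completion (string or message array); it is not part of this snippet.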
function correctAnswer(params: {
completion: any;
answer: string;
}): number {
const text = extractText(params.completion);
return text.trim() === params.answer.trim() ? 1.0 : 0.0;
}
const rubric = new Rubric({
funcs: [correctAnswer],
weights: [1.0],
});
const env = new SingleTurnEnv({
dataset: myDataset,
systemPrompt: "Solve step by step",
rubric,
});
const results = await env.evaluate(
"gpt-4",
{},
10, // numExamples
1, // rolloutsPerExample
true, // scoreRollouts
32, // maxConcurrent
undefined, // maxConcurrentGeneration
undefined, // maxConcurrentScoring
process.env.OPENAI_API_KEY
);
```
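
Rubrics can also combine several weighted criteria. Below is a minimal sketch built on the Rubric constructor shape shown above; the extractText helper and both reward functions are illustrative placeholders, not part of the library:

```typescript
import { Rubric } from "verifiers-ts";

// Minimal helper: pull assistant text out of a completion (string or message array).
function extractText(completion: any): string {
  if (typeof completion === "string") return completion;
  if (Array.isArray(completion)) {
    return completion
      .filter((msg) => msg && typeof msg === "object" && msg.role === "assistant")
      .map((msg) => msg.content ?? "")
      .join(" ");
  }
  return "";
}

// Criterion 1: exact-match correctness.
function exactMatch(params: { completion: any; answer: string }): number {
  const text = extractText(params.completion);
  return text.trim() === params.answer.trim() ? 1.0 : 0.0;
}

// Criterion 2: a soft check that the answer shows some visible reasoning.
function mentionsReasoning(params: { completion: any; answer: string }): number {
  const text = extractText(params.completion).toLowerCase();
  return text.includes("because") || text.includes("therefore") ? 1.0 : 0.0;
}

// Weighted combination: correctness dominates, the reasoning check adds a small bonus.
const weightedRubric = new Rubric({
  funcs: [exactMatch, mentionsReasoning],
  weights: [0.8, 0.2],
});
```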

A tool-calling environment:

```typescript
import { ToolEnv, defineTool } from "verifiers-ts";
import { z } from "zod";
const calculator = defineTool(
"calculate",
"Perform arithmetic",
z.object({
expression: z.string(),
}),
async (args) => {
return eval(args.expression); // Use proper parser in production
}
);
const env = new ToolEnv({
tools: [calculator],
maxTurns: 10,
});
// AI SDK automatically handles tool calling loop
const results = await env.evaluate("gpt-4", {}, 10);
```
The library mirrors the Python verifiers structure:
- Environments: Base Environment class with MultiTurnEnv, SingleTurnEnv, ToolEnv, StatefulToolEnv, and SandboxEnv variants
- Rubrics: Weighted reward functions for evaluation
- Parsers: Extract structured information (Parser, ThinkParser, XMLParser)
- Tools: Native AI SDK tool integration using the tool() function from the 'ai' package
- AI SDK Integration: Uses generateText for model calls and automatic tool calling

Key features and differences from the Python implementation:

- Native Tool Calling: Tools use AI SDK's tool() function with Zod schemas
- Automatic Loop Handling: AI SDK manages tool execution loops with stopWhen conditions
- Type-Safe Tools: Zod schemas provide runtime validation and TypeScript types
- Structured Outputs: Support for generateObject when needed (see the sketch after this list)
- Results Format: Saves results in JSONL format compatible with Python vf-tui
- Native TypeScript Evaluation: TypeScript projects use the native vf-eval CLI (no Python bridge needed)
- Native Sandbox Client: Direct HTTP API integration with Prime Intellect sandboxes (no Python dependencies)
- State Management: Same state structure as Python verifiers
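
For the structured-output path, here is a minimal sketch of an LLM-judge style check built directly on AI SDK's generateObject. This is plain AI SDK usage rather than the library's structuredOutputReward helper, and the question and completionText values are placeholders:

```typescript
import { generateObject } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

// Placeholder inputs; in a reward function these would come from the rollout.
const question = "What's the weather like in Seattle right now?";
const completionText = "It is 18°C and sunny in Seattle, WA.";

// Ask the model for a structured verdict instead of free-form text.
const { object } = await generateObject({
  model: openai("gpt-4o-mini"),
  schema: z.object({
    isCorrect: z.boolean().describe("Does the completion answer the question?"),
    reasoning: z.string().describe("One-sentence justification."),
  }),
  prompt: `Question: ${question}\nCompletion: ${completionText}\nJudge whether the completion answers the question.`,
});

// Convert the structured verdict into a scalar reward.
const reward = object.isCorrect ? 1 : 0;
console.log(reward, object.reasoning);
```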
Environment types:

- MultiTurnEnv: subclasses implement is_completed and env_response.
- ToolEnv: uses AI SDK's native tool calling; tools are defined with defineTool() and handled automatically.
- StatefulToolEnv: extends ToolEnv for tools requiring dynamic state (e.g., sandbox IDs).
- SandboxEnv: abstract base for Prime Intellect sandbox integration.

## Evaluation
TypeScript environments are evaluated natively using the TypeScript vf-eval CLI:

```bash
npx vf-eval hangman -n 5 -r 1
```

The CLI automatically:
- Detects TypeScript projects (those with a package.json containing verifiers.envId but no pyproject.toml)
- Uses native TypeScript evaluation implementation
- Saves results in a vf-tui-compatible JSONL format

For Python projects, vf-eval delegates to the Python verifiers CLI.

## Sandbox Support
Sandbox environments (using SandboxEnv) use a native TypeScript HTTP client to interact with Prime Intellect sandboxes. No Python dependencies are required.

Configuration:
- Set the PRIME_INTELLECT_API_KEY or PRIME_API_KEY environment variable
- Optional: Set PRIME_INTELLECT_API_URL (default: https://api.primeintellect.ai)
- Optional: Set PRIME_INTELLECT_TEAM_ID for team-scoped sandboxes

## Examples
See the environments/ directory for example implementations:
- example-single-turn: Basic Q&A environment
- example-tool-use: Tool calling with AI SDK

## Development
This workspace uses Turborepo for task orchestration and caching. Use turbo run commands to build all packages with automatic dependency resolution and caching.

```bash
# Install dependencies
pnpm install

# Build all packages (core + environments)
pnpm turbo run build

# Build a specific environment
pnpm turbo run build --filter hangman

# Run tests
pnpm turbo run test

# Lint all packages
pnpm turbo run lint

# Format code
pnpm turbo run format

# Watch mode (runs all dev tasks in parallel)
pnpm turbo run dev --parallel

# Watch a specific environment
pnpm turbo run dev --parallel --filter hangman
```

Turborepo features:
- Task Dependencies: Builds automatically respect workspace dependencies (dependsOn: ["^build"])
- Local Caching: Build outputs are cached locally for faster rebuilds
- Parallel Execution: Dev tasks run in parallel across packages
- Filtering: Use --filter to target specific packages

For remote caching (CI/CD), set the TURBO_TEAM and TURBO_TOKEN environment variables.

✅ Core Complete - All base classes and AI SDK integration implemented
🔄 In Progress - Python bridge refinement
📝 Pending - Comprehensive tests and examples
License: MIT