# agent-testing-library

Testing framework for AI agents and multi-agent systems with AI Judge verification (Node & Bun compatible).

```bash
npm install agent-testing-library
```

A powerful, flexible testing framework for AI agents and multi-agent systems with built-in AI Judge verification. Supports both traditional test frameworks (Bun/Vitest) and standalone JSON-based testing.


## Features

Test behavior, not implementation: focus on what agents do, not how they do it.

- **Adapter System** - Works with any agent framework (Apteva, LangChain, CrewAI, etc.)
- **AI Judge** - AI-based verification for non-deterministic outputs
- **JSON Test Definitions** - Declarative, portable test suites
- **Standalone Mode** - No test framework required
- **Multi-Agent Native** - First-class support for agent interactions
- **Node & Bun Compatible** - Works everywhere
- **Developer-Friendly** - Familiar, intuitive API
- **Rich Assertions** - Custom matchers for agent testing
## Installation

```bash
npm install agent-testing-library
# or
yarn add agent-testing-library
# or
bun add agent-testing-library
```
## Standalone JSON Mode

Perfect for CI/CD, serverless functions, or simple test automation.

1. Create a JSON test file:
```json
{
  "name": "Agent Test Suite",
  "config": {
    "adapter": {
      "type": "apteva",
      "baseUrl": "http://localhost:3000/agents",
      "apiKey": "agt_...",
      "projectId": "1"
    },
    "aiJudge": {
      "enabled": true
    }
  },
  "agents": [
    {
      "id": "my-agent",
      "type": "create",
      "config": {
        "provider": "anthropic",
        "model": "claude-haiku-4-5",
        "systemPrompt": "You are a helpful assistant."
      }
    }
  ],
  "tests": [
    {
      "name": "should answer questions",
      "agent": "my-agent",
      "message": "What is 2+2?",
      "assertions": [
        { "type": "status", "value": "success" },
        { "type": "contains", "value": "4" },
        {
          "type": "ai-judge",
          "criteria": ["Response correctly answers the math question"],
          "minScore": 0.8
        }
      ]
    }
  ]
}
```
2. Run it (no test framework needed):
```javascript
import { runJsonTest } from 'agent-testing-library';

const results = await runJsonTest('./my-test.json');
console.log(results);
// {
//   suite: "Agent Test Suite",
//   passed: true,
//   results: [...]
// }
```
Or use the CLI runner:
```bash
node -e "import('agent-testing-library').then(({runJsonTest})=>runJsonTest('./test.json'))"
```
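For repeated runs, one option is to wrap that one-liner in an npm script. The script name `test:agents` below is just an example, not something the library defines:

```json
{
  "scripts": {
    "test:agents": "node -e \"import('agent-testing-library').then(({runJsonTest})=>runJsonTest('./test.json'))\""
  }
}
```

Then invoke it with `npm run test:agents`.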
## Framework Mode (Bun/Vitest)

Full programmatic control with test frameworks.
```javascript
import { describe, it, expect, beforeAll, afterAll } from 'bun:test'; // or 'vitest'
import { createAgentTester, AIJudge } from 'agent-testing-library';
import { AptevaAdapter } from 'agent-testing-library/adapters';
import { initMatchers } from 'agent-testing-library/matchers';

const aiJudge = new AIJudge(); // Auto-detects API key from env
const tester = createAgentTester({
  adapter: new AptevaAdapter({
    baseUrl: process.env.APTEVA_URL,
    apiKey: process.env.APTEVA_API_KEY,
    projectId: process.env.APTEVA_PROJECT_ID,
  }),
  aiJudge,
});

initMatchers(aiJudge);

describe('My Agent', () => {
  let agent;

  beforeAll(async () => {
    agent = await tester.createAgent({
      name: 'test-agent',
      config: {
        provider: 'anthropic',
        model: 'claude-haiku-4-5',
        systemPrompt: 'You are a helpful assistant.',
      },
    });
  });

  afterAll(async () => {
    await tester.cleanup();
  });

  it('should respond to greetings', async () => {
    const response = await agent.chat('Hello!');
    expect(response).toHaveStatus('success');
    expect(response.message).toBeDefined();
  });

  it('should provide accurate answers', async () => {
    const response = await agent.chat('What is 2+2?');
    // AI Judge verification
    await expect(response).toMatchAIExpectation({
      criteria: ['Response correctly states that 2+2=4'],
    });
  });
});
```
## JSON Test Format

```json
{
  "name": "Test Suite Name",
  "description": "Optional description",
  "config": {
    "adapter": {
      "type": "apteva",
      "baseUrl": "${APTEVA_URL}",
      "apiKey": "${APTEVA_API_KEY}",
      "projectId": "${APTEVA_PROJECT_ID}"
    },
    "aiJudge": {
      "enabled": true,
      "provider": "openai",
      "model": "gpt-4"
    }
  },
  "agents": [...],
  "tests": [...]
}
```
### Agent Definitions

Create a new agent:

```json
{
  "id": "my-agent",
  "type": "create",
  "config": {
    "name": "agent-name",
    "provider": "anthropic",
    "model": "claude-haiku-4-5",
    "systemPrompt": "You are..."
  }
}
```

Use an existing agent:

```json
{
  "id": "existing-agent",
  "type": "use",
  "agentId": "agent-abc123"
}
```
### Assertions

Status check:

```json
{ "type": "status", "value": "success" }
```

Content match:

```json
{ "type": "contains", "value": "expected text" }
```

AI Judge evaluation:

```json
{
  "type": "ai-judge",
  "criteria": [
    "Response is accurate",
    "Response is helpful"
  ],
  "minScore": 0.8
}
```
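Conceptually, the deterministic assertion types reduce to simple predicates over the response object. The sketch below is illustrative only, not the library's actual implementation (the `checkAssertion` helper and the response shape are assumptions based on the examples in this README):

```javascript
// Illustrative sketch: how 'status' and 'contains' assertions might be
// checked against a response. Not the library's real code.
function checkAssertion(assertion, response) {
  switch (assertion.type) {
    case 'status':
      // Compare the response's status field against the expected value.
      return response.status === assertion.value;
    case 'contains':
      // Substring match on the response message.
      return response.message.includes(assertion.value);
    default:
      // 'ai-judge' needs an async model call and is out of scope here.
      throw new Error(`Unsupported assertion type: ${assertion.type}`);
  }
}

const response = { status: 'success', message: 'The answer is 4.' };
console.log(checkAssertion({ type: 'status', value: 'success' }, response)); // true
console.log(checkAssertion({ type: 'contains', value: '4' }, response));     // true
```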
### Environment Variables

Use `${VAR_NAME}` to inject environment variables:

```json
{
  "config": {
    "adapter": {
      "apiKey": "${APTEVA_API_KEY}"
    }
  }
}
```
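This style of substitution can be pictured as a recursive walk over the config that replaces `${NAME}` placeholders in string values. The function below is a minimal sketch of the idea, not the library's actual interpolation code:

```javascript
// Sketch of ${VAR_NAME} interpolation over a JSON config.
// Illustrative only; the library's real behavior may differ
// (e.g. how missing variables are handled).
function interpolateEnv(value, env = process.env) {
  if (typeof value === 'string') {
    // Replace each ${NAME} with the matching env value (empty if unset).
    return value.replace(/\$\{(\w+)\}/g, (_, name) => env[name] ?? '');
  }
  if (Array.isArray(value)) return value.map((v) => interpolateEnv(v, env));
  if (value && typeof value === 'object') {
    // Recurse into nested objects.
    return Object.fromEntries(
      Object.entries(value).map(([k, v]) => [k, interpolateEnv(v, env)])
    );
  }
  return value;
}

const config = { adapter: { apiKey: '${APTEVA_API_KEY}' } };
console.log(interpolateEnv(config, { APTEVA_API_KEY: 'agt_test' }));
// { adapter: { apiKey: 'agt_test' } }
```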
## Standalone Programmatic Usage

```javascript
import { createAgentTester, AIJudge } from 'agent-testing-library';
import { AptevaAdapter } from 'agent-testing-library/adapters';

async function testMyAgent() {
  const aiJudge = new AIJudge(); // Auto-detects OPENAI_API_KEY or ANTHROPIC_API_KEY
  const tester = createAgentTester({
    adapter: new AptevaAdapter({
      baseUrl: process.env.APTEVA_URL,
      apiKey: process.env.APTEVA_API_KEY,
      projectId: process.env.APTEVA_PROJECT_ID,
    }),
    aiJudge,
  });

  // Create agent
  const agent = await tester.createAgent({
    name: 'my-agent',
    config: {
      provider: 'anthropic',
      model: 'claude-haiku-4-5',
      systemPrompt: 'You are helpful.',
    },
  });

  // Test
  const response = await agent.chat('What is 2+2?');

  // AI Judge evaluation
  const evaluation = await aiJudge.evaluate({
    response: response.message,
    criteria: ['Response correctly answers that 2+2=4'],
    trace: response.trace,
  });

  // Cleanup
  await tester.cleanup();

  return {
    passed: evaluation.passed,
    score: evaluation.score,
    feedback: evaluation.feedback,
  };
}

// Run it
testMyAgent().then(console.log);
```
## Working with Existing Agents

```javascript
// Attach to an existing agent (won't be deleted)
const existingAgent = await tester.useAgent('agent-abc123');

// Create a persistent agent
const persistentAgent = await tester.createAgent({
  name: 'persistent-agent',
  skipCleanup: true, // Won't be deleted during cleanup
  config: {...},
});
```
## Multi-Agent Testing

JSON format:

```json
{
  "agents": [
    { "id": "researcher", "type": "create", "config": {...} },
    { "id": "writer", "type": "create", "config": {...} }
  ],
  "tests": [
    {
      "name": "agent coordination",
      "agent": "researcher",
      "message": "Research AI safety",
      "store": "research"
    },
    {
      "name": "write summary",
      "agent": "writer",
      "message": "Summarize: ${research.message}",
      "assertions": [...]
    }
  ]
}
```
Code format:

```javascript
const researcher = await tester.createAgent({...});
const writer = await tester.createAgent({...});

const research = await researcher.chat('Research AI safety');
const article = await writer.chat(`Write about: ${research.message}`);
```
## AI Judge

AI Judge provides intelligent verification of non-deterministic agent outputs.

```javascript
// Automatically detects OPENAI_API_KEY or ANTHROPIC_API_KEY from environment
const aiJudge = new AIJudge();
```
Or configure it explicitly:

```javascript
const aiJudge = new AIJudge({
  provider: 'openai',
  apiKey: process.env.OPENAI_API_KEY,
  model: 'gpt-4o',
});
```
```javascript
const evaluation = await aiJudge.evaluate({
  response: 'Agent response...',
  criteria: [
    'Response is accurate',
    'Response is helpful',
    'Response is concise'
  ],
});

console.log(evaluation);
// {
//   passed: true,
//   score: 0.95,
//   feedback: "The response meets all criteria..."
// }
```
## Custom Matchers

```javascript
import { initMatchers } from 'agent-testing-library/matchers';

initMatchers(aiJudge);

// Status
expect(response).toHaveStatus('success');

// Content
expect(response).toContainMessage('hello');
expect(response).toMatchPattern(/\d+/);

// AI-based
await expect(response).toMatchAIExpectation({
  criteria: 'Response is helpful and accurate',
});
await expect(response).toBeFactuallyAccurate({
  groundTruth: 'Expected facts...',
});
```
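Matchers like these follow the `expect.extend` convention shared by Vitest and Bun's test runner: a function that receives the value under test plus the expected argument and returns `{ pass, message }`. The sketch below shows that shape for a `toHaveStatus`-style matcher; it is an illustration of the convention, not the library's implementation:

```javascript
// Sketch of a matcher in the Vitest/Bun expect.extend shape.
// Not the library's actual code.
function toHaveStatus(received, expected) {
  const pass = received.status === expected;
  return {
    pass,
    message: () =>
      pass
        ? `expected status not to be ${expected}`
        : `expected status ${expected}, got ${received.status}`,
  };
}

// In a test file it would be registered via: expect.extend({ toHaveStatus });
console.log(toHaveStatus({ status: 'success' }, 'success').pass); // true
```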
## Environment Setup

Create a `.env` file:

```bash
# Apteva (or your adapter)
APTEVA_URL=http://localhost:3000/agents
APTEVA_API_KEY=agt_...
APTEVA_PROJECT_ID=1
```

The library automatically loads `.env` files when using the JSON test runner.

## Running Tests

### JSON Runner (No Framework)
```bash
# Create a runner script
echo 'import("agent-testing-library").then(({runJsonTest})=>runJsonTest("./test.json"))' > run.mjs
node run.mjs
```

Or use it in your code:

```javascript
import { runJsonTest } from 'agent-testing-library';

const results = await runJsonTest('./test.json');
```

### With a Test Framework
```bash
# With Bun
bun test

# With Vitest/Node
npm test

# Specific file
bun test ./tests/my-test.js
```

## Examples
The package includes comprehensive examples:

- `examples/basic-apteva.js` - Basic agent testing
- `examples/basic-apteva-tools.js` - Agent with MCP tools
- `examples/basic-apteva-multi.js` - Multi-agent coordination
- `examples/existing-agent-test.js` - Using existing agents
- `examples/run-json-test.js` - JSON test runner
- `tests/json/simple-apteva-test.json` - JSON test example

## API Reference
### runJsonTest(filePath)

Run a JSON test suite.

Returns: `Promise`