# agent-testing-library

Testing framework for AI agents and multi-agent systems with AI Judge verification (Node & Bun compatible).

```bash
npm install agent-testing-library
```

A powerful, flexible testing framework for AI agents and multi-agent systems with built-in AI Judge verification. Supports both traditional test frameworks (Bun/Vitest) and standalone JSON-based testing.


## Features

Test behavior, not implementation: focus on what agents do, not how they do it.

- **Adapter System** - Works with any agent framework (Apteva, LangChain, CrewAI, etc.)
- **AI Judge** - AI-based verification for non-deterministic outputs
- **JSON Test Definitions** - Declarative, portable test suites
- **Standalone Mode** - No test framework required
- **Multi-Agent Native** - First-class support for agent interactions
- **Node & Bun Compatible** - Works everywhere
- **Developer-Friendly** - Familiar, intuitive API
- **Rich Assertions** - Custom matchers for agent testing
## Installation

```bash
npm install agent-testing-library
# or
yarn add agent-testing-library
# or
bun add agent-testing-library
```
## Standalone JSON Mode

Perfect for CI/CD, serverless functions, or simple test automation.

1. Create a JSON test file:
```json
{
  "name": "Agent Test Suite",
  "config": {
    "adapter": {
      "type": "apteva",
      "baseUrl": "http://localhost:3000/agents",
      "apiKey": "agt_...",
      "projectId": "1"
    },
    "aiJudge": {
      "enabled": true
    }
  },
  "agents": [
    {
      "id": "my-agent",
      "type": "create",
      "config": {
        "provider": "anthropic",
        "model": "claude-haiku-4-5",
        "systemPrompt": "You are a helpful assistant."
      }
    }
  ],
  "tests": [
    {
      "name": "should answer questions",
      "agent": "my-agent",
      "message": "What is 2+2?",
      "assertions": [
        { "type": "status", "value": "success" },
        { "type": "contains", "value": "4" },
        {
          "type": "ai-judge",
          "criteria": ["Response correctly answers the math question"],
          "minScore": 0.8
        }
      ]
    }
  ]
}
```
2. Run it (no test framework needed):
```javascript
import { runJsonTest } from 'agent-testing-library';

const results = await runJsonTest('./my-test.json');
console.log(results);
// {
//   suite: "Agent Test Suite",
//   passed: true,
//   results: [...]
// }
```
Or use the CLI runner:
```bash
node -e "import('agent-testing-library').then(({runJsonTest})=>runJsonTest('./test.json'))"
```
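For repeated runs, one option is to wrap that one-liner in an npm script. The script name `test:agents` below is just an example, not something the library defines:

```json
{
  "scripts": {
    "test:agents": "node -e \"import('agent-testing-library').then(({runJsonTest})=>runJsonTest('./test.json'))\""
  }
}
```

Then invoke it with `npm run test:agents`.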
## Framework Mode (Bun/Vitest)

Full programmatic control with test frameworks.
```javascript
import { describe, it, expect, beforeAll, afterAll } from 'bun:test'; // or 'vitest'
import { createAgentTester, AIJudge } from 'agent-testing-library';
import { AptevaAdapter } from 'agent-testing-library/adapters';
import { initMatchers } from 'agent-testing-library/matchers';

const aiJudge = new AIJudge(); // Auto-detects API key from env
const tester = createAgentTester({
  adapter: new AptevaAdapter({
    baseUrl: process.env.APTEVA_URL,
    apiKey: process.env.APTEVA_API_KEY,
    projectId: process.env.APTEVA_PROJECT_ID,
  }),
  aiJudge,
});

initMatchers(aiJudge);

describe('My Agent', () => {
  let agent;

  beforeAll(async () => {
    agent = await tester.createAgent({
      name: 'test-agent',
      config: {
        provider: 'anthropic',
        model: 'claude-haiku-4-5',
        systemPrompt: 'You are a helpful assistant.',
      },
    });
  });

  afterAll(async () => {
    await tester.cleanup();
  });

  it('should respond to greetings', async () => {
    const response = await agent.chat('Hello!');
    expect(response).toHaveStatus('success');
    expect(response.message).toBeDefined();
  });

  it('should provide accurate answers', async () => {
    const response = await agent.chat('What is 2+2?');
    // AI Judge verification
    await expect(response).toMatchAIExpectation({
      criteria: ['Response correctly states that 2+2=4'],
    });
  });
});
```
## JSON Test Format

```json
{
  "name": "Test Suite Name",
  "description": "Optional description",
  "config": {
    "adapter": {
      "type": "apteva",
      "baseUrl": "${APTEVA_URL}",
      "apiKey": "${APTEVA_API_KEY}",
      "projectId": "${APTEVA_PROJECT_ID}"
    },
    "aiJudge": {
      "enabled": true,
      "provider": "openai",
      "model": "gpt-4"
    }
  },
  "agents": [...],
  "tests": [...]
}
```
### Agent Definitions

Create a new agent:

```json
{
  "id": "my-agent",
  "type": "create",
  "config": {
    "name": "agent-name",
    "provider": "anthropic",
    "model": "claude-haiku-4-5",
    "systemPrompt": "You are..."
  }
}
```

Use an existing agent:

```json
{
  "id": "existing-agent",
  "type": "use",
  "agentId": "agent-abc123"
}
```
### Assertions

Status check:

```json
{ "type": "status", "value": "success" }
```

Content match:

```json
{ "type": "contains", "value": "expected text" }
```

AI Judge evaluation:

```json
{
  "type": "ai-judge",
  "criteria": [
    "Response is accurate",
    "Response is helpful"
  ],
  "minScore": 0.8
}
```
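Conceptually, the deterministic assertion types reduce to simple predicates over the response object. The sketch below is illustrative only, not the library's actual implementation (the `checkAssertion` helper and the response shape are assumptions based on the examples in this README):

```javascript
// Illustrative sketch: how 'status' and 'contains' assertions might be
// checked against a response. Not the library's real code.
function checkAssertion(assertion, response) {
  switch (assertion.type) {
    case 'status':
      // Compare the response's status field against the expected value.
      return response.status === assertion.value;
    case 'contains':
      // Substring match on the response message.
      return response.message.includes(assertion.value);
    default:
      // 'ai-judge' needs an async model call and is out of scope here.
      throw new Error(`Unsupported assertion type: ${assertion.type}`);
  }
}

const response = { status: 'success', message: 'The answer is 4.' };
console.log(checkAssertion({ type: 'status', value: 'success' }, response)); // true
console.log(checkAssertion({ type: 'contains', value: '4' }, response));     // true
```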
### Environment Variables

Use `${VAR_NAME}` to inject environment variables:

```json
{
  "config": {
    "adapter": {
      "apiKey": "${APTEVA_API_KEY}"
    }
  }
}
```
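This style of substitution can be pictured as a recursive walk over the config that replaces `${NAME}` placeholders in string values. The function below is a minimal sketch of the idea, not the library's actual interpolation code:

```javascript
// Sketch of ${VAR_NAME} interpolation over a JSON config.
// Illustrative only; the library's real behavior may differ
// (e.g. how missing variables are handled).
function interpolateEnv(value, env = process.env) {
  if (typeof value === 'string') {
    // Replace each ${NAME} with the matching env value (empty if unset).
    return value.replace(/\$\{(\w+)\}/g, (_, name) => env[name] ?? '');
  }
  if (Array.isArray(value)) return value.map((v) => interpolateEnv(v, env));
  if (value && typeof value === 'object') {
    // Recurse into nested objects.
    return Object.fromEntries(
      Object.entries(value).map(([k, v]) => [k, interpolateEnv(v, env)])
    );
  }
  return value;
}

const config = { adapter: { apiKey: '${APTEVA_API_KEY}' } };
console.log(interpolateEnv(config, { APTEVA_API_KEY: 'agt_test' }));
// { adapter: { apiKey: 'agt_test' } }
```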
## Standalone Programmatic Usage

```javascript
import { createAgentTester, AIJudge } from 'agent-testing-library';
import { AptevaAdapter } from 'agent-testing-library/adapters';

async function testMyAgent() {
  const aiJudge = new AIJudge(); // Auto-detects OPENAI_API_KEY or ANTHROPIC_API_KEY
  const tester = createAgentTester({
    adapter: new AptevaAdapter({
      baseUrl: process.env.APTEVA_URL,
      apiKey: process.env.APTEVA_API_KEY,
      projectId: process.env.APTEVA_PROJECT_ID,
    }),
    aiJudge,
  });

  // Create agent
  const agent = await tester.createAgent({
    name: 'my-agent',
    config: {
      provider: 'anthropic',
      model: 'claude-haiku-4-5',
      systemPrompt: 'You are helpful.',
    },
  });

  // Test
  const response = await agent.chat('What is 2+2?');

  // AI Judge evaluation
  const evaluation = await aiJudge.evaluate({
    response: response.message,
    criteria: ['Response correctly answers that 2+2=4'],
    trace: response.trace,
  });

  // Cleanup
  await tester.cleanup();

  return {
    passed: evaluation.passed,
    score: evaluation.score,
    feedback: evaluation.feedback,
  };
}

// Run it
testMyAgent().then(console.log);
```
## Working with Existing Agents

```javascript
// Attach to an existing agent (won't be deleted)
const existingAgent = await tester.useAgent('agent-abc123');

// Create a persistent agent
const persistentAgent = await tester.createAgent({
  name: 'persistent-agent',
  skipCleanup: true, // Won't be deleted during cleanup
  config: {...},
});
```
## Multi-Agent Testing

JSON format:

```json
{
  "agents": [
    { "id": "researcher", "type": "create", "config": {...} },
    { "id": "writer", "type": "create", "config": {...} }
  ],
  "tests": [
    {
      "name": "agent coordination",
      "agent": "researcher",
      "message": "Research AI safety",
      "store": "research"
    },
    {
      "name": "write summary",
      "agent": "writer",
      "message": "Summarize: ${research.message}",
      "assertions": [...]
    }
  ]
}
```
Code format:

```javascript
const researcher = await tester.createAgent({...});
const writer = await tester.createAgent({...});

const research = await researcher.chat('Research AI safety');
const article = await writer.chat(`Write about: ${research.message}`);
```
## AI Judge

AI Judge provides intelligent verification of non-deterministic agent outputs.

```javascript
// Automatically detects OPENAI_API_KEY or ANTHROPIC_API_KEY from environment
const aiJudge = new AIJudge();
```
Or configure it explicitly:

```javascript
const aiJudge = new AIJudge({
  provider: 'openai',
  apiKey: process.env.OPENAI_API_KEY,
  model: 'gpt-4o',
});
```
```javascript
const evaluation = await aiJudge.evaluate({
  response: 'Agent response...',
  criteria: [
    'Response is accurate',
    'Response is helpful',
    'Response is concise'
  ],
});

console.log(evaluation);
// {
//   passed: true,
//   score: 0.95,
//   feedback: "The response meets all criteria..."
// }
```
## Custom Matchers

```javascript
import { initMatchers } from 'agent-testing-library/matchers';

initMatchers(aiJudge);

// Status
expect(response).toHaveStatus('success');

// Content
expect(response).toContainMessage('hello');
expect(response).toMatchPattern(/\d+/);

// AI-based
await expect(response).toMatchAIExpectation({
  criteria: 'Response is helpful and accurate',
});
await expect(response).toBeFactuallyAccurate({
  groundTruth: 'Expected facts...',
});
```
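Matchers like these follow the `expect.extend` convention shared by Vitest and Bun's test runner: a function that receives the value under test plus the expected argument and returns `{ pass, message }`. The sketch below shows that shape for a `toHaveStatus`-style matcher; it is an illustration of the convention, not the library's implementation:

```javascript
// Sketch of a matcher in the Vitest/Bun expect.extend shape.
// Not the library's actual code.
function toHaveStatus(received, expected) {
  const pass = received.status === expected;
  return {
    pass,
    message: () =>
      pass
        ? `expected status not to be ${expected}`
        : `expected status ${expected}, got ${received.status}`,
  };
}

// In a test file it would be registered via: expect.extend({ toHaveStatus });
console.log(toHaveStatus({ status: 'success' }, 'success').pass); // true
```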
## Environment Setup

Create a `.env` file:

```bash
# Apteva (or your adapter)
APTEVA_URL=http://localhost:3000/agents
APTEVA_API_KEY=agt_...
APTEVA_PROJECT_ID=1
```

The library automatically loads `.env` files when using the JSON test runner.

## Running Tests

### JSON Runner (No Framework)
```bash
# Create a runner script
echo 'import("agent-testing-library").then(({runJsonTest})=>runJsonTest("./test.json"))' > run.mjs
node run.mjs
```

Or use it in your code:

```javascript
import { runJsonTest } from 'agent-testing-library';

const results = await runJsonTest('./test.json');
```

### With a Test Framework
```bash
# With Bun
bun test

# With Vitest/Node
npm test

# Specific file
bun test ./tests/my-test.js
```

## Examples
The package includes comprehensive examples:

- `examples/basic-apteva.js` - Basic agent testing
- `examples/basic-apteva-tools.js` - Agent with MCP tools
- `examples/basic-apteva-multi.js` - Multi-agent coordination
- `examples/existing-agent-test.js` - Using existing agents
- `examples/run-json-test.js` - JSON test runner
- `tests/json/simple-apteva-test.json` - JSON test example

## API Reference
### runJsonTest(filePath)

Run a JSON test suite.

Returns: `Promise`