@plaited/agent-eval-harness

![npm version](https://www.npmjs.com/package/@plaited/agent-eval-harness)
![CI](https://github.com/plaited/agent-eval-harness/actions/workflows/ci.yml)
![License: ISC](https://opensource.org/licenses/ISC)

CLI tool for capturing agent trajectories from headless CLI agents. Execute prompts, capture full trajectories (tools, thoughts, plans), and output structured JSONL for downstream scoring. Available as both a CLI tool and as installable skills for AI coding agents.

CLI Tool

Use these tools directly via the CLI without installation:

``bash

`Using built-in headless adapter (recommended - no extra install needed)`


export ANTHROPIC_API_KEY=sk-...
bunx @plaited/agent-eval-harness capture prompts.jsonl \
  --schema ./schemas/claude-headless.json \
  -o results.jsonl


Prerequisite: Set your API key. The harness works with any CLI agent that supports JSON output - just provide a schema describing how to interact with it:

`bash export ANTHROPIC_API_KEY=sk-... # For Claude export GEMINI_API_KEY=... # For Gemini`

Pre-built schemas are available in .agents/skills/headless-adapters/schemas/ for Claude and Gemini.

`$3`

| Command | Description | |---------|-------------| |capture --schema | Trajectory capture (full JSONL) | |trials --schema | Multi-run with pass@k metrics | |summarize | Derive compact views from results | |calibrate | Sample failures for review | |validate-refs | Check reference solutions | |balance | Analyze test set coverage | |schemas [name]| Export JSON schemas | |headless --schema | Schema-driven adapter for any CLI agent |

`$3`

| Command | Description | |---------|-------------| |run --schema | Execute prompts, output raw results | |extract --schema | Parse raw output into trajectories | |grade --grader | Apply grader to extracted results | |format --style