CLI tool for capturing agent trajectories from headless CLI agents
npm install @plaited/agent-eval-harness


CLI tool for capturing agent trajectories from headless CLI agents. Execute prompts, capture full trajectories (tools, thoughts, plans), and output structured JSONL for downstream scoring. Available as both a CLI tool and as installable skills for AI coding agents.
Use these tools directly via the CLI without installation:
``bash`Using built-in headless adapter (recommended - no extra install needed)
export ANTHROPIC_API_KEY=sk-...
bunx @plaited/agent-eval-harness capture prompts.jsonl \
--schema ./schemas/claude-headless.json \
-o results.jsonl
Prerequisite: Set your API key. The harness works with any CLI agent that supports JSON output - just provide a schema describing how to interact with it:
`bash`
export ANTHROPIC_API_KEY=sk-... # For Claude
export GEMINI_API_KEY=... # For Gemini
Pre-built schemas are available in .agents/skills/headless-adapters/schemas/ for Claude and Gemini.
| Command | Description |
|---------|-------------|
| capture | Trajectory capture (full JSONL) |trials
| | Multi-run with pass@k metrics |summarize
| | Derive compact views from results |calibrate
| | Sample failures for review |validate-refs
| | Check reference solutions |balance
| | Analyze test set coverage |schemas [name]
| | Export JSON schemas |headless --schema
| | Schema-driven adapter for any CLI agent |
| Command | Description |
|---------|-------------|
| run | Execute prompts, output raw results |extract
| | Parse raw output into trajectories |grade
| | Apply grader to extracted results |format
|