Evaluation suite for swarm-tools multi-agent coordination
```bash
npm install @swarmtools/evals
```
EVAL SUITE
Swarm Intelligence QA
Evaluation suite for swarm-tools multi-agent coordination. Uses Evalite to measure coordinator behavior, decomposition quality, and compaction correctness.
This package contains the evaluation framework for the swarm-tools ecosystem. Extracting evals into a separate package ensures:
1. Clean Dependencies - Main plugin doesn't need evalite/vitest in production
2. Faster Installs - Eval deps only needed for development/CI
3. Isolated Testing - Eval suite can evolve independently from plugin
- Coordinator Protocol - Does the coordinator spawn workers vs doing work itself?
- Coordinator Behavior - Does the LLM stay in the coordinator role after compaction?
- Compaction Resumption - Context injection correctness after compaction
- Compaction Prompt Quality - Quality of continuation prompts generated
- Task Decomposition - Quality of task splitting, file conflict detection
- Strategy Selection - Correct strategy choice for task characteristics
- Decision Quality - Strategy selection quality and precedent relevance
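To make the file conflict check under Task Decomposition concrete, here is a minimal TypeScript sketch. It is not the package's actual scorer. The `Subtask` shape and the names `findFileConflicts` and `conflictFreeScore` are assumptions made for this illustration; the real decomposition output in swarm-tools may look different.

```typescript
// Hypothetical shape for one unit of decomposed work (an assumption,
// not the real swarm-tools type).
interface Subtask {
  title: string;
  files: string[];
}

interface FileConflict {
  file: string;
  owners: string[]; // titles of the subtasks that claim this file
}

// Return every file that is claimed by more than one subtask.
function findFileConflicts(subtasks: Subtask[]): FileConflict[] {
  const owners = new Map<string, string[]>();
  for (const task of subtasks) {
    // De-duplicate within a subtask so a repeated path is not a self-conflict.
    for (const file of Array.from(new Set(task.files))) {
      const list = owners.get(file) ?? [];
      list.push(task.title);
      owners.set(file, list);
    }
  }
  return Array.from(owners.entries())
    .filter(([, titles]) => titles.length > 1)
    .map(([file, titles]) => ({ file, owners: titles }));
}

// Score in [0, 1]: the fraction of distinct files owned by exactly one subtask.
function conflictFreeScore(subtasks: Subtask[]): number {
  const allFiles = new Set<string>();
  for (const task of subtasks) {
    for (const file of task.files) allFiles.add(file);
  }
  if (allFiles.size === 0) return 1; // nothing to conflict over
  return 1 - findFileConflicts(subtasks).length / allFiles.size;
}

// Example plan in which two subtasks both touch src/server.ts.
const examplePlan: Subtask[] = [
  { title: "Add auth middleware", files: ["src/auth.ts", "src/server.ts"] },
  { title: "Add rate limiting", files: ["src/limit.ts", "src/server.ts"] },
];

console.log(findFileConflicts(examplePlan));
console.log(conflictFreeScore(examplePlan));
```

A function like `conflictFreeScore` returns a plain number between 0 and 1, so it could plausibly be wrapped as a custom scorer in an eval. Whether a decomposition with shared files should be penalized proportionally, as here, or rejected outright is a design choice this sketch does not settle.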
```bash
# Run all evals
bun run test
```
Package Structure
This package is part of the swarm-tools monorepo:
- opencode-swarm-plugin - Main plugin (peer dependency)
- swarm-mail - Event sourcing primitives (peer dependency)
- @swarmtools/evals - This package
Development
Evals use real coordinator sessions captured to `~/.config/swarm-tools/sessions/*.jsonl`. See `docs/README.md` in this package for details on session capture and eval architecture.
License
MIT