Correctness-first AI coding orchestrator - Evidence-based diagnostics, WBS planning, and iterative execution
```bash
npm install milhouse-cli
```

```bash
# Install dependencies
pnpm install
```
This project uses pnpm for package management and Bun for:
- Running TypeScript directly in development
- Building cross-platform binaries
- Running tests (bun:test)
```bash
# Build all platforms
pnpm build

# Build specific platform
pnpm build:linux
pnpm build:mac-arm
pnpm build:mac-x64
pnpm build:windows
```

```bash
npm install -g milhouse-cli
```

## Three Modes
Just tell it what to do:
```bash
milhouse "add dark mode"
milhouse "fix the auth bug"
```
Work through a PRD:
```bash
milhouse # uses PRD.md
milhouse --prd tasks.md
```
Multi-agent investigation and execution:
```bash
milhouse --scan --scope "frontend zustand" # Creates isolated run
milhouse --validate # Validate issues
milhouse --plan # Generate tasks
milhouse --consolidate # Merge plans
milhouse --exec --exec-by-issue # Execute grouped by issue (recommended!)
milhouse --verify # Verify results

# Or run full pipeline (uses --exec-by-issue automatically)
milhouse --run
```

## Investigation Pipeline
6-phase pipeline with specialized AI agents:
| Phase | Agent | Description |
|-------|-------|-------------|
| scan | LI (Lead Investigator) | Scans codebase, identifies issues |
| validate | IV (Issue Validators) | Validates with probes |
| plan | PL (Planners) | Generates WBS per issue |
| consolidate | CO (Consolidator) | Merges into unified plan |
| exec | EX (Executors) | Executes tasks |
| verify | VE (Verifiers) | Runs verification gates |
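
The phase order in the table can be modeled as a plain ordered list. The sketch below is illustrative only and is not Milhouse's implementation: the helper name `remaining_phases` is hypothetical, and it assumes that a resumed run continues with the phase after the last completed one, which the real `--resume` flag may or may not do.

```python
# Phase names taken from the table above, in pipeline order.
PHASES = ["scan", "validate", "plan", "consolidate", "exec", "verify"]


def remaining_phases(last_completed=None):
    """Return the phases still to run (hypothetical helper).

    Assumption: resuming continues with the phase after the last
    completed one; the real --resume behavior may differ.
    """
    if last_completed is None:
        # Nothing has run yet, so the whole pipeline is pending.
        return list(PHASES)
    return PHASES[PHASES.index(last_completed) + 1:]


print(remaining_phases("plan"))  # ['consolidate', 'exec', 'verify']
```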
Each scan creates isolated state:
```bash
milhouse --scan --scope "frontend" # Creates run-abc
milhouse --scan --scope "backend" # Creates run-def

milhouse runs list # List all runs
milhouse runs switch run-abc # Switch active run
milhouse runs info # Show current run
milhouse runs delete run-def # Delete a run
```

## Project Config
Optional. Stores rules the AI must follow.
```bash
milhouse --init # auto-detects project settings
milhouse --config # view config
milhouse --add-rule "use TypeScript strict mode"
```

Creates `.milhouse/config.yaml`:
```yaml
project:
  name: "my-app"
  language: "TypeScript"
  framework: "Next.js"

commands:
  test: "npm test"
  lint: "npm run lint"
  build: "npm run build"

rules:
  - "use server actions not API routes"
  - "follow error pattern in src/utils/errors.ts"

boundaries:
  never_touch:
    - "src/legacy/**"
    - "*.lock"
```

## AI Engines
```bash
milhouse # Claude Code (default)
milhouse --opencode # OpenCode
milhouse --cursor # Cursor
milhouse --codex # Codex
milhouse --qwen # Qwen-Code
milhouse --droid # Factory Droid
```
```bash
milhouse --model sonnet "add feature" # use sonnet with Claude
milhouse --sonnet "add feature" # shortcut for above
milhouse --opencode --model opencode/glm-4.7-free "task"
```

## Task Sources
Markdown file (default):
```bash
milhouse --prd PRD.md
```

Markdown folder (for large projects):
```bash
milhouse --prd ./prd/
```

Reads all `.md` files in the folder and aggregates tasks.

YAML:
```bash
milhouse --yaml tasks.yaml
```

GitHub Issues:
```bash
milhouse --github owner/repo
milhouse --github owner/repo --github-label "ready"
```

## Parallel Execution
```bash
milhouse --exec --exec-by-issue # Each issue in its own worktree
milhouse --exec --exec-by-issue --max-parallel 3 # 3 issues in parallel
```

How it works:
- Groups all tasks by their parent issue
- Each issue runs in an isolated worktree with a dedicated Claude agent
- Agent receives: issue details + validation report + WBS plan + all tasks
- Agent completes ALL tasks for that issue in one session
- Branches auto-merge back after completion
Benefits:
- Better context: Agent has full issue context, not just single task
- Fewer context switches: One agent handles related tasks together
- Faster overall: ~5 minutes per issue vs ~5 minutes per task
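
The grouping step described under "How it works" can be pictured as follows. This is a minimal sketch, not the actual implementation: the `Task` shape, its `issue_id` field, and the `group_by_issue` helper are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical task shape; the real task schema is not shown in this README.
    id: str
    issue_id: str
    title: str


def group_by_issue(tasks):
    """Group tasks by parent issue, preserving task order within each issue."""
    groups = defaultdict(list)
    for task in tasks:
        groups[task.issue_id].append(task)
    return dict(groups)


tasks = [
    Task("T-1", "P-xxx", "add failing test"),
    Task("T-2", "P-yyy", "update store selector"),
    Task("T-3", "P-xxx", "fix null check"),
]

# One agent session per issue, each receiving all of that issue's tasks.
for issue_id, issue_tasks in group_by_issue(tasks).items():
    print(issue_id, [t.id for t in issue_tasks])
```

In this picture, each key of the returned dictionary corresponds to one worktree and one agent session, and `--max-parallel` would bound how many of those sessions run at once.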
```bash
milhouse --parallel # 3 agents default
milhouse --parallel --max-parallel 5 # 5 agents
```

Each agent gets an isolated worktree and branch. Without `--create-pr`, branches auto-merge back with AI conflict resolution. With `--create-pr`, branches are kept and PRs are created. With `--no-merge`, branches are kept without merging.

## Branch Workflow
```bash
milhouse --branch-per-task # branch per task
milhouse --branch-per-task --create-pr # + create PRs
milhouse --branch-per-task --draft-pr # + draft PRs
```

## Browser Automation
Milhouse supports browser automation via agent-browser for testing web UIs.
```bash
milhouse "add login form" --browser # enable browser automation
milhouse "fix checkout" --no-browser # disable browser automation
```

When enabled (and agent-browser is installed), the AI can:
- Open URLs and navigate pages
- Click elements and fill forms
- Take screenshots for verification
- Test web UI changes after implementation
## Issue Filtering
Milhouse supports filtering issues by ID and severity level at any pipeline stage.
```bash
# Process only specific issues
milhouse --validate --issues P-xxx,P-yyy,P-zzz

# Exclude specific issues
milhouse --plan --exclude-issues P-xxx
```
```bash
# Process only CRITICAL and HIGH severity issues
milhouse --validate --severity CRITICAL,HIGH

# Process issues with severity HIGH or above
milhouse --run --min-severity HIGH
```
Severity levels in order of priority:
1. CRITICAL - Highest priority
2. HIGH
3. MEDIUM
4. LOW - Lowest priority
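
The ordering above is what gives `--min-severity` its meaning. Below is a minimal sketch of how the two severity filters could be evaluated, assuming each issue carries a severity string. The `matches_severity` helper and the sample issue data are hypothetical and are not part of Milhouse's API.

```python
# Severity levels from highest to lowest priority, as listed above.
SEVERITY_ORDER = ["CRITICAL", "HIGH", "MEDIUM", "LOW"]


def matches_severity(severity, levels=None, min_severity=None):
    """Return True if an issue severity passes the given filters.

    levels        models --severity (exact match against a set of levels)
    min_severity  models --min-severity (that level or anything higher)
    Both filters must pass when both are given.
    """
    rank = SEVERITY_ORDER.index(severity)
    if levels is not None and severity not in levels:
        return False
    # A lower index means a higher priority, so a larger rank is "below" the minimum.
    if min_severity is not None and rank > SEVERITY_ORDER.index(min_severity):
        return False
    return True


# Hypothetical issues, keyed by ID.
issues = {"P-xxx": "CRITICAL", "P-yyy": "MEDIUM", "P-zzz": "HIGH"}
selected = [i for i, sev in issues.items() if matches_severity(sev, min_severity="HIGH")]
print(selected)  # ['P-xxx', 'P-zzz']
```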
Filters can be combined (AND logic):
```bash
# Validate specific issues that are also HIGH+ severity
milhouse --validate --issues P-xxx,P-yyy --min-severity HIGH
```

## Options
| Flag | What it does |
|------|--------------|
| Pipeline | |
| --scan | Run Lead Investigator |
| --scope FOCUS | Focus scan on specific area |
| --validate | Validate issues with probes |
| --plan | Generate WBS |
| --consolidate | Merge into execution plan |
| --exec | Execute tasks |
| --verify | Run verification gates |
| --run | Run full pipeline |
| --resume | Resume from last phase |
| Issue Filtering | |
| --issues IDS | Comma-separated issue IDs to process |
| --exclude-issues IDS | Comma-separated issue IDs to exclude |
| --severity LEVELS | Filter by severity (CRITICAL,HIGH,MEDIUM,LOW) |
| --min-severity LEVEL | Minimum severity level to process |
| Tasks | |
| --prd PATH | task file or folder (auto-detected, default: PRD.md) |
| --yaml FILE | YAML task file |
| --github REPO | use GitHub issues |
| --github-label TAG | filter issues by label |
| Engine | |
| --model NAME | override model for any engine |
| --sonnet | shortcut for --claude --model sonnet |
| Execution | |
| --parallel | run tasks in parallel (legacy, per-task) |
| --exec-by-issue | execute tasks grouped by issue (recommended!) |
| --max-parallel N | max parallel agents/issues (default: 3) |
| --no-merge | skip auto-merge in parallel mode |
| --branch-per-task | branch per task |
| --base-branch BRANCH | base branch for PRs |
| --create-pr | create PRs |
| --draft-pr | draft PRs |
| --worktrees | force worktree isolation |
| --exec-fail-fast | stop on first task failure |
| Testing | |
| --no-tests | skip tests |
| --no-lint | skip lint |
| --fast | skip tests + lint |
| --no-commit | don't auto-commit |
| --browser | enable browser automation |
| --no-browser | disable browser automation |
| General | |
| --max-iterations N | stop after N tasks |
| --max-retries N | retries per task (default: 3) |
| --retry-delay N | delay between retries in seconds (default: 5) |
| --dry-run | preview only |
| -v, --verbose | debug output |
| --init | setup .milhouse/ config |
| --config | show config |
| --add-rule "rule" | add rule to config |

## Requirements

- Node.js 18+ or Bun
- AI CLI: Claude Code, OpenCode, Cursor, Codex, Qwen-Code, or Factory Droid
- `gh` (optional, for GitHub issues / `--create-pr`)

MIT