AI-powered development framework for Claude Code - multi-agent coordination, specifications, and quality gates
npm install anvil-dev-framework

```
___ _ ___ _____ _
/ \ | \ | \ \ / /_ _| |
/ /_\ \ | \| |\ \ / / | || | v0.1.9.0 (alpha)
/ _____ \| |\ | \ V / | || |___
/_/ \_\_| \_| \_/ |___|_____|
══════════════════════════════════════════════════════════
Where raw specs are forged into production code.
══════════════════════════════════════════════════════════
```
> A structured AI development system for solo builders who demand production-quality output.
Anvil is a comprehensive framework for AI-assisted software development that combines phase-gated workflows, persistent memory systems, and automated quality gates to transform how you build software with AI coding assistants.
---
Released: 2026-01-17
- Ralph Visibility & Notification System (ANV-298): real-time monitoring for autonomous execution
  - Live terminal watcher (`ralph-watch`) with progress bars and event stream
  - macOS Notification Center and TTS announcements for milestones
  - Slack/Discord webhook integrations for team visibility
  - REST API with SSE for future GUI integration
  - Toggle notifications: `--enable/--disable {all,macos,tts,slack,discord}`
- Token Efficiency Audit Framework: complete token consumption tracking and optimization
  - `/efficiency` command for historical analysis with weekly/monthly reports
  - `/token-budget` command for session budget management with alerts
- CodeRabbit Deep Integration: automated code review workflow
  - Enhanced `.coderabbit.yaml` with pre-merge checks and custom Anvil validations
See CHANGELOG.md for complete history.
> Note: Version numbers were reset in January 2026 from 1.x to 0.1.x to accurately reflect alpha status. See Versioning Strategy for details.
---
AI coding assistants are powerful but chaotic:
```
❌ Context lost between sessions
❌ No memory of what was decided or why
❌ Agents run off in wrong directions
❌ Quality varies wildly
❌ No structured workflow
❌ Duplicate work, missed patterns
❌ "It works on my machine" PRs
```
┌─────────────────────────────────────────────────────────────────────────┐
│ │
│ 📋 SPECS 🧠 MEMORY 🚦 GATES 🔄 FLOW │
│ ──────── ──────── ─────── ────── │
│ What to build What happened Quality checks How to work │
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ EXPLORE → SPECIFY → PLAN → TASKS → IMPLEMENT → VERIFY → ✓ │ │
│ │ │ │ │ │ │ │ │ │
│ │ └─────────┴────────┴───────┴─────────┴──────────┘ │ │
│ │ Human gates at each phase │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ ✅ Context preserved across sessions │
│ ✅ Structured specs prevent scope creep │
│ ✅ Phase gates catch problems early │
│ ✅ Evidence-based PR completion │
│ ✅ Memory that actually persists │
│ │
└─────────────────────────────────────────────────────────────────────────┘
---
Anvil is for you if:
- Solo developer or small team (1-3 people)
- Building production software with AI assistance
- Want structured workflows, not just chat
- Value quality gates and evidence-based PRs
- Need context to persist across sessions
Anvil is NOT for:
- Large teams with existing robust processes
- Quick prototypes / throwaway code
- Those who prefer unstructured AI interaction
Architecture Philosophy:
- Single generalist agent with on-demand skills (not multi-agent)
- Skills = Domain knowledge loaded when needed
- Sub-agents = Focused multi-step workflows
- Coordination = For parallel Claude terminals, not agent roles
---
``
┌─────────────────────────────────────────────────────────────────────────┐
│ ANVIL ARCHITECTURE │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ SINGLE GENERALIST AGENT │ │
│ │ │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ Skills │ │ Sub-Agents │ │ Context │ │ │
│ │ │ (on-demand) │ │ (read-only) │ │ (tiers) │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌───────────────────────────┼───────────────────────────────────────┐ │
│ │ MEMORY SYSTEM │ │
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │
│ │ │ Spec │ │ Task │ │ Session │ │ Handoff │ │Convention│ │ │
│ │ │ Memory │ │ Memory │ │ Memory │ │ Memory │ │ Memory │ │ │
│ │ └─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │ │
│ └───────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌───────────────────────────┼───────────────────────────────────────┐ │
│ │ QUALITY GATES │ │
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │
│ │ │Pre-Work │ │ During │ │ Pre-PR │ │ Post-PR │ │ │
│ │ │ Gate │ │ Gate │ │ Gate │ │ Gate │ │ │
│ │ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │ │
│ └───────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
---
Option 1: Bun (Recommended for Claude Code users)
```bash
bun install -g anvil-dev-framework
anvil init
```
Option 2: npm
```bash
npm install -g anvil-dev-framework
anvil init
```
Option 3: From Source
```bash
git clone https://github.com/AMPMIO/anvil-dev-framework.git
cd anvil-dev-framework
./scripts/install.sh  # Auto-configures PATH
anvil init
```
See docs/INSTALLATION.md for complete installation guide.
Templates available:
```bash
anvil init                        # Generic project
anvil init --template saas        # SaaS project
anvil init --template api-python  # Python API project
anvil init --with-linear          # With Linear integration
anvil init --dry-run              # Preview changes
```
See docs/anvil-init.md for complete anvil init documentation.
After making changes to the framework, sync to your projects:
```bash
# Sync global config
./scripts/sync.sh --global
```
See docs/sync.md for complete sync documentation.
After installation, create your first skill:
See docs/FIRST-SKILL-TUTORIAL.md
```
/orient → /sprint → /validate → [work] → /evidence → /handoff
```

For new features, add:

```
/explore → /spec → /plan → /tasks → [implement]
```

See Session Workflow Guide for the complete step-by-step walkthrough.
---
📁 Framework Structure
`
anvil-dev-framework/
│
├── global/ # → Installs to ~/.claude/
│ ├── CLAUDE.md # Personal defaults & preferences
│ ├── standards/ # Universal coding standards
│ │ ├── typescript.md
│ │ ├── react.md
│ │ ├── testing.md
│ │ └── security.md
│ ├── templates/ # Reusable spec templates
│ │ ├── feature-spec.md
│ │ ├── bug-fix-spec.md
│ │ └── refactor-spec.md
│ ├── commands/ # Global slash commands
│ │ ├── orient.md
│ │ ├── ready.md
│ │ ├── validate.md
│ │ ├── evidence.md
│ │ ├── shard.md
│ │ └── decay-review.md
│ ├── skills/ # Claude Code skills
│ └── analytics/ # Metrics & reports
│
├── project/ # → Installs to .claude/
│ ├── CLAUDE.md.template # Project context (customize)
│ ├── constitution.md.template # Non-negotiable principles
│ ├── product.md.template # Mission & roadmap
│ ├── commands/ # Project-specific commands
│ │ ├── explore.md
│ │ ├── spec.md
│ │ ├── plan.md
│ │ ├── tasks.md
│ │ ├── discover.md
│ │ ├── change.md
│ │ └── handoff.md
│ ├── specs/ # Specifications
│ │ ├── current/
│ │ └── archive/
│ ├── changes/ # Brownfield change proposals
│ ├── handoffs/ # Session continuity docs
│ └── examples/ # Convention examples
│
├── quality-gates/ # → Installs to project root
│ ├── .coderabbit.yaml # AI code review config
│ ├── .semgrep/ # SAST rules
│ ├── .pre-commit-config.yaml # Pre-commit hooks
│ └── .github/workflows/ci.yaml # CI pipeline
│
├── docs/ # Documentation
│ ├── research/ # Research reports
│ │ └── v5-research-report.md
│ ├── patterns/ # Pattern explanations
│ ├── implementation-guide.md # Step-by-step setup
│ └── command-reference.md # All commands documented
│
├── scripts/ # Automation
│ ├── install.sh # Fresh install to ~/.claude/
│ ├── init-project.sh # Initialize in project
│ ├── sync.sh # Sync framework updates
│ └── rollback.sh # Rollback changes
│
└── examples/ # Reference implementations
└── baby-gift-garden/
`

---
🧠 Core Concepts
Every feature flows through structured phases with human checkpoints:
`
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ EXPLORE │───▶│ SPECIFY │───▶│ PLAN │───▶│ TASKS │
│ │ │ │ │ │ │ │
│ /explore │ │ /spec │ │ /plan │ │ /tasks │
└──────────┘ └────┬─────┘ └────┬─────┘ └──────────┘
│ │ │
▼ ▼ ▼
[APPROVE] [APPROVE] [IMPLEMENT]
│ │ │
│ │ ▼
│ │ ┌──────────┐
│ │ │ VERIFY │
│ │ │ │
│ │ │/evidence │
│ │ └────┬─────┘
│ │ │
│ │ ▼
│ │ ┌──────────┐
│ │ │ ARCHIVE │
│ │ │ │
│ │ │ Done! │
│ │ └──────────┘
`
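The phase flow above can be sketched as a small state machine. This is a minimal illustration only: the `FeatureWorkflow` class and its method names are hypothetical, not part of Anvil, and it assumes every phase needs human sign-off before advancing, which is stricter than the APPROVE steps drawn in the diagram.

```python
from dataclasses import dataclass, field

# Phase order taken from the workflow diagram; everything else is hypothetical.
PHASES = ["EXPLORE", "SPECIFY", "PLAN", "TASKS", "IMPLEMENT", "VERIFY", "ARCHIVE"]


@dataclass
class FeatureWorkflow:
    """Tracks one feature's current phase and which phases a human approved."""

    phase: str = "EXPLORE"
    approved: set = field(default_factory=set)

    def approve(self) -> None:
        """Record human sign-off for the current phase."""
        self.approved.add(self.phase)

    def advance(self) -> str:
        """Move to the next phase, refusing if the current one is unapproved."""
        if self.phase not in self.approved:
            raise PermissionError(f"{self.phase} has not been approved")
        index = PHASES.index(self.phase)
        if index == len(PHASES) - 1:
            raise ValueError("workflow is already archived")
        self.phase = PHASES[index + 1]
        return self.phase
```

Calling `advance()` before `approve()` raises, which is the point of a gate: the agent cannot skip a checkpoint on its own.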
`
┌─────────────────────────────────────────────────────────────────────┐
│ MEMORY ARCHITECTURE │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Layer 1: SPECIFICATION MEMORY │
│ ───────────────────────────── │
│ Location: .claude/specs/ │
│ Purpose: What SHOULD be built │
│ Decay: Archive when feature complete │
│ │
│ Layer 2: TASK MEMORY │
│ ──────────────────── │
│ Location: Linear (via MCP or CLI) │
│ Purpose: What needs to be DONE │
│ Decay: Archive closed >30 days │
│ │
│ Layer 3: SESSION MEMORY │
│ ─────────────────────── │
│ Location: Claude-Mem observations │
│ Purpose: What HAPPENED in past sessions │
│ Decay: Auto-compress to ~500 tokens/observation │
│ │
│ Layer 4: HANDOFF MEMORY │
│ ────────────────────── │
│ Location: .claude/handoffs/ │
│ Purpose: Where we LEFT OFF │
│ Decay: Keep last 5-7, archive older │
│ │
│ Layer 5: CONVENTION MEMORY │
│ ───────────────────────── │
│ Location: CLAUDE.md + Skills │
│ Purpose: HOW to build │
│ Decay: Rarely (evolve instead) │
│ │
└─────────────────────────────────────────────────────────────────────┘
`
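As one concrete reading of the Layer 4 decay rule ("keep last 5-7, archive older"), the selection step might look like the sketch below. The function name and the date-prefixed file names are assumptions made for illustration; Anvil's actual handoff naming and archiving logic may differ.

```python
from __future__ import annotations

from pathlib import Path


def split_handoffs(paths: list[Path], keep: int = 5) -> tuple[list[Path], list[Path]]:
    """Return (kept, to_archive).

    Assumes file names sort chronologically, e.g. '2026-01-17-handoff.md'.
    """
    ordered = sorted(paths, key=lambda p: p.name, reverse=True)  # newest first
    return ordered[:keep], ordered[keep:]
```

A caller would then move the `to_archive` files into an archive directory; that step is left out because the archive location is not specified here.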
`
┌─────────────────────────────────────────────────────────────────────┐
│ CONTEXT HIERARCHY │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ TIER 1: GLOBAL (~/.claude/) │
│ ─────────────────────────── │
│ • Personal preferences │
│ • Universal standards │
│ • Reusable templates │
│ Lifetime: Permanent │
│ │
│ TIER 2: PROJECT (.claude/) │
│ ────────────────────────── │
│ • Project-specific CLAUDE.md │
│ • Constitution (non-negotiables) │
│ • Product definition │
│ Lifetime: Project duration │
│ │
│ TIER 3: FEATURE (.claude/specs/) │
│ ───────────────────────────────── │
│ • Feature specifications │
│ • Implementation plans │
│ • Change proposals │
│ Lifetime: Feature duration → archive │
│ │
└─────────────────────────────────────────────────────────────────────┘
`
`
┌─────────────────────────────────────────────────────────────────────┐
│ QUALITY GATES │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ PRE-WORK GATE (/validate) │
│ ───────────────────────── │
│ ✓ Git status clean │
│ ✓ On feature branch (not main) │
│ ✓ Dependencies installed (npm ci) │
│ ✓ Tests passing (baseline) │
│ ✓ Types passing (no errors) │
│ │
│ DURING-WORK GATE │
│ ──────────────── │
│ ✓ Read before write (cite files) │
│ ✓ Follow conventions (check examples) │
│ ✓ File discovered work immediately │
│ ✓ Update Linear status │
│ │
│ PRE-PR GATE (/evidence) │
│ ─────────────────────── │
│ ✓ Lint passes (full output) │
│ ✓ Types pass (full output) │
│ ✓ Tests pass (full output) │
│ ✓ Only expected files changed │
│ ✓ Evidence captured in PR │
│ │
│ POST-PR GATE │
│ ──────────── │
│ ✓ CodeRabbit review │
│ ✓ Semgrep security scan │
│ ✓ Human review │
│ ✓ CI pipeline passes │
│ │
└─────────────────────────────────────────────────────────────────────┘
`

---
📋 Command Reference
| Command | Purpose | When to Use |
|---------|---------|-------------|
| `/orient` | Session startup orientation | Start of every session |
| `/ready` | Calculate ready work (no blockers) | Before selecting a task |
| `/handoff` | Generate session continuity doc | End of every session |
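To make "ready work (no blockers)" concrete, here is a minimal sketch of how such a calculation could work. The issue dictionary shape (`status`, `blocked_by`) is a hypothetical stand-in, not the Linear or Anvil data model.

```python
from __future__ import annotations


def ready_work(issues: dict[str, dict]) -> list[str]:
    """Return IDs of unfinished issues whose blockers are all done."""
    ready = []
    for issue_id, issue in issues.items():
        if issue["status"] == "done":
            continue
        blockers = issue.get("blocked_by", [])
        # An unknown blocker ID is treated as unresolved, to stay on the safe side.
        if all(issues.get(b, {}).get("status") == "done" for b in blockers):
            ready.append(issue_id)
    return sorted(ready)
```

With `ANV-2` blocked by a finished `ANV-1` and `ANV-3` blocked by `ANV-2`, only `ANV-2` (plus any unblocked issue) is reported as ready.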
| Command | Purpose | When to Use |
|---------|---------|-------------|
| `/explore` | Discovery phase | Before any new feature |
| `/spec` | Generate specification | After exploration approved |
| `/plan` | Create implementation plan | After spec approved |
| `/tasks` | Break plan into Linear issues | After plan approved |
| `/change` | Create brownfield change proposal | Modifying existing features |
| Command | Purpose | When to Use |
|---------|---------|-------------|
| `/validate` | Environment validation | Before any code changes |
| `/evidence` | Capture quality gate proof | Before creating PR |
| `/discover` | File discovered work | During implementation |
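The two git-based checks in the pre-work gate (clean status, not on main) could be approximated as below. This is a sketch rather than what `/validate` actually runs: the function names are hypothetical, and also rejecting `master` and a detached `HEAD` is an added assumption. It relies only on `git status --porcelain` and `git rev-parse --abbrev-ref HEAD`.

```python
from __future__ import annotations

import subprocess


def git(*args: str) -> str:
    """Run a git command in the current directory and return its stdout."""
    result = subprocess.run(["git", *args], capture_output=True, text=True, check=True)
    return result.stdout.strip()


def evaluate_gate(porcelain: str, branch: str) -> dict[str, bool]:
    """Evaluate the two git-based checks from raw command output."""
    return {
        # `git status --porcelain` prints nothing when the tree is clean.
        "git status clean": porcelain.strip() == "",
        # "HEAD" is what rev-parse reports for a detached checkout.
        "on feature branch": branch not in ("main", "master", "HEAD"),
    }


def pre_work_gate() -> dict[str, bool]:
    """Collect live git state and evaluate it."""
    return evaluate_gate(
        git("status", "--porcelain"),
        git("rev-parse", "--abbrev-ref", "HEAD"),
    )
```

Keeping `evaluate_gate` free of subprocess calls makes the decision logic easy to test without a repository.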
| Command | Purpose | When to Use |
|---------|---------|-------------|
| `/anvil-sync` | Sync framework updates | After pulling framework changes |
| `/shard` | Break large specs into pieces | When specs exceed 2000 tokens |
| `/decay-review` | Archive old issues/handoffs | Weekly maintenance |
| `/weekly-review` | Generate analytics report | Weekly review |
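To illustrate the 2000-token sharding threshold, the sketch below splits a Markdown spec at second-level headings once it grows too large. Both the four-characters-per-token estimate and the choice of `## ` as the split point are assumptions for illustration; `/shard` may count tokens and choose boundaries differently.

```python
from __future__ import annotations


def estimate_tokens(text: str) -> int:
    """Rough estimate: about four characters per token (an assumption)."""
    return len(text) // 4


def shard_spec(text: str, limit: int = 2000) -> list[str]:
    """Split a Markdown spec at '## ' headings if it exceeds the token limit."""
    if estimate_tokens(text) <= limit:
        return [text]
    shards: list[str] = []
    current: list[str] = []
    for line in text.splitlines(keepends=True):
        # Start a new shard at each heading, but never emit an empty one.
        if line.startswith("## ") and current:
            shards.append("".join(current))
            current = []
        current.append(line)
    if current:
        shards.append("".join(current))
    return shards
```

Joining the shards back together reproduces the original text, so nothing is lost in the split.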
| Command | Purpose | When to Use |
|---------|---------|-------------|
| `/ralph start` | Initialize autonomous execution | Large refactoring, overnight runs |
| `/ralph status` | Check iteration progress | Monitor unattended execution |
| `/ralph stop` | Gracefully terminate loop | End autonomous session early |

> Note: Ralph Wiggum mode is a specialized power tool for specific scenarios (large refactoring, framework migrations, TDD with clear specs). It is NOT part of the standard daily workflow. See When to Use Ralph below.
---
🤖 When to Use Ralph Wiggum Mode
Ralph Wiggum is a specialized power tool for autonomous, long-running AI execution — NOT a replacement for the standard workflow.
```
/orient → /sprint → /validate → [work] → /evidence → /handoff
```

This remains your default approach for all normal development work.
```bash
# Manual task description
/ralph start "Migrate all tests from Jest to Vitest" --max-iterations 50

# From Linear issue (recommended) - fetches subtasks automatically
/ralph start --issue ANV-209

# From Linear project - process all issues in a project
/ralph start --project "HUD Development"
```

Linear Integration Flags:
| Flag | Description |
|------|-------------|
| `--issue` | Linear issue ID to fetch subtasks from (e.g., `ANV-209`) |
| `--project` | Linear project name to process all issues |
| `--subtasks` | Filter subtasks (e.g., `ANV-1..ANV-5` or `ANV-1,ANV-3`) |
| `--include-done` | Include already-completed issues in project mode |
| `--no-sync` | Disable syncing status back to Linear |

| Good For | Not Good For |
|----------|--------------|
| ✅ Large-scale refactoring with clear completion criteria | ❌ Exploratory work (figuring things out) |
| ✅ Framework migrations (Jest→Vitest, CJS→ESM) | ❌ Ambiguous requirements |
| ✅ TDD with clear failing tests to pass | ❌ Security-sensitive code |
| ✅ Greenfield projects with detailed specs | ❌ Architecture decisions needing judgment |
| ✅ Test coverage expansion across many files | ❌ Quick fixes (overkill) |
| ✅ Overnight/unattended execution (8+ hours) | ❌ Interactive debugging |
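The `--subtasks` filter syntax listed in the flags table (`ANV-1..ANV-5` or `ANV-1,ANV-3`) could be expanded along the following lines. This is a hypothetical parser, not Ralph's own; in particular it assumes ranges are inclusive and that both ends share the same team prefix.

```python
from __future__ import annotations

import re

ISSUE_ID = re.compile(r"^([A-Z]+)-(\d+)$")


def expand_subtasks(spec: str) -> list[str]:
    """Expand 'ANV-1..ANV-5' or 'ANV-1,ANV-3' into a list of issue IDs."""
    result: list[str] = []
    for part in spec.split(","):
        part = part.strip()
        if ".." in part:
            start, end = part.split("..", 1)
            s, e = ISSUE_ID.match(start), ISSUE_ID.match(end)
            if not s or not e or s.group(1) != e.group(1):
                raise ValueError(f"invalid range: {part!r}")
            low, high = int(s.group(2)), int(e.group(2))
            # Inclusive on both ends (an assumption about the flag's semantics).
            result.extend(f"{s.group(1)}-{n}" for n in range(low, high + 1))
        elif ISSUE_ID.match(part):
            result.append(part)
        else:
            raise ValueError(f"invalid issue ID: {part!r}")
    return result
```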
| Scenario | Estimated Cost |
|----------|----------------|
| 10 iterations, small codebase | $5-15 |
| 50 iterations, medium codebase | $50-100+ |
| 100+ iterations, large codebase | $200+ |
Recommendation: Start with `--max-iterations 10` to understand costs before running overnight.
Watch Ralph progress in real-time with the visibility tools:
```bash
# Terminal 1: Run Ralph
/ralph start --issue ANV-209

# Terminal 2: Watch progress (full display)
python3 global/tools/ralph-watch

# Or compact single-line mode
python3 global/tools/ralph-watch --compact
```

Notification Options:
```bash
# Toggle notifications
python3 global/lib/ralph_notifier.py --disable tts     # Mute TTS
python3 global/lib/ralph_notifier.py --enable macos    # Enable desktop

# Start API server for external monitoring
python3 global/api/ralph_api.py --port 8765
```

Event Types:
- `session_started`: Ralph begins work
- `subtask_complete`: Linear subtask finished
- `session_complete`: All work done
- `error_occurred`: Something went wrong
- `circuit_breaker`: No file changes detected (stuck)

See `global/tools/README.md` for full documentation.
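The `circuit_breaker` event suggests a guard that stops the loop when iterations stop producing file changes. A minimal sketch of that idea follows; the class name and the default threshold of three idle iterations are assumptions, since the actual trigger condition is not documented here.

```python
class CircuitBreaker:
    """Trips after `threshold` consecutive iterations without file changes."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.idle_iterations = 0

    def record(self, changed_files: int) -> bool:
        """Record one iteration; return True when the breaker trips."""
        if changed_files > 0:
            self.idle_iterations = 0  # progress resets the counter
        else:
            self.idle_iterations += 1
        return self.idle_iterations >= self.threshold
```

An autonomous loop would call `record()` after each iteration and stop, emitting the event, as soon as it returns `True`.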
```
Is this task...
├── Quick fix or bug? → Standard workflow
├── Exploratory / unclear? → Standard workflow
├── Needs human decisions? → Standard workflow
├── Large with clear spec? → Consider Ralph
├── Migration / refactoring? → Consider Ralph
└── Overnight / unattended? → Ralph is ideal
```

---
🔬 Research Foundation
Anvil is built on extensive research across 15+ systems with 200k+ combined GitHub stars:
| Source | Key Pattern Extracted |
|--------|----------------------|
| Factory Droid | 6 structural guardrails (58.8% Terminal-Bench vs 43.2% baseline) |
| Beads (4.2k ⭐) | Task memory patterns, ready work calculation |
| BMAD (25.2k ⭐) | Document sharding, phase gates |
| SpecKit (55.6k ⭐) | Constitution pattern, structured specs |
| OpenSpec (12k ⭐) | Brownfield change tracking, Gherkin scenarios |
| Agent OS (2.7k ⭐) | Three-tier context hierarchy |
| Claude-Mem | Session compression, semantic search |
| Prompt Coach | Prompt analytics, time-lost calculations |
1. Agent design matters more than model choice — Droid achieved 58.8% vs Claude Code's 43.2% on Terminal-Bench using the same model
2. Single agent with skills beats multi-agent — Coordination overhead typically 40-60% of cycles
3. Phase gates prevent disasters — Universal across all production systems studied
4. Identity claims hurt performance — "Idiot" persona outperformed "Genius" by 2.2% on MMLU
See docs/research/v5-research-report.md for the complete analysis.
---
💰 Cost Structure
| Component | Cost | Purpose |
|-----------|------|---------|
| Claude Code Max | $200/mo | AI coding assistant |
| CodeRabbit | $25/mo | AI code review |
| Semgrep OSS | Free | Security scanning |
| Trivy | Free | Vulnerability scanning |
| Gitleaks | Free | Secrets detection |
| Pre-commit | Free | Git hooks |
| Total | $225/mo | |
---
📈 Success Metrics
| Metric | Target |
|--------|--------|
| Orientation time | < 2 minutes |
| Ready work accuracy | > 95% |
| Phase gate compliance | 100% |
| Handoff coverage | 100% |
| Metric | Target |
|--------|--------|
| First-pass PR approval | > 70% |
| Clarification rate | < 15% |
| Security findings per PR | < 2 high |
| Discovery completion | > 80% within 2 weeks |
---
🗺️ Roadmap
Anvil uses four-part versioning: `MILESTONE.MAJOR.MINOR.PATCH`

| Component | Meaning |
|-----------|---------|
| MILESTONE | 0 = Alpha (pre-1.0), 1 = Production-ready |
| MAJOR | Significant feature sets or breaking changes |
| MINOR | New features |
| PATCH | Bug fixes |
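A four-part version string can be parsed and compared with a few lines of code. The sketch below is illustrative only: `AnvilVersion` is a hypothetical helper, not something the framework ships.

```python
from typing import NamedTuple


class AnvilVersion(NamedTuple):
    """MILESTONE.MAJOR.MINOR.PATCH as a comparable tuple."""

    milestone: int
    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "AnvilVersion":
        """Parse '0.1.9.0' or 'v0.1.9.0'."""
        if text.startswith("v"):
            text = text[1:]
        parts = text.split(".")
        if len(parts) != 4:
            raise ValueError(f"expected four components, got {text!r}")
        return cls(*(int(p) for p in parts))

    @property
    def is_alpha(self) -> bool:
        """MILESTONE 0 means alpha (pre-1.0)."""
        return self.milestone == 0
```

Because the fields are integers in a tuple, ordering is numeric, so `0.1.10.0` sorts after `0.1.9.0` where a plain string comparison would get it wrong.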
---
| Requirement | Status | Notes |
|-------------|--------|-------|
| HUD/TUI fully implemented | ✅ Complete | Multi-agent terminal dashboard with 6 panels |
| External user testing | ⏳ Not Started | Beta testers outside core team |
| One-command installation | ✅ Complete | `anvil init` with templates |
| All core commands documented | ✅ Complete | 13 commands with examples |
| No known critical bugs | 🔄 Ongoing | Continuous improvement |
| Cross-platform testing | ⏳ Not Started | macOS, Linux, WSL |
| Monetization/licensing defined | ⏳ Not Started | Pricing and distribution model |

---
- [x] HUD v2 Multi-Agent Command Center (ANV-78)
- [x] HUD Kanban Panel (ANV-76)
- [x] Provider Pattern for issue tracking (Linear + Local)
- [x] Local JSON issue tracker for non-Linear users
- [x] Cost Tracker, Context Health, Task Status panels
- [x] Quality Gates, Coordination panels
- [x] GitHub/CI and CodeRabbit integration
- [x] HUD Configuration system
- [x] Documentation for Local Issue Tracking System (ANV-109)
- [ ] Unified `/orient`, `/sprint`, `/ready` commands
#### v0.1.4.0 (Current)
- [x] Statusline configuration (full/minimal/off variants)
- [x] `/release` command for version coordination
- [x] Enhanced `/evidence` and `/handoff` commands
- [x] Code review config integration

#### v0.1.3.0
- [x] Framework Healthcheck System (ANV-17)
#### v0.1.2.0
- [x] Use-case based templates (`saas`, `api-python`, `generic`)
- [x] Next.js + Supabase + Vercel template
- [x] FastAPI + PostgreSQL + pytest template

#### v0.1.1.0
- [x] CLI tool (`anvil init` command)
- [x] Project initialization with templates
- [x] Granular hook control (`--no-tts`, `--no-memory`)
- [x] Linear sub-issue support

#### v0.1.0.0
- [x] Core framework structure
- [x] Phase-gated workflow commands
- [x] Memory system architecture
- [x] Quality gate configurations
- [x] Documentation
---
- [ ] Homebrew CLI distribution (macOS/Linux)
- [ ] Additional templates (mobile, Rails, Go)
- [ ] VS Code extension
- [ ] Dashboard for metrics
- [ ] Team collaboration features

---
📚 Documentation
| Document | Description |
|----------|-------------|
| System Architecture | OVERVIEW — How Linear + CodeRabbit + Claude Code + Memory integrate |
| Session Workflow | START HERE — Daily coding workflow |
| Local Issue Tracking | File-based issues without Linear |
| Sync Guide | Keep projects updated with framework changes |
| Installation Guide | Initial framework setup |
| Implementation Guide | How to set up Anvil |
| Command Reference | All commands detailed |
| Planning Responsibilities | Who decides what |
| Simplification Principles | Framework simplicity guidance |
| Simplification Plan Template | Audit and simplify checklist |
| Research Report | Full research analysis |
| Pattern Catalog | Pattern explanations |
---
⚖️ License
Proprietary — All rights reserved.
This framework is not open source. Contact for licensing inquiries.
---
🤝 Contact
For licensing, questions, or collaboration:
- Author: Alex Cahiz
- Project: Anvil Development Framework
---
```
══════════════════════════════════════════════════════════
Built with 🔥 for developers who refuse to compromise.
══════════════════════════════════════════════════════════
```