# Oroboreo 🍪

**The Golden Loop**: a self-improving, cost-optimized autonomous development engine.

`npm install -g @oroboreo/cli`
```
┌─────────────────────────────────────────┐
│ Layer 1: PLANNING (Opus 4.6)            │
│ ├─ Generate comprehensive PRDs          │
│ ├─ Tag task complexity [SIMPLE/COMPLEX] │
│ └─ Cost: $0.30-2 per PRD (one-time)     │
└─────────────────────────────────────────┘
                    ↓
┌─────────────────────────────────────────┐
│ Layer 2: ROUTING (Smart Selection)      │
│ ├─ Parse PRD tasks                      │
│ ├─ Select model based on complexity     │
│ └─ Optimize cost vs. quality            │
└─────────────────────────────────────────┘
                    ↓
┌─────────────────────────────────────────┐
│ Layer 3: EXECUTION (Claude Code)        │
│ ├─ Full tool access (Read/Write/Edit)   │
│ ├─ Runs smartly (70% cost savings)      │
│ └─ Smart context (no amnesia)           │
└─────────────────────────────────────────┘
                    ↓
┌─────────────────────────────────────────┐
│ Layer 4: MEMORY (Persistent Learning)   │
│ ├─ Archives completed sessions          │
│ ├─ Extracts patterns and insights       │
│ └─ Updates "AGENTS.md" automatically    │
└─────────────────────────────────────────┘
                    ↓
┌─────────────────────────────────────────┐
│ Layer 5: FEEDBACK (Self-Improvement)    │
│ ├─ Analyzes historical performance      │
│ ├─ Recommends optimizations             │
│ └─ Improves future PRDs                 │
└─────────────────────────────────────────┘
```
---
## 💰 Cost Optimization
Oroboreo is designed for cost-effective autonomous development:
- **No subscription fees** - Pay only for the AI compute you use
- **Smart model routing** - Uses Opus ($5/$25 per 1M tokens) for high-level planning, Sonnet ($3/$15) for complex implementation, Haiku ($1/$5) for simple tasks
- **Multiple provider options** - AWS Bedrock, Azure AI Foundry, Anthropic API, Claude subscription, or (soon) Google Vertex AI
- **Real-time cost tracking** - Monitor spending per task in `costs.json`
- **Full autonomy** - Complete task loops with auto-retry, not just code suggestions
- **Self-improving** - Archives + feedback loops make it better with every session
**Typical costs:** $1-3 for a 12-task feature implementation.

Example: a user-authentication feature with 12 tasks (4 simple + 8 complex) under smart routing typically costs $1-2 in API usage.
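As a rough back-of-the-envelope check against the pricing table below (the per-task token counts here are illustrative assumptions, not measured values):

```
8 complex tasks on Sonnet, assuming ~30K input / ~8K output tokens each:
    8 × (0.030 × $3 + 0.008 × $15) ≈ $1.68
4 simple tasks on Haiku, assuming ~15K input / ~3K output tokens each:
    4 × (0.015 × $1 + 0.003 × $5) ≈ $0.12
Total ≈ $1.80, in line with the quoted $1-2 range
```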
---
## 📋 Prerequisites
- Node.js 18+ installed
- Claude Code CLI: `npm install -g @anthropic-ai/claude-code`
- GitHub CLI (optional): required for auto PR creation
  - Install: https://cli.github.com/
  - Without it, you'll see: `⚠️ GitHub CLI (gh) not installed. Skipping PR creation.`
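A quick way to confirm everything is in place (exact version numbers will vary):

```bash
node --version    # should print v18.x or later
claude --version  # Claude Code CLI
gh --version      # optional; only needed for auto PR creation
```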
---
## 🚀 Installation

### Option 1: NPM Install (Recommended)

```bash
# Install globally
npm install -g @oroboreo/cli

# Verify installation
oro-init --help
```
### Option 2: Manual Clone

```bash
# Clone and copy to your project
git clone https://github.com/chemnm/oroboreo.git
cp -r oroboreo/ your-project/oroboreo/
```
---
## 🍪 The Core Commands

### 1. Initialize Your Project

```bash
# Navigate to your project
cd your-project

# Initialize Oroboreo (AI-powered or manual)
oro-init
```

- Discovers your project structure
- Creates `creme-filling.md` with Universal Laws
- Sets up `.env` for AWS Bedrock (optional AI analysis: ~$0.10-0.30)
### 2. Generate Tasks

#### Option A: Generate NEW Feature Tasks

```bash
# Use Opus 4.6 to generate a comprehensive task breakdown
oro-generate "Add user authentication with JWT"

# OR: Create new-prompt.md with a detailed feature description, then:
oro-generate
# The script will ask if you want to use new-prompt.md.
# The file is archived and cleared after generation.
```
- Opus creates detailed tasks in `cookie-crumbs.md` (sample below)
- Tags tasks as `[SIMPLE]` or `[COMPLEX]`
- Supports inline, interactive, or file-based input
- Cost: ~$0.30-2.00 per PRD depending on the complexity of your project
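The generated task list looks roughly like this (an illustrative sketch; exact wording and task count vary by feature):

```markdown
- [ ] [SIMPLE] Add JWT dependency and configuration scaffolding
- [ ] [COMPLEX] Implement /auth/login route that issues signed JWTs
- [ ] [COMPLEX] Add authentication middleware for protected routes
- [ ] [SIMPLE] Document the new environment variables in the README
```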
#### Option B: Generate FIX Tasks from Feedback
```bash
# 1. Write the issues you found during testing in human-feedback.md
# 2. Run the feedback architect
oro-feedback
```
- Opus analyzes your feedback + the latest archive
- Creates fix tasks in `cookie-crumbs.md`
- Cost: ~$0.15-0.40 per feedback session
#### Option C: Manual Tasks
- Edit `cookie-crumbs.md` directly with your own tasks
### 3. Execute the Loop

```bash
# Run the Golden Loop
oro-run
```
- Auto-loops through all tasks in `cookie-crumbs.md`
- Smart model selection (Haiku for `[SIMPLE]`, Sonnet for `[COMPLEX]`)
- Cost tracking in `costs.json`
- Git commits on task completion
- Cost: ~$1-3 per 12-task feature
### 4. Track Costs

```bash
# Export cost data to CSV or compare with CloudWatch
oro-costs
```
### 5. Diagnose Failures

```bash
# Post-mortem analysis for hung/failed tasks
oro-diagnose
```
- Analyzes execution logs from archived sessions
- Identifies timeout patterns and error causes
- Shows task duration, output silence periods, and failure reasons
- Helps debug overnight hangs or unexpected failures
---
## 🎬 Quick Start
- NPM Install (Recommended): QUICKSTART.md
- Manual Clone: QUICKSTART-CLONE.md
---
## 🏗️ Architecture

### Directory Structure

```
your-project/
├── oroboreo/                 # All Oroboreo files live here
│ ├── cookie-crumbs.md # Task list (THE PLAN)
│ ├── creme-filling.md # System rules (THE LAW)
│ ├── progress.txt # Session memory (THE LEARNINGS)
│ ├── human-feedback.md # Your feedback input
│ ├── costs.json # Cost tracking
│ ├── .env # AWS credentials (from .env.example)
│ ├── .env.example # Template for credentials
│ ├── tests/ # Verification scripts
│ │ ├── README.md # Explains test organization
│ │ ├── reusable/ # Generic tests kept across sessions
│ │ │ ├── README.md # What makes a test reusable
│ │ │ ├── verify-auth.js # Example: Generic auth check
│ │ │ └── check-api.js # Example: API health check
│ │ └── verify-task-*.js # Session-specific tests (archived after)
│ ├── utils/
│ │ ├── oreo-config.js # Shared configuration (SINGLE SOURCE OF TRUTH)
│ │ ├── oreo-init.js # Project initialization (SETUP)
│ │ ├── oreo-run.js # Main execution loop (THE ENGINE)
│ │ ├── oreo-generate.js # NEW feature task generator (PLANNER)
│ │ ├── oreo-feedback.js # FIX task generator (ARCHITECT)
│ │ ├── oreo-archive.js # Session archival with smart test sorting (HISTORIAN)
│ │ ├── oreo-costs.js # Cost analysis & export (ACCOUNTANT)
│ │ ├── oreo-diagnose.js # Post-mortem analysis for hung tasks (DEBUGGER)
│ │ ├── install.js # Installation script
│ │ ├── run-with-prompt.bat # Execute Claude Code
│ │ └── run-with-bedrock.bat # Execute with Bedrock config
│ └── archives/ # Historical sessions (year/month organized)
│ └── 2026/ # Year folder
│ └── 01/ # Month folder
│ └── feature-name-2026-01-20-14-30/ # Session archive
│ ├── cookie-crumbs.md
│ ├── progress.txt
│ ├── costs.json
│ ├── oreo-execution.log # Full execution log
│ └── tests/ # Session-specific tests only
│ └── verify-task-*.js
├── src/ # Your project source
└── ...
```
### Key Files
| File | Purpose |
|------|---------|
| `utils/oreo-config.js` | Shared configuration - model IDs, costs, paths (SINGLE SOURCE OF TRUTH) |
| `utils/oreo-init.js` | Initialize Oroboreo in a new project (AI-powered or manual) |
| `utils/oreo-run.js` | Main loop - executes tasks from `cookie-crumbs.md` |
| `utils/oreo-generate.js` | Generate tasks for NEW features (uses Opus 4.6) |
| `utils/oreo-feedback.js` | Generate FIX tasks from human feedback (uses Opus 4.6) |
| `utils/oreo-archive.js` | Archive completed sessions with year/month structure (HISTORIAN) |
| `utils/oreo-costs.js` | Export costs to CSV or compare with CloudWatch (ACCOUNTANT) |
| `utils/oreo-diagnose.js` | Post-mortem analysis for hung/failed tasks (DEBUGGER) |
| `cookie-crumbs.md` | Task list with checkboxes (like PRD.md) |
| `creme-filling.md` | System rules injected into every agent (like AGENTS.md) |
| `progress.txt` | Shared memory between agent instances |
| `human-feedback.md` | Where you describe issues for the feedback architect |
| `costs.json` | Real-time cost tracking per task |
| `tests/` | Session-specific verification scripts (archived after the session) |
| `tests/reusable/` | Generic verification scripts (persist across sessions) |
### The Workflow

1. **Initialize** - Run `oro-init` to set up `creme-filling.md` (AI-powered or manual)
2. **Plan Tasks** - Choose your approach:
   - NEW feature: `oro-generate "Add user authentication"` (Opus generates fresh tasks)
   - FIX issues: write issues in `human-feedback.md`, then run `oro-feedback` (Opus analyzes archives and creates fix tasks)
   - Manual: write tasks directly in `cookie-crumbs.md`
3. **Execute** - `oro-run` loops through tasks (see the sketch after this list):
   - Parses the next incomplete task (`- [ ]`)
   - Selects a model based on `[SIMPLE]`/`[COMPLEX]` tags (Haiku/Sonnet)
   - Spawns Claude Code with Bedrock
   - Tracks cost in `costs.json`
   - Logs execution to `oreo-execution.log`
   - Commits on completion
   - Marks the task `- [x]`
   - 30-minute timeout with heartbeat logging
4. **Archive** - Completed sessions are preserved in `archives/YYYY/MM/session-name-timestamp/` for learning
5. **Diagnose** - If tasks hang or fail, run `oro-diagnose` on the archived session for analysis
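Conceptually, the execution loop boils down to something like the following JavaScript sketch. This is illustrative only, not the actual `oreo-run.js` source; the Claude Code invocation, commit message, and file path are placeholders, and cost tracking, logging, and the 30-minute timeout are omitted:

```js
// Illustrative sketch of the Golden Loop -- not the real oreo-run.js.
const fs = require("fs");
const { execSync } = require("child_process");

const CRUMBS = "oroboreo/cookie-crumbs.md";

// First unchecked checkbox, e.g. "- [ ] [COMPLEX] Implement login route"
const nextTask = (md) =>
  md.split("\n").find((line) => line.trim().startsWith("- [ ]")) || null;

// [SIMPLE] routes to Haiku; everything else gets Sonnet
const pickModel = (task) => (task.includes("[SIMPLE]") ? "haiku" : "sonnet");

let md = fs.readFileSync(CRUMBS, "utf8");
let task;
while ((task = nextTask(md)) !== null) {
  const model = pickModel(task);

  // Spawn Claude Code on the task (placeholder flags)
  execSync(`claude -p ${JSON.stringify(task)} --model ${model}`, {
    stdio: "inherit",
  });

  // Commit the work, then check the box so the loop advances
  execSync(`git add -A && git commit -m ${JSON.stringify("oro: " + task)}`);
  md = md.replace(task, task.replace("- [ ]", "- [x]"));
  fs.writeFileSync(CRUMBS, md);
}
```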
---
## 🔐 AI Provider Setup
Choose the one that fits your needs:
### Option 1: AWS Bedrock
> 📚 Official Guide: See the Claude Code Bedrock Documentation for detailed setup instructions.
**Prerequisites:**
- AWS Account with Bedrock access (us-east-1 region recommended)
- IAM user with bedrock:InvokeModel permission
- Claude models enabled (auto-enabled in most regions)
**Configuration:**

```bash
# Copy the example and fill in your credentials
cd oroboreo
cp .env.example .env

# Edit .env:
AI_PROVIDER=bedrock
AWS_ACCESS_KEY_ID=AKIA...
AWS_SECRET_ACCESS_KEY=...
AWS_REGION=us-east-1
```
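For the IAM user, a minimal policy sketch that grants the required permission (in production, scope `Resource` down to the specific model ARNs you use):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "*"
    }
  ]
}
```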
### Option 2: Microsoft Azure AI Foundry
> 📚 Official Guide: See Claude in Microsoft Foundry for detailed setup.
**Prerequisites:**
- Active Azure subscription
- Azure AI Foundry resource created
- Claude model deployment (Opus, Sonnet, or Haiku)
**Configuration:**

```bash
# Copy the example and fill in your credentials
cd oroboreo
cp .env.example .env

# Edit .env:
AI_PROVIDER=foundry
ANTHROPIC_FOUNDRY_API_KEY=your-azure-api-key
ANTHROPIC_FOUNDRY_RESOURCE=your-foundry-resource-name
```
**Setup Steps:**
1. Go to Azure AI Foundry
2. Create or select a Foundry resource
3. Navigate to Models + endpoints → Deploy model → Deploy base model
4. Search for and deploy a Claude model (e.g., claude-sonnet-4-5)
5. Copy the API key from Keys and Endpoint section
### Option 3: Anthropic API
**Prerequisites:**
- Anthropic API account
- API key from https://console.anthropic.com/
**Configuration:**

```bash
# Copy the example and fill in your credentials
cd oroboreo
cp .env.example .env

# Edit .env:
AI_PROVIDER=anthropic
ANTHROPIC_API_KEY=sk-ant-...
```
### Option 4: Claude Subscription (Pro/Team)
**Prerequisites:**
- Claude Pro or Team subscription at https://claude.ai/
**Configuration:**

```bash
# One-time login
npx @anthropic-ai/claude-code login

# Edit .env:
AI_PROVIDER=subscription
# No API key needed!
```
### Model Pricing
Oroboreo automatically uses the correct model IDs based on your provider:
| Model | Cost (input/output, per 1M tokens) | Used For |
|-------|------------------------------------|----------|
| Opus 4.6 | $5/$25 | Architect (PRD generation) |
| Sonnet 4.5 | $3/$15 | Complex tasks [COMPLEX] |
| Haiku 4.5 | $1/$5 | Simple tasks [SIMPLE] |
Note: Costs are identical across all providers (Bedrock, Foundry, Anthropic API).
---
## 🧠 The Creme Filling (System Rules)
Every project has core constraints that should never be violated. Define these in `creme-filling.md`:
Examples:
- Never expose database credentials in frontend code
- All API routes must use authentication middleware
- Components must follow atomic design principles
- Database queries must use parameterized statements
These rules are injected into every Claude Code instance, ensuring consistent behavior across all tasks.
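As a minimal sketch, a `creme-filling.md` built from the rules above might look like this (the structure is illustrative; `oro-init` generates the real one):

```markdown
# Universal Laws

1. Never expose database credentials in frontend code.
2. All API routes must use authentication middleware.
3. Components must follow atomic design principles.
4. Database queries must use parameterized statements.
```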
---
## 🌐 Use Cases

### Solo Developers
- Build features 70% cheaper
- Focus on architecture, let Oroboreo handle implementation
- Learn from past sessions (what worked, what didn't)
### Startups
- Rapid prototyping with minimal AI costs
- Consistent code quality (Universal Laws)
- Built-in documentation (PRDs + archives)
### Agencies
- Template per client (drop in, customize, run)
- Cost tracking for billing
- Historical performance data
### Open Source Projects
- Community contributors can generate PRDs
- Maintainers approve, Oroboreo executes
- Transparent cost tracking
---
## 🔮 Roadmap & Planned Features

### Distribution & Packaging

**NPM Package Distribution**
- [x] Publish to NPM as @oroboreo/cli
- [x] Global installation: npm install -g @oroboreo/cli
- [ ] NPX support: npx @oroboreo init, npx @oroboreo run
- [ ] Auto-update notifications for new versions
- [x] Semantic versioning and changelog automation
**Why This Matters:** Eliminate manual folder copying, make installation one command, standardize updates.
---
### Autonomous Testing

**Playwright Browser Automation**
- [x] Built-in Playwright support for autonomous UI testing
- [x] Browser test utilities in tests/reusable/browser-utils.js
- [x] Auto-detect Playwright need and suggest installation
- [x] Console log capture and error detection
- [x] Screenshot evidence collection
- [ ] Visual regression testing (compare before/after screenshots)
- [ ] Accessibility testing integration (automated a11y checks)
- [ ] Video recording for debugging
**Why This Matters:** Eliminates manual UI testing by enabling Claude to verify its own changes. The agent can open browsers, click elements, test workflows, capture console errors, and collect evidence, all without human intervention. This closes the feedback loop and reduces dependency on human verification.
---
### Integrations

**VS Code Extension**
- [ ] Right-click → "Run Oroboreo Task"
- [ ] Task list management from sidebar
- [ ] Real-time execution progress panel
- [ ] Cost tracker widget in status bar
- [ ] Archive browser for historical sessions
**GitHub Actions Integration**
- [ ] Workflow: Comment /oroboreo on issues/PRs
- [ ] Automated PR creation from completed sessions
- [ ] Branch protection integration (require approval before merge)
- [ ] Cost budgeting controls for CI/CD
**MCP (Model Context Protocol) Server Integrations**
- [ ] Built-in MCP server management (install, configure, enable/disable)
- [ ] Popular MCP server templates (filesystem, git, database, API tools)
- [ ] Auto-discovery of project-relevant MCP servers (e.g., detect Postgres → suggest database MCP)
- [ ] Session-scoped MCP server activation (enable specific tools per task)
- [ ] Cost tracking for MCP tool usage
**Why MCP Matters:** Extends Claude's capabilities beyond code editing to databases, APIs, cloud resources, and custom tooling, all through Anthropic's standardized protocol.
---
### Execution & Developer Experience

**Real-Time Improvements**
- [ ] Streaming output during agent execution (no more waiting for task completion)
- [ ] Interactive prompts (agent asks clarifying questions mid-task)
- [ ] Task pause/resume functionality
- [ ] Parallel task execution (run multiple tasks concurrently)
**Web UI Dashboard**
- [ ] Cost analytics with charts (daily/weekly/monthly spend)
- [ ] Session explorer with search and filtering
- [ ] Task template library (share common workflows)
- [ ] Visual task breakdown editor (drag-and-drop PRD builder)
**Testing & Quality Assurance**
- [ ] Auto-generate verification tests from task descriptions
- [ ] Code review mode (post-session quality analysis)
- [ ] Security scanning integration (detect common vulnerabilities)
- [ ] Performance profiling for generated code
---
### Additional Providers

**Google Cloud Vertex AI Support**
- [ ] Vertex AI provider option (Claude via Google Cloud)
- [ ] Unified credential management across AWS/GCP/Anthropic
- [ ] Cost comparison dashboard across providers
- [ ] Provider failover/redundancy (fallback to alternative if primary unavailable)
**Why This Matters:** Google Cloud users can access Claude through Vertex AI (via the Anthropic partnership), providing an alternative to AWS Bedrock or the direct Anthropic API.
---
### Enterprise & Teams

**Team Collaboration Mode**
- [ ] Shared task queues (assign tasks to team members)
- [ ] Session replay (review how agent completed tasks)
- [ ] Approval workflows (review PRDs before execution)
- [ ] Cost allocation per team/project
**Audit & Compliance**
- [ ] Comprehensive audit logs (who ran what, when, and cost)
- [ ] SSO integration (Google, Okta, Azure AD)
- [ ] Role-based access control (dev, reviewer, admin)
- [ ] Compliance reports (SOC2, GDPR, HIPAA-friendly logging)
---
### Templates & Ecosystem

**Project Templates**
- [ ] React/Next.js starter template
- [ ] Python/FastAPI template
- [ ] Go microservices template
- [ ] Rust CLI tool template
- [ ] Community-contributed templates marketplace
---
### Multi-Model Support

> ⚠️ **Architectural Rewrite Required**
The current version is built on Claude Code, which is specifically designed for Anthropic's Claude models (Opus, Sonnet, Haiku). Supporting other model providers (OpenAI GPT, Google Gemini, local LLMs) would require:

1. Replacing Claude Code with a custom agent system
2. Abstracting tool execution (file operations, bash, git) to work across models
3. Normalizing model APIs (different providers use different request formats; see the sketch after this list)
4. Reimplementing cost tracking (each provider has unique pricing/token counting)
5. Handling capability differences (not all models support extended thinking, tool use, etc.)
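To illustrate point 3, the two most common request shapes already differ in endpoint, auth header, and required fields (simplified sketches, not complete requests):

```text
# Anthropic Messages API (x-api-key header; max_tokens is required)
POST https://api.anthropic.com/v1/messages
{ "model": "claude-sonnet-4-5", "max_tokens": 1024,
  "messages": [{ "role": "user", "content": "..." }] }

# OpenAI Chat Completions (Authorization: Bearer header)
POST https://api.openai.com/v1/chat/completions
{ "model": "gpt-4o",
  "messages": [{ "role": "user", "content": "..." }] }
```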
**Planned (Post-1.0):**
- [ ] Build unified agent abstraction layer
- [ ] Prototype with OpenAI + local LLM + Claude support
- [ ] Evaluate tradeoffs (complexity vs. flexibility)
- [ ] Community feedback on demand for multi-model support
**Why Not Now?** Maintaining quality for Claude models takes priority. Multi-model support would delay core features. I will revisit based on user demand.
---
## 🤝 Contributing

Help us build the future of autonomous development! Contributions are welcome:

1. Fork the repo
2. Create a feature branch (`git checkout -b feature/amazing`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing`)
5. Open a Pull Request
🍪 **Oroboreo Dev** 🌀

*The Golden Loop that gets better forever*