Red64 Flow Orchestrator - Deterministic spec-driven development CLI
20 years of building products. 1+ year of AI-first development. Captured in a CLI.
TDD built in. Code smells to avoid. Documentation required. Quality gates enforced.
The process that turns AI code into production-ready software.
The result? Code that lives and evolves, not legacy the day it ships.




Quick Start · Why Red64 · Features · Documentation
---
I've spent 20 years building products and writing software. 30,000 hours of experience. Then I went all-in on AI coding tools.
They're incredible for building a feature. But then you start iterating, and you hit a wall:
- ❌ Code quality goes down the drain
- ❌ No testing (or tests written after the fact)
- ❌ No documentation/specs (good luck iterating on anything)
- ❌ No careful design review, no code review
- ❌ No quality gates, so code smells everywhere
- ❌ Large commits that can't be easily rolled back
- ❌ No non-regression tests, so things start breaking
This is the same problem that arises in any team with no processes, no gates, no constraints.
The solution is what I've been doing for 20 years: Software Development Life Cycle and Processes. The stuff tech leaders and experienced software professionals implement in their teams. The stuff that separates "it works" from "it's maintainable."
Red64 CLI captures both:
1. My 30,000 hours of experience: code smells to avoid, patterns that scale, production wisdom
2. My process for working with AI: the SDLC that makes AI-generated code maintainable
The process (HOW the software professional works):
- Isolate every feature in a branch (git worktree)
- Write tests FIRST (TDD built in)
- Small atomic commits (one thing per commit)
- Document everything (REQUIREMENTS.md, DESIGN.md)
- High test coverage enforced
- Quality gates at every phase
The expertise (WHAT the software professional builds):
- Code smells to avoid (the stuff that breaks at 3 AM)
- Patterns and anti-patterns for Python, Next.js, Ruby, Rails, etc.
- Stack-specific conventions (Next.js, Rails, FastAPI, etc.)
The result: Code that lives and evolves. We've rewritten features in another language in days because the documentation is so complete.
_Legend: work in parallel in two separate worktrees, preview generated documentation with markdown and diagram rendering_
---
```bash
# Install
npm install -g red64-cli
```

That's it. Red64 generates requirements → design → tests → implementation → documentation.
Each phase has review checkpoints. Each task = one clean commit. Tests first. Docs included.
---
YOLO Mode (No Babysitting)
Tired of approving every line?
```bash
red64 start "feature-name" "description" --sandbox -y
```

- `--sandbox` = Docker isolation (the AI can't break your system; pulls the image from ghcr.io/red64llc/red64-sandbox)
- `-y` = Auto-approve all phases (total autonomy)

Start a feature. Go to lunch. Come back to a completed branch, with tests, docs, and clean commits.
With other tools, YOLO mode means "write code fast with no oversight."
With Red64, autonomous mode means "follow the SDLC with no babysitting."
The AI still:
1. Writes tests FIRST (TDD enforced)
2. Documents everything (REQUIREMENTS.md, DESIGN.md)
3. Makes atomic commits (easy to review, easy to rollback)
4. Passes quality gates (no code smells ship)
Review the PR when it's done. Like a senior engineer delegating to a junior who's been properly onboarded.
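
The difference between "no oversight" and "no babysitting" can be sketched in a few lines. This is a minimal illustration, not Red64's implementation: `Gate`, `Approver`, and `runGates` are hypothetical names, and the only point is that auto-approval removes the human prompt while the automated check for each phase still runs.

```typescript
// Hypothetical model of a phase gate; not Red64's actual code.
interface Gate {
  phase: string;
  check: () => boolean; // automated quality check for this phase
}

// Human review step: returns true when the reviewer approves the phase.
type Approver = (phase: string) => boolean;

// Run gates in order. Auto-approve skips only the human prompt, never the check.
function runGates(gates: Gate[], autoApprove: boolean, approve: Approver): string[] {
  const passed: string[] = [];
  for (const gate of gates) {
    if (!gate.check()) {
      throw new Error(`quality gate failed in phase "${gate.phase}"`);
    }
    if (!autoApprove && !approve(gate.phase)) {
      break; // reviewer rejected this phase: stop here
    }
    passed.push(gate.phase);
  }
  return passed;
}

const gates: Gate[] = [
  { phase: "requirements", check: () => true },
  { phase: "design", check: () => true },
  { phase: "tests", check: () => true },
];

// In autonomous mode the reviewer callback is never consulted.
const neverAsked: Approver = () => {
  throw new Error("reviewer should not be prompted in autonomous mode");
};

console.log(runGates(gates, true, neverAsked));
```

A failing `check` still throws when `autoApprove` is `true`, which is the property the section describes.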
---
Battle-Tested
We built 6 production products with Red64 at red64.io/ventures:
| Company | Industry | Status |
|---------|----------|--------|
| Saife | InsurTech | Production |
| EngineValue | Engineering Scorecards | Production |
| MediaPulse | Digital Presence | Production |
| QueryVault | Data Platform | Production |
| Kafi (Internal product) | Virtual Executive Assistant | Production |
Same tool. Same encoded experience. Now open source.
---
Why Red64?
I've spent 20 years building products: 30,000 hours of learning what works and what breaks. Then I spent a year going all-in on AI coding tools.
The pattern is always the same:
1. Week 1: "This is amazing! I shipped a feature in a day!"
2. Week 4: "Why is everything breaking? Why is the code so messy?"
3. Week 8: "I'm afraid to touch anything. Time to rewrite."
The missing ingredient? SDLC. The stuff that takes 20 years to learn. The stuff I've been teaching engineers my entire career.
Red64 gives you both:
| What Goes Wrong Without SDLC | Red64 Solution |
|------------------------------|----------------|
| No tests → things break when you iterate | TDD built in (tests FIRST) |
| No docs → can't remember why anything works | REQUIREMENTS.md + DESIGN.md per feature |
| Huge commits → can't rollback, can't review | Atomic commits (one task = one commit) |
| No quality gates → code smells everywhere | Guardrails from 30K hours of experience |
| Babysitting every line → slow, exhausting | Autonomous mode with SDLC guardrails |

```
feature-branch/
├── REQUIREMENTS.md          # What we're building and why
├── DESIGN.md                # How it works, architecture decisions
├── TASKS.md                 # Atomic breakdown with acceptance criteria
├── src/
│   ├── feature.ts           # Implementation
│   └── feature.test.ts      # Tests (written first)
└── docs/
    └── feature.md           # User-facing documentation
```

Every decision traceable. Every line has a reason. Code that survives iteration.
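
To make "tests (written first)" concrete, here is a small test-first sketch. The `CartItem` type and the `cartTotal` and `testCartTotal` functions are hypothetical and exist only for illustration. Both steps are shown in one block so it runs on its own, whereas in the layout above they would live in `feature.test.ts` and `feature.ts`.

```typescript
// Hypothetical feature: summing a shopping cart. All names are illustrative.
interface CartItem {
  priceCents: number;
  quantity: number;
}

// Step 1 (written first): the test pins down the expected behavior.
function testCartTotal(total: (items: CartItem[]) => number): void {
  const cases: Array<[CartItem[], number]> = [
    [[], 0],
    [[{ priceCents: 250, quantity: 2 }], 500],
    [[{ priceCents: 250, quantity: 2 }, { priceCents: 99, quantity: 1 }], 599],
  ];
  for (const [items, expected] of cases) {
    const actual = total(items);
    if (actual !== expected) {
      throw new Error(`expected ${expected}, got ${actual}`);
    }
  }
}

// Step 2 (written second): the smallest implementation that makes the test pass.
function cartTotal(items: CartItem[]): number {
  return items.reduce((sum, item) => sum + item.priceCents * item.quantity, 0);
}

testCartTotal(cartTotal);
console.log("cartTotal: all cases pass");
```

Because the test takes the implementation as a parameter, it can be written and reviewed before `cartTotal` exists.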
---
Comparison
| Feature | Red64 | Cursor | Copilot | Claude Code | Gemini CLI | Aider |
|---------|:-----:|:------:|:-------:|:-----------:|:----------:|:-----:|
| 30K hours expertise encoded | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| SDLC/Process enforced | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Autonomous mode | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Sandboxed execution | ✅ | ❌ | ❌ | ✅ | ✅ | ❌ |
| MCP support | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| TDD enforced (tests first) | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| High coverage enforced | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Auto-generates docs | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Git worktree isolation | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Atomic commits enforced | ✅ | ❌ | ❌ | ❌ | ❌ | ✅ |
| Phase gates with review | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Code smell guardrails | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Resumable multi-step flows | ✅ | ❌ | ❌ | ❌ | ❌ | ⚠️ |
| Multi-model support | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ |
| Battle-tested (production) | ✅ 6 cos | N/A | N/A | N/A | N/A | N/A |

Key: ✅ = Built-in & enforced | ⚠️ = Partial/Optional | ❌ = Not available
> The difference: Other tools have autonomous modes. Red64 has autonomous mode plus the encoded expertise and enforced process that produces production-quality code.
✅ Use Red64 when:
- Building complete features (not quick fixes)
- You want code with tests, docs, and clean history
- You need to walk away and let AI work autonomously
- You're tired of babysitting every line
- You want code that's safe to refactor
❌ Use other tools when:
- Making quick, single-file edits
- You want real-time IDE autocomplete
- Exploring or prototyping ideas
---
Features
Use your preferred AI:

```bash
red64 init --agent claude    # Default
red64 init --agent gemini    # Google Gemini
red64 init --agent codex     # OpenAI Codex
```

Interrupted? Just run `start` again:

```bash
red64 start "shopping-cart" "..."
# Detects in-progress flow, offers to resume
```
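
One way to picture resumption is as a lookup over persisted flow state. The sketch below assumes the phase order from the Quick Start description (requirements → design → tests → implementation → documentation). The `FlowState` shape and the `resumePoint` function are hypothetical and do not describe how Red64 stores its state.

```typescript
// Phase order taken from the Quick Start description; everything else is hypothetical.
const PHASES = ["requirements", "design", "tests", "implementation", "documentation"] as const;
type Phase = (typeof PHASES)[number];

// Assumed shape of the state saved for an in-progress feature.
interface FlowState {
  feature: string;
  completed: Phase[]; // phases already finished and approved
}

// Return the first phase that has not been completed, or null when the flow is done.
function resumePoint(state: FlowState): Phase | null {
  for (const phase of PHASES) {
    if (state.completed.indexOf(phase) === -1) {
      return phase;
    }
  }
  return null;
}

const saved: FlowState = { feature: "shopping-cart", completed: ["requirements", "design"] };
console.log(resumePoint(saved)); // prints "tests"
```

Under these assumptions, rerunning `start` for a feature with saved state would continue from the returned phase instead of starting over.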
Configure MCP servers once, and Red64 automatically injects them into whichever agent you use (Claude, Gemini, or Codex):

```bash
# Add an MCP server
red64 mcp add context7 npx -y @upstash/context7-mcp

# List configured servers
red64 mcp list

# Remove a server
red64 mcp remove context7
```

MCP servers are stored in `.red64/config.json` and translated into each agent's native config format before invocation. Configs are cleaned up after execution so your personal agent settings stay untouched.

Works in both local and `--sandbox` mode (stdio servers run inside the container).
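
The "store once, translate per agent" idea can be sketched as follows. Everything here is an assumption made for illustration: the `McpServer` shape, the function names, and the two target formats (a JSON `mcpServers` map and a TOML `mcp_servers` table) are simplified stand-ins, not a description of what Red64 reads or writes.

```typescript
// Hypothetical, agent-neutral description of one stdio MCP server.
interface McpServer {
  name: string;
  command: string;
  args: string[];
}

type Agent = "claude" | "gemini" | "codex";

// Render the server list as a JSON document keyed by server name (assumed shape).
function toJsonConfig(servers: McpServer[]): string {
  const mcpServers: Record<string, { command: string; args: string[] }> = {};
  for (const s of servers) {
    mcpServers[s.name] = { command: s.command, args: s.args };
  }
  return JSON.stringify({ mcpServers }, null, 2);
}

// Render the same list as TOML tables (assumed shape).
// Server names are used as bare keys, so this sketch only handles simple names.
function toTomlConfig(servers: McpServer[]): string {
  return servers
    .map((s) => {
      const args = s.args.map((a) => JSON.stringify(a)).join(", ");
      return `[mcp_servers.${s.name}]\ncommand = ${JSON.stringify(s.command)}\nargs = [${args}]`;
    })
    .join("\n\n");
}

// Pick the output format per agent; which agent uses which format is assumed here.
function translate(agent: Agent, servers: McpServer[]): string {
  return agent === "codex" ? toTomlConfig(servers) : toJsonConfig(servers);
}

const servers: McpServer[] = [
  { name: "context7", command: "npx", args: ["-y", "@upstash/context7-mcp"] },
];

console.log(translate("claude", servers));
console.log(translate("codex", servers));
```

Keeping one neutral definition and generating each agent's file on demand is what lets the generated configs be thrown away after a run without touching personal settings.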
Please note that Playwright's capabilities are already included in the Docker image via Vercel's AI-native browser automation CLI. There's no need to add the Playwright MCP server when running in sandbox mode.
Customize AI behavior in `.red64/steering/`:
- `product.md`: Product vision, user personas
- `tech.md`: Stack standards, code smells to avoid
- `structure.md`: Codebase organization
---
Documentation
- Full Documentation
- Steering Document Guide
- Configuration Reference
- Troubleshooting
---
Commands

```bash
red64 init --agent gemini      # Initialize Red64 in your project
red64 start                    # Start a new feature
red64 start ... --sandbox -y   # YOLO mode (autonomous)
red64 status [feature]         # Check flow status
red64 list                     # List all active flows
red64 abort                    # Abort and clean up
red64 mcp list                 # List configured MCP servers
red64 mcp add                  # Add an MCP server
red64 mcp remove               # Remove an MCP server
```

| Flag | Description |
|------|-------------|
| `-y, --yes` | Auto-approve all phases (YOLO mode) |
| `--sandbox` | Run in Docker isolation (uses GHCR image by default) |
| `--local-image` | Build and use local sandbox image instead of GHCR (init only) |
| `-m, --model` | Override AI model |
| `-a, --agent` | Set coding agent (claude/gemini/codex) |
| `--verbose` | Show detailed logs |

---
Contributing
We'd love your help encoding more production wisdom:
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Run tests: `npm test`

What we're looking for:
- More code smells to catch
- Stack-specific best practices
- Bug fixes and improvements
---
MIT. Built by Yacin Bahi at Red64.io
---
⭐ Star this repo if you believe AI should write code like a senior engineer.