Ophan - Self-improving dev agent

A self-improving AI development agent based on the Two-Loop Paradigm

Quick Start •
How It Works •
Documentation

---

> "Ophan" references the biblical Ophanim—"wheels within wheels" from Ezekiel's vision—representing nested loops that autonomously adapt while observing and learning continuously.

The Two-Loop Paradigm

Most AI coding agents plateau: they fix bugs and generate code but make the same mistakes repeatedly. Ophan solves this by separating Guidelines (how to work) from Criteria (what good looks like).

Inner Loop (per task)

1. Generate output from Guidelines
2. Evaluate against Criteria + dev tools
3. Learn from evaluation results
4. Regenerate with improved understanding

Outer Loop (across tasks)

1. Gather converged outputs from many tasks
2. Analyze for patterns and learnings
3. Propose updates to Guidelines and Criteria
4. Apply (Guidelines automatically, Criteria with human approval)

Quick Start

```bash
# Install
npm install -g ophan

# Initialize in your project
ophan init

# Run a task
ophan task "fix the login validation bug"

# Check status
ophan status

# Run outer loop review
ophan review

# View recent task logs
ophan logs

# Open web UI
ophan ui
```

Project Structure

After running `ophan init`, your project will have:

```
my-project/
├── OPHAN.md                    # Agent entry point
├── .ophan.yaml                 # Configuration
├── .ophan/
│   ├── guidelines/             # Agent CAN edit
│   │   ├── coding.md
│   │   ├── testing.md
│   │   ├── context.md          # Context compilation patterns
│   │   └── learnings.md
│   ├── criteria/               # Agent CANNOT edit (protected)
│   │   ├── quality.md
│   │   ├── security.md
│   │   └── context-quality.md  # Context agent metrics
│   ├── logs/                   # Task execution logs
│   ├── context-logs/           # Context usage logs
│   ├── digests/                # Outer loop reports
│   └── state.json              # Runtime state
└── [your project files]
```

Key Concepts

Guidelines (how to work)

- Workflows and decision trees
- Data structures and templates
- Constraints and failure detection
- The agent can freely update these based on learnings

Criteria (what good looks like)

- Evaluation standards
- Analytical methods
- Comparative context
- Failure patterns
- Only humans can approve changes (prevents reward hacking)

Human Oversight

The outer loop requires human oversight to approve criteria changes. This prevents the agent from lowering its own standards to achieve easier "success."
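
The split between freely editable guidelines and protected criteria can be pictured as a small routing step. The sketch below is illustrative only: the `Proposal` type, the `routeProposal` function, and the path-prefix check are assumptions made for this example, not Ophan's actual implementation. Only the `.ophan/guidelines/` and `.ophan/criteria/` directories come from the project structure.

```typescript
// Hypothetical types for illustration; not taken from Ophan's source.
interface Proposal {
  targetFile: string; // path relative to the project root
  summary: string;
}

type Decision = "auto-apply" | "needs-human-approval" | "reject";

// True when `path` begins with `prefix`.
function hasPrefix(path: string, prefix: string): boolean {
  return path.lastIndexOf(prefix, 0) === 0;
}

// Route a proposed change based on which area of .ophan/ it touches.
function routeProposal(p: Proposal): Decision {
  if (hasPrefix(p.targetFile, ".ophan/guidelines/")) return "auto-apply";
  if (hasPrefix(p.targetFile, ".ophan/criteria/")) return "needs-human-approval";
  // Anything outside the two known directories is refused in this sketch.
  return "reject";
}
```

Under these assumptions, a change to `.ophan/criteria/quality.md` never applies without a person approving it, which is the property that prevents the agent from relaxing its own standards.
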

Configuration

See `.ophan.yaml` for all options:

```yaml
model:
  name: claude-sonnet-4-20250514
  maxTokens: 4096

innerLoop:
  maxIterations: 5
  regenerationStrategy: informed  # full | informed | incremental
  costLimit: 0.50

outerLoop:
  triggers:
    afterTasks: 10
  minOccurrences: 3
  minConfidence: 0.7
  lookbackDays: 30
  learnings:
    maxCount: 50
    retentionDays: 90
    promotionThreshold: 3
    similarityThreshold: 0.9

escalations:
  webhooks:
    - name: slack-alerts
      url: ${SLACK_WEBHOOK_URL}
      events: [escalation, digest]
```

Commands

| Command | Description |
|---------|-------------|
| `ophan init` | Initialize Ophan in current project |
| `ophan task ""` | Run a task through inner loop |
| `ophan review` | Run outer loop (pattern detection) |
| `ophan status` | Show metrics and status |
| `ophan logs` | View recent task logs |
| `ophan ui` | Open web UI for configuration and monitoring |
| `ophan approve` | Approve a criteria change proposal |
| `ophan context-stats` | View context usage statistics |

Command Options

`ophan init`

- `-t, --template`: Template to use (base, typescript, python)
- `-f, --force`: Overwrite existing configuration
- `-y, --yes`: Skip confirmation prompts
- `-p, --project`: Path to the project directory

`ophan task`

- `-n, --dry-run`: Show what would be done without executing
- `-m, --max-iterations`: Override max iterations
- `-p, --project`: Path to the project directory

`ophan review`

- `-f, --force`: Run even if task threshold not reached
- `--auto`: Auto-approve guideline changes (criteria still require approval)
- `--non-interactive`: Skip interactive review, save proposals to pending
- `--pending`: Review pending proposals from previous runs
- `-p, --project`: Path to the project directory

`ophan context-stats`

- `-d, --days`: Number of days to analyze (default: 30)
- `--json`: Output as JSON
- `-p, --project`: Path to the project directory

`ophan logs`

- `-l, --limit`: Number of logs to show (default: 10)
- `-p, --project`: Path to the project directory
- `--json`: Output as JSON

`ophan ui`

- `-p, --port`: Port to run the server on (default: 4040)
- `--no-open`: Do not open browser automatically
- `--project`: Path to the project directory

Web UI

Ophan includes a lightweight web dashboard for viewing status and editing configuration.

```bash
# Start the UI (opens browser automatically)
ophan ui

# Start on a different port
ophan ui --port 8080

# Start without opening browser
ophan ui --no-open
```

The UI provides:
- Dashboard: View task metrics, success rates, and costs
- Task Logs: Browse and search task execution history
- Configuration: Edit settings with form-based interface
- Guidelines/Criteria: View current guidelines and criteria files
- Digests: Read outer loop review reports

Escalations & Webhooks

Ophan can send notifications when tasks escalate (hit max iterations, exceed cost limits, etc.) or when outer loop digests are generated.

Event Types

- `escalation`: Task failed to converge
- `task_complete`: Task finished (success or failure)
- `digest`: Outer loop review completed
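
Each webhook lists the events it wants, so dispatching an event amounts to filtering the configured webhooks by that list. The following is a minimal sketch of that idea; the `WebhookConfig` interface and `webhooksFor` function are hypothetical names for this example, with only the three event names and the `name`/`url`/`events` keys taken from this README.

```typescript
// Event names come from the list above; everything else is hypothetical.
type EventType = "escalation" | "task_complete" | "digest";

interface WebhookConfig {
  name: string;
  url: string;
  events: EventType[];
}

// Return the webhooks that subscribed to the given event type.
function webhooksFor(event: EventType, hooks: WebhookConfig[]): WebhookConfig[] {
  return hooks.filter((hook) => hook.events.indexOf(event) !== -1);
}
```

With a `slack-alerts` webhook subscribed to `[escalation, digest]`, a `task_complete` event would match nothing and no notification would be sent.
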

Payload Format

```json
{
  "type": "escalation",
  "timestamp": "2024-01-15T10:30:00Z",
  "task": {
    "id": "task-20240115-103000-abc1",
    "description": "fix the login bug",
    "iterations": 5,
    "maxIterations": 5
  },
  "reason": "max_iterations",
  "context": {
    "lastError": "Test failed: expected 200, got 401",
    "suggestedAction": "Review task complexity or improve guidelines"
  },
  "project": {
    "name": "my-app",
    "path": "/Users/dev/my-app"
  }
}
```
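
A receiver can type the escalation payload and turn it into a short alert line. The interface below is inferred from the single example payload, so which fields are optional and what other `reason` values exist are unknown. `EscalationPayload` and `formatAlert` are hypothetical names for this sketch.

```typescript
// Shape inferred from the example payload; optional or extra fields are unknown.
interface EscalationPayload {
  type: "escalation";
  timestamp: string; // ISO 8601 timestamp
  task: {
    id: string;
    description: string;
    iterations: number;
    maxIterations: number;
  };
  reason: string;
  context: { lastError: string; suggestedAction: string };
  project: { name: string; path: string };
}

// Hypothetical helper: summarize a payload as a one-line alert message.
function formatAlert(p: EscalationPayload): string {
  return (
    `[${p.project.name}] ${p.task.id} escalated (${p.reason}) after ` +
    `${p.task.iterations}/${p.task.maxIterations} iterations: ${p.context.lastError}`
  );
}
```

For the example payload, `formatAlert` returns `[my-app] task-20240115-103000-abc1 escalated (max_iterations) after 5/5 iterations: Test failed: expected 200, got 401`.
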

Environment Variables

Use `${VAR_NAME}` syntax for secrets:

```yaml
escalations:
  webhooks:
    - name: slack
      url: ${SLACK_WEBHOOK_URL}
      headers:
        Authorization: Bearer ${AUTH_TOKEN}
      events: [escalation]
```
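
One way `${VAR_NAME}` placeholders could be resolved is a single regex pass over each config string. This is a sketch under stated assumptions, not Ophan's documented behavior: `interpolateEnv` is a hypothetical name, the accepted variable-name pattern is a guess, and throwing on a missing variable is a design choice made here.

```typescript
// Replace ${VAR_NAME} placeholders in `value` with entries from `env`.
// Throwing on a missing variable is this sketch's choice, not documented behavior.
function interpolateEnv(
  value: string,
  env: { [name: string]: string | undefined }
): string {
  return value.replace(
    /\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g,
    (_match: string, name: string) => {
      const resolved = env[name];
      if (resolved === undefined) {
        throw new Error(`Missing environment variable: ${name}`);
      }
      return resolved;
    }
  );
}
```

In Node.js the second argument would typically be `process.env`. Failing loudly on an unset variable avoids sending a request with a literal `${AUTH_TOKEN}` header.
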

Prerequisites

Ophan uses Claude Code (subscription-based) for task execution:

1. Install Claude Code CLI
2. Authenticate with your Claude subscription

| Variable | Required | Description |
|----------|----------|-------------|
| Webhook URLs/tokens | No | As configured in `.ophan.yaml` |

How It Works

Context Agent

Ophan includes a self-improving context agent that learns which files are relevant for different tasks. After each task, it tracks:

- Hit Rate: % of provided files that were actually used (target: >70%)
- Miss Rate: % of used files that weren't provided (target: <20%)

Over time, the context agent proposes updates to context guidelines based on usage patterns. View statistics with `ophan context-stats`.
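
The two context metrics follow directly from their definitions. The sketch below computes them from two file lists; `contextRates` is a hypothetical name, and it assumes each list contains unique paths.

```typescript
// Hit rate: share of provided files that were actually used.
// Miss rate: share of used files that were not provided.
// Assumes both lists contain unique file paths.
function contextRates(
  provided: string[],
  used: string[]
): { hitRate: number; missRate: number } {
  const usedProvided = provided.filter((f) => used.indexOf(f) !== -1).length;
  const usedNotProvided = used.filter((f) => provided.indexOf(f) === -1).length;
  return {
    // An empty list yields 0 rather than dividing by zero.
    hitRate: provided.length === 0 ? 0 : usedProvided / provided.length,
    missRate: used.length === 0 ? 0 : usedNotProvided / used.length,
  };
}
```

For example, if four files are provided and the task uses three of them plus one file that was not provided, the hit rate is 3/4 = 75% (above the 70% target) and the miss rate is 1/4 = 25% (above the 20% target, so the context guidelines would be a candidate for an update).
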

Inner Loop

1. Context Building: Loads guidelines, criteria, and previous learnings
2. Agent Execution: Claude executes the task using available tools
3. Evaluation: Output is evaluated against criteria and dev tools (tests, lint, build)
4. Learning: If evaluation fails, learnings are extracted
5. Regeneration: Agent regenerates with updated understanding
6. Convergence: Repeats until evaluation passes or max iterations reached
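
The six steps above can be sketched as a bounded generate-evaluate-learn loop. This is a simplified, synchronous illustration: `runInnerLoop`, `Evaluation`, and `LoopResult` are hypothetical names, and the two callbacks stand in for the real agent and evaluator, which would be asynchronous in practice.

```typescript
// All names are hypothetical; the callbacks stand in for the agent and evaluator.
interface Evaluation {
  passed: boolean;
  feedback: string;
}

interface LoopResult {
  converged: boolean;
  iterations: number;
  learnings: string[];
}

function runInnerLoop(
  generate: (learnings: string[]) => string,
  evaluate: (output: string) => Evaluation,
  maxIterations: number
): LoopResult {
  const learnings: string[] = [];
  for (let i = 1; i <= maxIterations; i++) {
    // Steps 1, 2 and 5: build context from learnings so far and (re)generate.
    const output = generate(learnings);
    // Step 3: evaluate against criteria and dev tools.
    const result = evaluate(output);
    if (result.passed) {
      return { converged: true, iterations: i, learnings };
    }
    // Step 4: record what the failed evaluation taught us.
    learnings.push(result.feedback);
  }
  // Step 6: iteration budget exhausted without passing; a real run would escalate.
  return { converged: false, iterations: maxIterations, learnings };
}
```

The `maxIterations` argument plays the role of `innerLoop.maxIterations` from the configuration. A cost limit could be checked in the same place as the iteration bound.
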

Outer Loop

1. Log Analysis: Analyzes task logs from the lookback period
2. Pattern Detection: Identifies failure, iteration, and success patterns
3. Learning Consolidation: Deduplicates, promotes, and prunes learnings
4. Guideline Updates: Auto-applies updates from promoted learnings
5. Proposal Generation: Creates proposals for criteria changes (require approval)
6. Digest Generation: Writes summary report to `.ophan/digests/`
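
Step 3 (deduplicate, promote, prune) can be illustrated with the `similarityThreshold`, `promotionThreshold`, and `maxCount` settings from the configuration. The sketch below is an assumption-heavy illustration: it uses Jaccard similarity over words as a stand-in because Ophan's actual similarity metric is not described here, it ignores `retentionDays`, and `consolidate`, `similarity`, and `uniqueWords` are hypothetical names.

```typescript
interface Learning {
  text: string;
  occurrences: number;
}

// Lowercased unique words of a string.
function uniqueWords(s: string): string[] {
  const words = s.toLowerCase().split(/\W+/).filter((w) => w.length > 0);
  return words.filter((w, i) => words.indexOf(w) === i);
}

// Jaccard similarity over word sets (a stand-in metric for this sketch).
function similarity(a: string, b: string): number {
  const wa = uniqueWords(a);
  const wb = uniqueWords(b);
  const shared = wa.filter((w) => wb.indexOf(w) !== -1).length;
  const union = wa.length + wb.length - shared;
  return union === 0 ? 1 : shared / union;
}

function consolidate(
  raw: string[],
  opts: { similarityThreshold: number; promotionThreshold: number; maxCount: number }
): { kept: Learning[]; promoted: Learning[] } {
  const merged: Learning[] = [];
  for (const text of raw) {
    // Deduplicate: fold near-identical learnings into the first match.
    const match = merged.filter(
      (m) => similarity(m.text, text) >= opts.similarityThreshold
    )[0];
    if (match) {
      match.occurrences += 1;
    } else {
      merged.push({ text, occurrences: 1 });
    }
  }
  // Prune: keep the most frequent learnings, up to maxCount.
  const kept = merged
    .slice()
    .sort((a, b) => b.occurrences - a.occurrences)
    .slice(0, opts.maxCount);
  // Promote: learnings seen often enough become guideline update candidates.
  const promoted = kept.filter((l) => l.occurrences >= opts.promotionThreshold);
  return { kept, promoted };
}
```

With the default-looking values (`similarityThreshold: 0.9`, `promotionThreshold: 3`), a learning has to recur three times in nearly the same words before it would be promoted into a guideline update.
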

Pattern Types

- Failure Patterns: Recurring errors (TypeScript, tests, lint)
- Iteration Patterns: Tasks consistently needing multiple iterations
- Success Patterns: Approaches that work well consistently
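
As a rough illustration of failure-pattern detection, the sketch below counts how many tasks hit each kind of error and keeps the kinds that recur at least `minOccurrences` times. The `TaskLog` shape and `detectFailurePatterns` are hypothetical; Ophan's real log format and its use of `minConfidence` are not modeled here.

```typescript
// Hypothetical log shape; the real task log format is not shown in this README.
interface TaskLog {
  id: string;
  errorKinds: string[]; // e.g. "typescript", "tests", "lint"
}

function detectFailurePatterns(
  logs: TaskLog[],
  minOccurrences: number
): { kind: string; count: number }[] {
  // A prototype-free object avoids collisions with keys like "constructor".
  const counts: { [kind: string]: number } = Object.create(null);
  for (const log of logs) {
    // Count each error kind at most once per task.
    const kinds = log.errorKinds.filter((k, i) => log.errorKinds.indexOf(k) === i);
    for (const kind of kinds) {
      counts[kind] = (counts[kind] || 0) + 1;
    }
  }
  return Object.keys(counts)
    .map((kind) => ({ kind, count: counts[kind] || 0 }))
    .filter((p) => p.count >= minOccurrences)
    .sort((a, b) => b.count - a.count);
}
```

Counting per task rather than per error keeps one noisy task from looking like a recurring pattern.
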

Development

```bash
# Install dependencies
npm install

# Run tests
npm test

# Type check
npm run typecheck

# Build
npm run build

# Run CLI locally
npm run dev -- task "your task"
```

Documentation

For detailed documentation, see:

- Architecture Overview — System design with Mermaid diagrams
- Inner Loop — Per-task execution engine
- Outer Loop — Pattern detection and learning consolidation
- Configuration Reference — Complete config options

Project Status

- Phase 1A: Core Infrastructure — CLI, config, types, scaffolding
- Phase 1B: Inner Loop — Task execution, Claude API, evaluation, regeneration
- Phase 1C: Outer Loop — Pattern detection, learning consolidation, proposals
- Phase 1D: Escalations — Webhook notifications
- Phase 1E: Polish — Testing, documentation
- Phase 1F: Web UI — Dashboard, config editor, log viewer

Test Coverage: 87 tests passing

License

MIT
