mageagent-local - npm explorer

# Adverant Nexus - Local Apple Silicon MageAgent

Multi-Model AI Orchestration for Apple Silicon

![License: MIT](https://opensource.org/licenses/MIT)
![Apple Silicon](https://www.apple.com/mac/)
![MLX](https://github.com/ml-explore/mlx)
![Version](https://github.com/adverant/nexus-local-mageagent/releases)

Run 4 specialized models together. Get results that rival cloud AI. Pay nothing.

---

### Download & Install

![Download DMG](https://github.com/adverant/nexus-local-mageagent/releases/latest/download/MageAgent-2.1.0.dmg)
![npm](https://www.npmjs.com/package/mageagent-local)
![Git Clone](https://github.com/adverant/nexus-local-mageagent)

---

Quick Start • Why MageAgent • Patterns • Tool Execution • Contributing

---

The Problem

You bought an M1/M2/M3/M4 Mac with 64GB+ unified memory. You want to run AI locally. But:

- Single models hit a ceiling - Even the best 72B model can't match multi-model orchestration
- Ollama alone isn't enough - You get inference, not intelligence
- Cloud AI costs add up - $200+/month for API calls that send your code to someone else's servers
- Tool calling is unreliable - Local models hallucinate file contents instead of reading them

MageAgent solves all of this.

---

The Solution

MageAgent orchestrates 4 specialized models working together:

``┌──────────────────────────────────────────────────────────────────┐ │ Your Request │ └─────────────────────────────┬────────────────────────────────────┘ ▼ ┌──────────────────────────────────────────────────────────────────┐ │ MageAgent Orchestrator │ │ │ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────┐ │ │ │ Qwen-72B │ │ Qwen-32B │ │ Qwen-7B │ │ Hermes-3│ │ │ │ Q8_0 │ │ Q4_K_M │ │ Q4_K_M │ │ Q8_0 │ │ │ │ │ │ │ │ │ │ │ │ │ │ Reasoning │ │ Coding │ │ Validate │ │ Tools │ │ │ │ Planning │ │ Compete │ │ Judge │ │ ReAct │ │ │ │ Analysis │ │ Generate │ │ Fast │ │ Files │ │ │ └─────────────┘ └─────────────┘ └─────────────┘ └─────────┘ │ │ 77GB 18GB 5GB 9GB │ └──────────────────────────────────────────────────────────────────┘ ▼ ┌──────────────────────────────────────────────────────────────────┐ │ Better Response │ │ Multiple perspectives. Validated. Tool-grounded. │ └──────────────────────────────────────────────────────────────────┘`

The key insight: Different models excel at different tasks. Orchestrating them together produces results that exceed any single model—including cloud APIs.

---

`30-Second Install`

`bash git clone https://github.com/adverant/nexus-local-mageagent.git cd nexus-local-mageagent ./scripts/install.sh`

That's it. The installer: 1. Sets up the Python environment with MLX 2. Installs the native menu bar app 3. Configures auto-start on login 4. Downloads models (optional, ~109GB) 5. Starts the server

Or with npm:`bash npm install -g mageagent-local && mageagent setup`

---

`Why MageAgent`

`$3`

| Capability | Ollama | MageAgent | |------------|--------|-----------| | Single model inference | Yes | Yes | | Multi-model orchestration | No | Yes | | Model competition + judging | No | Yes | | Generate + validate loops | No | Yes | | Real tool execution | No | Yes | | Native menu bar app | No | Yes | | Claude Code integration | No | Yes |

`$3`

| Factor | Cloud API | MageAgent | |--------|-----------|-----------| | Cost per query | $0.01-0.10 | $0 | | Monthly cost (heavy use) | $200+ | $0 | | Your code leaves your machine | Yes | No | | Rate limits | Yes | No | | Works offline | No | Yes | | Latency | Network dependent | Local speed |

`$3`

| Task Type | Single 72B Model | MageAgent Pattern | Improvement | |-----------|------------------|-------------------|-------------| | Complex reasoning | Baseline |hybrid(72B + tools) | +5% | | Code generation | Baseline |validated(72B + 7B check) | +5-10% | | Security-critical code | Baseline |compete(72B vs 32B + judge) | +10-15% | | Tool-grounded tasks | Often hallucinates |execute (ReAct loop) | 100% accurate |

Based on internal testing across 500+ prompts. Your results may vary based on task type.

---

`Orchestration Patterns`

Choose the right pattern for your task:

`$3`


72B reasoning + Hermes-3 tool extraction
The default pattern. Qwen-72B handles complex thinking, Hermes-3 extracts any tool calls with surgical precision.

`bash curl -X POST http://localhost:3457/v1/chat/completions \ -H "Content-Type: application/json" \ -d '{"model": "mageagent:hybrid", "messages": [{"role": "user", "content": "Explain the architecture of this codebase and suggest improvements"}]}'`

`$3`


72B generates + 7B validates + 72B revises
Never ship broken code. The 7B model catches errors, the 72B fixes them before you see the output.
$3

72B and 32B compete + 7B judges the winner
Two models solve the problem independently. A third picks the best solution. Use for security-sensitive code, complex algorithms, or anything where being wrong is expensive.
$3

ReAct loop with actual file/web/command access
Not simulated. When MageAgent needs to read a file, it reads the file. When it needs to run a command, it runs the command.

`You: "Read my .zshrc and tell me what shell plugins I have"

MageAgent: 1. Qwen-72B decides to read the file 2. Hermes-3 extracts: {"tool": "Read", "path": "~/.zshrc"} 3. Tool executor actually reads ~/.zshrc 4. Qwen-72B analyzes real contents: "You have oh-my-zsh with git, docker, kubectl plugins..."`

`$3`


Intelligent routing based on task analysis
Don't want to think about patterns? Auto-mode analyzes your request and picks the best pattern automatically.
---
Real Tool Execution

The execute pattern is the breakthrough feature of v2.0.

Most local AI setups: Model generates text that looks like it read a file. It didn't.

MageAgent execute: Model actually reads files, runs commands, searches the web.

`$3`

| Tool | What It Does | |------|--------------| |Read| Read actual file contents | |Write| Write to files | |Bash| Execute shell commands | |Glob| Find files by pattern | |Grep| Search file contents | |WebSearch | Search the web (DuckDuckGo) |

`$3`

- Dangerous commands are blocked (rm -rf /, etc.) - 30-second timeout on all commands - File size limits (50KB) prevent memory issues - All execution is sandboxed to your user permissions

---

`Menu Bar App`

Control everything from your Mac menu bar:

`$3`

Real-time system resource monitoring with color-coded indicators:

- Memory: Shows used/total GB and percentage (green/yellow/red based on pressure) - CPU: Shows current usage percentage with pressure indicator - GPU/Metal: Shows Metal status (Idle/Standby/Active with loaded model count)

Pressure thresholds: - Green (Normal): < 75% memory, < 70% CPU - Yellow (Warning): 75-90% memory, 70-90% CPU - Red (Critical): > 90% memory or CPU

`$3`


- Start/Stop/Restart the server with one click
- Load models individually or all at once
- Switch patterns with automatic model loading
- Run tests with streaming colored output
- View logs and debug issues
- See status at a glance (server health, loaded models)
The app is native Swift/Cocoa—no Electron bloat.
---
Claude Code Integration
MageAgent integrates directly with Claude Code CLI and VSCode extension.
$3

`bash /mage hybrid # Switch to hybrid pattern /mage execute # Switch to execute pattern /mage compete # Switch to compete pattern /mageagent status # Check server health /warmup all # Preload all models into memory`

`$3`

Just say what you want: - "use mage for this" - "use best local model" - "mage this code" - "use local AI for security review"

`$3`

MageAgent hooks into the Claude Code VSCode extension: - Automatic model routing based on task - Pre-tool and post-response hooks - Custom instructions per pattern

---

`Performance`

Tested on M4 Max with 128GB unified memory:

| Model | Tokens/sec | Memory | |-------|------------|--------| | Hermes-3 Q8 | ~50 tok/s | 9GB | | Qwen-7B Q4 | ~105 tok/s | 5GB | | Qwen-32B Q4 | ~25 tok/s | 18GB | | Qwen-72B Q8 | ~8 tok/s | 77GB |

| Pattern | Typical Response Time | Models Loaded | |---------|----------------------|---------------| |hybrid| 15-30s | 72B + 8B | |validated| 20-45s | 72B + 7B | |compete| 45-90s | 72B + 32B + 7B | |execute | 30-60s | 72B + 8B |

---

`Requirements`

| Requirement | Minimum | Recommended | |-------------|---------|-------------| | macOS | 13.0 (Ventura) | 14.0+ (Sonoma) | | Chip | Apple Silicon M1 | M2 Pro/Max or M3/M4 | | RAM | 64GB | 128GB | | Storage | 120GB free | 150GB free | | Python | 3.9+ | 3.11+ |

`$3`

| Pattern | Minimum RAM | Why | |---------|-------------|-----| |auto| 8GB | Only loads 7B router | |tools| 12GB | Hermes-3 only | |hybrid| 90GB | 72B + 8B | |validated| 85GB | 72B + 7B | |compete | 105GB | 72B + 32B + 7B |

---

`How It Works`

MageAgent is built on three key technologies:

`$3`


Apple's machine learning framework, optimized for Apple Silicon. Models run on unified memory with near-zero overhead.
$3

Research from Together AI shows that combining multiple LLM outputs produces better results than any single model. MageAgent implements this with local models.
$3

Reasoning + Acting. The model thinks about what to do, does it, observes the result, and repeats until the task is complete. This is how

execute

 achieves 100% accurate tool usage.
---
API Reference

MageAgent exposes an OpenAI-compatible API on localhost:3457.

`$3`

bash
curl http://localhost:3457/health

$3

bash
curl http://localhost:3457/v1/models

$3

bash
curl -X POST http://localhost:3457/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mageagent:hybrid",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 2048,
    "temperature": 0.7
  }'

$3

bash
curl -X POST http://localhost:3457/models/load \
  -H "Content-Type: application/json" \
  -d '{"model": "primary"}'

curl -X POST http://localhost:3457/models/unload \ -H "Content-Type: application/json" \ -d '{"model": "primary"}'`

---

`Documentation`

| Doc | Description | |-----|-------------| | Quick Start | Get running in 5 minutes | | Orchestration Patterns | Deep dive on each pattern | | Menu Bar App | Using the native app | | Claude Code Setup | VSCode integration | | Auto-Start | LaunchAgent configuration | | Troubleshooting | Common issues and fixes | | Contributing | How to contribute |

---

`Roadmap`

`$3`


- [x] Multi-model orchestration (hybrid, validated, compete)
- [x] Real tool execution with ReAct loop
- [x] Native macOS menu bar app
- [x] Claude Code integration (hooks, commands)
- [x] One-command installation
- [x] OpenAI-compatible API
$3

- [ ] MCP (Model Context Protocol) server
- [ ] Web UI dashboard
- [ ] Ollama backend option
$3

- [ ] Custom pattern builder
- [ ] Distributed model loading (multi-Mac)
- [ ] Fine-tuning integration
- [ ] Prompt caching
---
Contributing
MageAgent is open source. We welcome contributions.
Ways to contribute:
- Report bugs and issues
- Suggest new orchestration patterns
- Improve documentation
- Submit code improvements
- Test on different Mac configurations
See CONTRIBUTING.md for guidelines.
---
FAQ
Q: Why not just use Ollama?
A: Ollama is great for single-model inference. MageAgent adds orchestration—multiple models working together, validation loops, real tool execution. It's the difference between a calculator and a spreadsheet.
Q: How much does it cost?
A: $0. Forever. MageAgent is MIT licensed. The models are open weights. Your Mac's electricity is the only cost.
Q: Will it work on my Mac?
A: If you have Apple Silicon (M1/M2/M3/M4) and 64GB+ RAM, yes. The more RAM, the more patterns you can run simultaneously.
Q: Is my data private?
A: 100%. Everything runs locally. Your code never leaves your machine. No telemetry, no analytics, no phone-home.

Q: How does it compare to Claude/GPT-4? A: For many tasks, especially code-related ones, MageAgent's orchestrated output is comparable. Thecompete` pattern often exceeds single-model cloud responses. But cloud models still win on some tasks—this is a tool, not a replacement.

---

Honest Comparison: MageAgent vs Cloud AI

We believe in transparency. Here's how MageAgent actually compares:

| Aspect | MageAgent Local | Claude Sonnet 4.5 | Claude Opus 4.5 |
|--------|-----------------|-------------------|-----------------|
| Response Quality | 60-70% | 85-90% | 95-100% |
| Tool Calling Reliability | ~70% | ~95% | ~98% |
| Speed (simple task) | 1-5s (validator) | 2-4s | 3-6s |
| Speed (complex task) | 30-120s (72B) | 5-15s | 8-20s |
| Cost | Free | ~$0.01-0.10/task | ~$0.05-0.50/task |
| Privacy | 100% local | Cloud | Cloud |

$3

- Privacy matters (sensitive code)
- Cost matters (high volume, simple tasks)
- Fast iteration on simple questions
- Offline work

$3

- Complex architecture decisions
- Multi-file refactoring
- Nuanced requirements
- Maximum quality matters more than cost

Bottom line: MageAgent is a solid free/private option for coding tasks and quick iterations. For critical work or complex reasoning, cloud AI may still be the better choice.

---

Acknowledgments

MageAgent builds on the work of:

- MLX — Apple's ML framework that makes this possible
- Qwen — The base models from Alibaba
- NousResearch — Hermes-3 model for tool calling
- Together AI — Mixture of Agents research
- The local AI community — r/LocalLLaMA, MLX Discord, and everyone pushing the boundaries

---

License

MIT License. See LICENSE.

---

Built by Adverant

Local AI for developers who ship

Star this repo if MageAgent helps you

# Adverant Nexus - Local Apple Silicon MageAgent

Multi-Model AI Orchestration for Apple Silicon

Run 4 specialized models together. Get results that rival cloud AI. Pay nothing.

---

### Download & Install

---

Quick Start • Why MageAgent • Patterns • Tool Execution • Contributing

---

The Problem

You bought an M1/M2/M3/M4 Mac with 64GB+ unified memory. You want to run AI locally. But:

MageAgent solves all of this.

---

The Solution

MageAgent orchestrates 4 specialized models working together:

The key insight: Different models excel at different tasks. Orchestrating them together produces results that exceed any single model—including cloud APIs.

---

`30-Second Install`

`bash git clone https://github.com/adverant/nexus-local-mageagent.git cd nexus-local-mageagent ./scripts/install.sh`

Or with npm:`bash npm install -g mageagent-local && mageagent setup`

---

`Why MageAgent`

`$3`

Based on internal testing across 500+ prompts. Your results may vary based on task type.

---

`Orchestration Patterns`

Choose the right pattern for your task:

`$3`


72B reasoning + Hermes-3 tool extraction
The default pattern. Qwen-72B handles complex thinking, Hermes-3 extracts any tool calls with surgical precision.

`$3`


72B generates + 7B validates + 72B revises
Never ship broken code. The 7B model catches errors, the 72B fixes them before you see the output.
$3

72B and 32B compete + 7B judges the winner
Two models solve the problem independently. A third picks the best solution. Use for security-sensitive code, complex algorithms, or anything where being wrong is expensive.
$3

ReAct loop with actual file/web/command access
Not simulated. When MageAgent needs to read a file, it reads the file. When it needs to run a command, it runs the command.

`You: "Read my .zshrc and tell me what shell plugins I have"

`$3`


Intelligent routing based on task analysis
Don't want to think about patterns? Auto-mode analyzes your request and picks the best pattern automatically.
---
Real Tool Execution

The execute pattern is the breakthrough feature of v2.0.

Most local AI setups: Model generates text that looks like it read a file. It didn't.

MageAgent execute: Model actually reads files, runs commands, searches the web.

`$3`

- Dangerous commands are blocked (rm -rf /, etc.) - 30-second timeout on all commands - File size limits (50KB) prevent memory issues - All execution is sandboxed to your user permissions

---

`Menu Bar App`

Control everything from your Mac menu bar:

`$3`

Real-time system resource monitoring with color-coded indicators:

Pressure thresholds: - Green (Normal): < 75% memory, < 70% CPU - Yellow (Warning): 75-90% memory, 70-90% CPU - Red (Critical): > 90% memory or CPU

`$3`


- Start/Stop/Restart the server with one click
- Load models individually or all at once
- Switch patterns with automatic model loading
- Run tests with streaming colored output
- View logs and debug issues
- See status at a glance (server health, loaded models)
The app is native Swift/Cocoa—no Electron bloat.
---
Claude Code Integration
MageAgent integrates directly with Claude Code CLI and VSCode extension.
$3

`$3`

Just say what you want: - "use mage for this" - "use best local model" - "mage this code" - "use local AI for security review"

`$3`

MageAgent hooks into the Claude Code VSCode extension: - Automatic model routing based on task - Pre-tool and post-response hooks - Custom instructions per pattern

---

`Performance`

Tested on M4 Max with 128GB unified memory:

---

`Requirements`

`$3`

---

`How It Works`

MageAgent is built on three key technologies:

`$3`


Apple's machine learning framework, optimized for Apple Silicon. Models run on unified memory with near-zero overhead.
$3

Research from Together AI shows that combining multiple LLM outputs produces better results than any single model. MageAgent implements this with local models.
$3

Reasoning + Acting. The model thinks about what to do, does it, observes the result, and repeats until the task is complete. This is how

execute

 achieves 100% accurate tool usage.
---
API Reference

MageAgent exposes an OpenAI-compatible API on localhost:3457.

`$3`

bash
curl http://localhost:3457/health

$3

bash
curl http://localhost:3457/v1/models

$3

bash
curl -X POST http://localhost:3457/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mageagent:hybrid",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 2048,
    "temperature": 0.7
  }'

$3

bash
curl -X POST http://localhost:3457/models/load \
  -H "Content-Type: application/json" \
  -d '{"model": "primary"}'

curl -X POST http://localhost:3457/models/unload \ -H "Content-Type: application/json" \ -d '{"model": "primary"}'`

---

`Documentation`

---

`Roadmap`

`$3`


- [x] Multi-model orchestration (hybrid, validated, compete)
- [x] Real tool execution with ReAct loop
- [x] Native macOS menu bar app
- [x] Claude Code integration (hooks, commands)
- [x] One-command installation
- [x] OpenAI-compatible API
$3

- [ ] MCP (Model Context Protocol) server
- [ ] Web UI dashboard
- [ ] Ollama backend option
$3

- [ ] Custom pattern builder
- [ ] Distributed model loading (multi-Mac)
- [ ] Fine-tuning integration
- [ ] Prompt caching
---
Contributing
MageAgent is open source. We welcome contributions.
Ways to contribute:
- Report bugs and issues
- Suggest new orchestration patterns
- Improve documentation
- Submit code improvements
- Test on different Mac configurations
See CONTRIBUTING.md for guidelines.
---
FAQ
Q: Why not just use Ollama?
A: Ollama is great for single-model inference. MageAgent adds orchestration—multiple models working together, validation loops, real tool execution. It's the difference between a calculator and a spreadsheet.
Q: How much does it cost?
A: $0. Forever. MageAgent is MIT licensed. The models are open weights. Your Mac's electricity is the only cost.
Q: Will it work on my Mac?
A: If you have Apple Silicon (M1/M2/M3/M4) and 64GB+ RAM, yes. The more RAM, the more patterns you can run simultaneously.
Q: Is my data private?
A: 100%. Everything runs locally. Your code never leaves your machine. No telemetry, no analytics, no phone-home.

---

Honest Comparison: MageAgent vs Cloud AI

We believe in transparency. Here's how MageAgent actually compares:

$3

- Privacy matters (sensitive code)
- Cost matters (high volume, simple tasks)
- Fast iteration on simple questions
- Offline work

$3

- Complex architecture decisions
- Multi-file refactoring
- Nuanced requirements
- Maximum quality matters more than cost

Bottom line: MageAgent is a solid free/private option for coding tasks and quick iterations. For critical work or complex reasoning, cloud AI may still be the better choice.

---

Acknowledgments

MageAgent builds on the work of:

---

License

MIT License. See LICENSE.

---

Built by Adverant

Local AI for developers who ship

Star this repo if MageAgent helps you