Run 4 AI models together on Apple Silicon. Get results that rival cloud AI. Pay nothing. Forever.
npm install mageagent-local# Adverant Nexus - Local Apple Silicon MageAgent
Multi-Model AI Orchestration for Apple Silicon




Run 4 specialized models together. Get results that rival cloud AI. Pay nothing.
---
### Download & Install



---
Quick Start • Why MageAgent • Patterns • Tool Execution • Contributing
---
You bought an M1/M2/M3/M4 Mac with 64GB+ unified memory. You want to run AI locally. But:
- Single models hit a ceiling - Even the best 72B model can't match multi-model orchestration
- Ollama alone isn't enough - You get inference, not intelligence
- Cloud AI costs add up - $200+/month for API calls that send your code to someone else's servers
- Tool calling is unreliable - Local models hallucinate file contents instead of reading them
MageAgent solves all of this.
---
MageAgent orchestrates 4 specialized models working together:
```
┌──────────────────────────────────────────────────────────────────┐
│ Your Request │
└─────────────────────────────┬────────────────────────────────────┘
▼
┌──────────────────────────────────────────────────────────────────┐
│ MageAgent Orchestrator │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────┐ │
│ │ Qwen-72B │ │ Qwen-32B │ │ Qwen-7B │ │ Hermes-3│ │
│ │ Q8_0 │ │ Q4_K_M │ │ Q4_K_M │ │ Q8_0 │ │
│ │ │ │ │ │ │ │ │ │
│ │ Reasoning │ │ Coding │ │ Validate │ │ Tools │ │
│ │ Planning │ │ Compete │ │ Judge │ │ ReAct │ │
│ │ Analysis │ │ Generate │ │ Fast │ │ Files │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └─────────┘ │
│ 77GB 18GB 5GB 9GB │
└──────────────────────────────────────────────────────────────────┘
▼
┌──────────────────────────────────────────────────────────────────┐
│ Better Response │
│ Multiple perspectives. Validated. Tool-grounded. │
└──────────────────────────────────────────────────────────────────┘
The key insight: Different models excel at different tasks. Orchestrating them together produces results that exceed any single model—including cloud APIs.
---
`bash`
git clone https://github.com/adverant/nexus-local-mageagent.git
cd nexus-local-mageagent
./scripts/install.sh
That's it. The installer:
1. Sets up the Python environment with MLX
2. Installs the native menu bar app
3. Configures auto-start on login
4. Downloads models (optional, ~109GB)
5. Starts the server
Or with npm:
`bash`
npm install -g mageagent-local && mageagent setup
---
| Capability | Ollama | MageAgent |
|------------|--------|-----------|
| Single model inference | Yes | Yes |
| Multi-model orchestration | No | Yes |
| Model competition + judging | No | Yes |
| Generate + validate loops | No | Yes |
| Real tool execution | No | Yes |
| Native menu bar app | No | Yes |
| Claude Code integration | No | Yes |
| Factor | Cloud API | MageAgent |
|--------|-----------|-----------|
| Cost per query | $0.01-0.10 | $0 |
| Monthly cost (heavy use) | $200+ | $0 |
| Your code leaves your machine | Yes | No |
| Rate limits | Yes | No |
| Works offline | No | Yes |
| Latency | Network dependent | Local speed |
| Task Type | Single 72B Model | MageAgent Pattern | Improvement |
|-----------|------------------|-------------------|-------------|
| Complex reasoning | Baseline | hybrid (72B + tools) | +5% |validated
| Code generation | Baseline | (72B + 7B check) | +5-10% |compete
| Security-critical code | Baseline | (72B vs 32B + judge) | +10-15% |execute
| Tool-grounded tasks | Often hallucinates | (ReAct loop) | 100% accurate |
Based on internal testing across 500+ prompts. Your results may vary based on task type.
---
Choose the right pattern for your task:
The default pattern. Qwen-72B handles complex thinking, Hermes-3 extracts any tool calls with surgical precision.
`bash`
curl -X POST http://localhost:3457/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "mageagent:hybrid", "messages": [{"role": "user", "content": "Explain the architecture of this codebase and suggest improvements"}]}'
Never ship broken code. The 7B model catches errors, the 72B fixes them before you see the output.
Two models solve the problem independently. A third picks the best solution. Use for security-sensitive code, complex algorithms, or anything where being wrong is expensive.
Not simulated. When MageAgent needs to read a file, it reads the file. When it needs to run a command, it runs the command.
`
You: "Read my .zshrc and tell me what shell plugins I have"
MageAgent:
1. Qwen-72B decides to read the file
2. Hermes-3 extracts: {"tool": "Read", "path": "~/.zshrc"}
3. Tool executor actually reads ~/.zshrc
4. Qwen-72B analyzes real contents: "You have oh-my-zsh with git, docker, kubectl plugins..."
`
Don't want to think about patterns? Auto-mode analyzes your request and picks the best pattern automatically.
---
The execute pattern is the breakthrough feature of v2.0.
Most local AI setups: Model generates text that looks like it read a file. It didn't.
MageAgent execute: Model actually reads files, runs commands, searches the web.
| Tool | What It Does |
|------|--------------|
| Read | Read actual file contents |Write
| | Write to files |Bash
| | Execute shell commands |Glob
| | Find files by pattern |Grep
| | Search file contents |WebSearch
| | Search the web (DuckDuckGo) |
- Dangerous commands are blocked (rm -rf /, etc.)
- 30-second timeout on all commands
- File size limits (50KB) prevent memory issues
- All execution is sandboxed to your user permissions
---
Control everything from your Mac menu bar:

Real-time system resource monitoring with color-coded indicators:
- Memory: Shows used/total GB and percentage (green/yellow/red based on pressure)
- CPU: Shows current usage percentage with pressure indicator
- GPU/Metal: Shows Metal status (Idle/Standby/Active with loaded model count)
Pressure thresholds:
- Green (Normal): < 75% memory, < 70% CPU
- Yellow (Warning): 75-90% memory, 70-90% CPU
- Red (Critical): > 90% memory or CPU
The app is native Swift/Cocoa—no Electron bloat.
---
MageAgent integrates directly with Claude Code CLI and VSCode extension.
`bash`
/mage hybrid # Switch to hybrid pattern
/mage execute # Switch to execute pattern
/mage compete # Switch to compete pattern
/mageagent status # Check server health
/warmup all # Preload all models into memory
Just say what you want:
- "use mage for this"
- "use best local model"
- "mage this code"
- "use local AI for security review"
MageAgent hooks into the Claude Code VSCode extension:
- Automatic model routing based on task
- Pre-tool and post-response hooks
- Custom instructions per pattern
---
Tested on M4 Max with 128GB unified memory:
| Model | Tokens/sec | Memory |
|-------|------------|--------|
| Hermes-3 Q8 | ~50 tok/s | 9GB |
| Qwen-7B Q4 | ~105 tok/s | 5GB |
| Qwen-32B Q4 | ~25 tok/s | 18GB |
| Qwen-72B Q8 | ~8 tok/s | 77GB |
| Pattern | Typical Response Time | Models Loaded |
|---------|----------------------|---------------|
| hybrid | 15-30s | 72B + 8B |validated
| | 20-45s | 72B + 7B |compete
| | 45-90s | 72B + 32B + 7B |execute
| | 30-60s | 72B + 8B |
---
| Requirement | Minimum | Recommended |
|-------------|---------|-------------|
| macOS | 13.0 (Ventura) | 14.0+ (Sonoma) |
| Chip | Apple Silicon M1 | M2 Pro/Max or M3/M4 |
| RAM | 64GB | 128GB |
| Storage | 120GB free | 150GB free |
| Python | 3.9+ | 3.11+ |
| Pattern | Minimum RAM | Why |
|---------|-------------|-----|
| auto | 8GB | Only loads 7B router |tools
| | 12GB | Hermes-3 only |hybrid
| | 90GB | 72B + 8B |validated
| | 85GB | 72B + 7B |compete
| | 105GB | 72B + 32B + 7B |
---
MageAgent is built on three key technologies:
achieves 100% accurate tool usage.---
API Reference
MageAgent exposes an OpenAI-compatible API on
localhost:3457.$3
`bash
curl http://localhost:3457/health
`$3
`bash
curl http://localhost:3457/v1/models
`$3
`bash
curl -X POST http://localhost:3457/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "mageagent:hybrid",
"messages": [{"role": "user", "content": "Hello"}],
"max_tokens": 2048,
"temperature": 0.7
}'
`$3
`bash
curl -X POST http://localhost:3457/models/load \
-H "Content-Type: application/json" \
-d '{"model": "primary"}'curl -X POST http://localhost:3457/models/unload \
-H "Content-Type: application/json" \
-d '{"model": "primary"}'
`---
Documentation
| Doc | Description |
|-----|-------------|
| Quick Start | Get running in 5 minutes |
| Orchestration Patterns | Deep dive on each pattern |
| Menu Bar App | Using the native app |
| Claude Code Setup | VSCode integration |
| Auto-Start | LaunchAgent configuration |
| Troubleshooting | Common issues and fixes |
| Contributing | How to contribute |
---
Roadmap
$3
- [x] Multi-model orchestration (hybrid, validated, compete)
- [x] Real tool execution with ReAct loop
- [x] Native macOS menu bar app
- [x] Claude Code integration (hooks, commands)
- [x] One-command installation
- [x] OpenAI-compatible API$3
- [ ] MCP (Model Context Protocol) server
- [ ] Web UI dashboard
- [ ] Ollama backend option$3
- [ ] Custom pattern builder
- [ ] Distributed model loading (multi-Mac)
- [ ] Fine-tuning integration
- [ ] Prompt caching---
Contributing
MageAgent is open source. We welcome contributions.
Ways to contribute:
- Report bugs and issues
- Suggest new orchestration patterns
- Improve documentation
- Submit code improvements
- Test on different Mac configurations
See CONTRIBUTING.md for guidelines.
---
FAQ
Q: Why not just use Ollama?
A: Ollama is great for single-model inference. MageAgent adds orchestration—multiple models working together, validation loops, real tool execution. It's the difference between a calculator and a spreadsheet.
Q: How much does it cost?
A: $0. Forever. MageAgent is MIT licensed. The models are open weights. Your Mac's electricity is the only cost.
Q: Will it work on my Mac?
A: If you have Apple Silicon (M1/M2/M3/M4) and 64GB+ RAM, yes. The more RAM, the more patterns you can run simultaneously.
Q: Is my data private?
A: 100%. Everything runs locally. Your code never leaves your machine. No telemetry, no analytics, no phone-home.
Q: How does it compare to Claude/GPT-4?
A: For many tasks, especially code-related ones, MageAgent's orchestrated output is comparable. The
compete` pattern often exceeds single-model cloud responses. But cloud models still win on some tasks—this is a tool, not a replacement.---
We believe in transparency. Here's how MageAgent actually compares:
| Aspect | MageAgent Local | Claude Sonnet 4.5 | Claude Opus 4.5 |
|--------|-----------------|-------------------|-----------------|
| Response Quality | 60-70% | 85-90% | 95-100% |
| Tool Calling Reliability | ~70% | ~95% | ~98% |
| Speed (simple task) | 1-5s (validator) | 2-4s | 3-6s |
| Speed (complex task) | 30-120s (72B) | 5-15s | 8-20s |
| Cost | Free | ~$0.01-0.10/task | ~$0.05-0.50/task |
| Privacy | 100% local | Cloud | Cloud |
Bottom line: MageAgent is a solid free/private option for coding tasks and quick iterations. For critical work or complex reasoning, cloud AI may still be the better choice.
---
MageAgent builds on the work of:
- MLX — Apple's ML framework that makes this possible
- Qwen — The base models from Alibaba
- NousResearch — Hermes-3 model for tool calling
- Together AI — Mixture of Agents research
- The local AI community — r/LocalLLaMA, MLX Discord, and everyone pushing the boundaries
---
MIT License. See LICENSE.
---
Built by Adverant
Local AI for developers who ship
Star this repo if MageAgent helps you