# Claude Code with Local LLMs
Stop burning through Claude API usage limits. Run Claude Code's powerful agentic workflow with local Ollama models on your own hardware.
> Requires Ollama v0.14.2 or higher
Zero API costs. No rate limits. Complete privacy.


---
- No Rate Limits: Use Claude Code as much as you want
- Privacy: Your code never leaves your machine
- Cost Control: Use your own hardware, pay for electricity not tokens
- Offline Capable: Work without internet (after model download)
- GPU or CPU: Works with NVIDIA GPUs or CPU-only systems
loclaude provides:
- One-command setup for Ollama + Open WebUI containers
- Smart model management with auto-loading
- GPU auto-detection with CPU fallback
- Project scaffolding with Docker configs
```bash
# With npm (requires Node.js 18+)
npm install -g loclaude
```

## How It Compares
| Solution | Cost | Speed | Privacy | Limits |
|----------|------|-------|---------|--------|
| loclaude | Free after setup | Fast (GPU) | 100% local | None |
| Claude API/Web | $20-200+/month | Fast | Cloud-based | Rate limited |
| GitHub Copilot | $10-20/month | Fast | Cloud-based | Context limited |
| Cursor/Codeium | $20+/month | Fast | Cloud-based | Usage limits |
loclaude gives you the power of Ollama with the convenience of a managed setup for Claude Code integration.
## Quick Start (5 Minutes)
```bash
# 1. Install loclaude
npm install -g loclaude

# 2. Install Claude Code (if you haven't already)
npm install -g @anthropic-ai/claude-code

# 3. Set up your project (auto-detects GPU)
loclaude init

# 4. Start the Ollama container
loclaude docker-up

# 5. Pull a model (choose based on your hardware)
loclaude models-pull qwen3-coder:30b   # GPU with 16GB+ VRAM
# OR
loclaude models-pull qwen2.5-coder:7b  # CPU or limited VRAM

# 6. Run Claude Code with your local LLM
loclaude run
```

That's it! You now have unlimited Claude Code sessions with local models.
## Prerequisites
Required:
- Docker with Docker Compose v2
- Claude Code CLI (`npm install -g @anthropic-ai/claude-code`)

Optional (for GPU acceleration):
- NVIDIA GPU with 16GB+ VRAM (RTX 3090, 4090, A5000, etc.)
- NVIDIA Container Toolkit

CPU-only systems work fine! Use the `--no-gpu` flag during init and choose smaller models.

Check your setup:

```bash
loclaude doctor
```

## Features
### Smart Model Management

When you run `loclaude run`, it automatically:
1. Checks whether your selected model is loaded in Ollama
2. If it is not loaded, warms up the model with a 10-minute keep-alive (configurable through environment variables)
3. Shows a `[loaded]` indicator in the model selection list for running models
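If you want to warm a model up by hand, here is a rough sketch against the standard Ollama HTTP API (assuming the default endpoint and model used throughout this README). Sending a generate request with an empty prompt loads the model without producing any output:

```bash
# Load the model and keep it resident for 10 minutes without generating text
curl http://localhost:11434/api/generate -d '{
  "model": "qwen3-coder:30b",
  "prompt": "",
  "keep_alive": "10m"
}'

# List the models currently loaded in memory (what the [loaded] marker reflects)
curl http://localhost:11434/api/ps
```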
### GPU Auto-Detection

`loclaude init` automatically detects NVIDIA GPUs and configures the appropriate Docker setup:
- GPU detected: uses `runtime: nvidia` and CUDA-enabled images
- No GPU: uses a CPU-only configuration with smaller default models
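As a rough illustration (not loclaude's actual implementation), the detection amounts to checks you can run yourself from a shell:

```bash
# Hypothetical sketch of the GPU check: does the NVIDIA driver respond?
if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
  echo "GPU detected: a compose file using runtime: nvidia would be generated"
else
  echo "No GPU: the CPU-only compose configuration would be used"
fi

# Confirm Docker itself can reach the GPU (requires the NVIDIA Container Toolkit)
docker run --rm --gpus all nvidia/cuda:12.0-base nvidia-smi
```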
## Commands

### Run Claude Code

```bash
loclaude run # Interactive model selection
loclaude run -m qwen3-coder:30b # Use specific model
loclaude run -- --help # Pass args to claude
```

### Project Setup

```bash
loclaude init # Auto-detect GPU, scaffold project
loclaude init --gpu # Force GPU mode
loclaude init --no-gpu # Force CPU-only mode
loclaude init --force # Overwrite existing files
loclaude init --no-webui # Skip Open WebUI in compose file
```

### Docker Management

```bash
loclaude docker-up # Start containers (detached)
loclaude docker-up --no-detach # Start in foreground
loclaude docker-down # Stop containers
loclaude docker-status # Show container status
loclaude docker-logs # Show logs
loclaude docker-logs --follow # Follow logs
loclaude docker-restart # Restart containers
```

### Model Management

```bash
loclaude models # List installed models
loclaude models-pull # Pull a model
loclaude models-rm # Remove a model
loclaude models-show # Show model details
loclaude models-run # Run model interactively (ollama CLI)
```
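For example, to grab a CPU-friendly model and inspect it (assuming, as in the Quick Start, that the model subcommands take a model name as an argument):

```bash
# Pull a smaller model, show its details, then list everything installed
loclaude models-pull qwen2.5-coder:7b
loclaude models-show qwen2.5-coder:7b
loclaude models
```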
### Diagnostics

```bash
loclaude doctor # Check prerequisites
loclaude config # Show current configuration
loclaude config-paths # Show config file search paths
```

## Recommended Models
### GPU Models (16GB+ VRAM)
| Model | Size | Speed | Quality | Best For |
|-------|------|-------|---------|----------|
| `qwen3-coder:30b` | ~17 GB | ~50-100 tok/s | Excellent | Most coding tasks, refactoring, debugging |
| `deepseek-coder:33b` | ~18 GB | ~40-80 tok/s | Excellent | Code understanding, complex logic |

Recommendation: Start with `qwen3-coder:30b` for the best balance of speed and quality.

### CPU Models (Limited VRAM)
| Model | Size | Speed | Quality | Best For |
|-------|------|-------|---------|----------|
| `qwen2.5-coder:7b` | ~4 GB | ~10-20 tok/s | Good | Code completion, simple refactoring |
| `deepseek-coder:6.7b` | ~4 GB | ~10-20 tok/s | Good | Understanding existing code |
| `llama3.2:3b` | ~2 GB | ~15-30 tok/s | Fair | Quick edits, file operations |

## Configuration
loclaude supports configuration via files and environment variables.
### Config Files
Config files are loaded in priority order:
1. `./.loclaude/config.json` (project-local)
2. `~/.config/loclaude/config.json` (user global)

Example config:
```json
{
  "ollama": {
    "url": "http://localhost:11434",
    "defaultModel": "qwen3-coder:30b"
  },
  "docker": {
    "composeFile": "./docker-compose.yml",
    "gpu": true
  },
  "claude": {
    "extraArgs": ["--verbose"]
  }
}
```
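To try a project-local override, you could drop a minimal config into `.loclaude/` and confirm loclaude picks it up (a sketch using only keys from the example above):

```bash
# Create a project-local config that only overrides the default model
mkdir -p .loclaude
cat > .loclaude/config.json <<'EOF'
{
  "ollama": { "defaultModel": "qwen2.5-coder:7b" }
}
EOF

# Print the merged configuration to confirm the override is applied
loclaude config
```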
### Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `OLLAMA_URL` | Ollama API endpoint | `http://localhost:11434` |
| `OLLAMA_MODEL` | Default model name | `qwen3-coder:30b` |
| `LOCLAUDE_COMPOSE_FILE` | Path to docker-compose.yml | `./docker-compose.yml` |
| `LOCLAUDE_GPU` | Enable GPU (true/false) | `true` |
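Environment variables are handy for one-off overrides without touching any config file, for example:

```bash
# Use a smaller model for this session only, leaving the configured default alone
OLLAMA_MODEL=qwen2.5-coder:7b loclaude run

# Point loclaude at an Ollama instance running elsewhere (illustrative address)
OLLAMA_URL=http://192.168.1.50:11434 loclaude models
```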
### Configuration Precedence

Configuration is merged in this order (highest priority first):
1. CLI arguments
2. Environment variables
3. Project config (`./.loclaude/config.json`)
4. User config (`~/.config/loclaude/config.json`)
5. Default values
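In practice this means a CLI flag always wins; for example (a hypothetical invocation combining the flag and variable documented above):

```bash
# The -m flag (CLI argument) beats OLLAMA_MODEL (environment variable),
# so this session runs qwen3-coder:30b
OLLAMA_MODEL=llama3.2:3b loclaude run -m qwen3-coder:30b
```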
## Service URLs

When containers are running:
| Service | URL | Description |
|---------|-----|-------------|
| Ollama API | http://localhost:11434 | LLM inference API |
| Open WebUI | | Chat interface |
## Project Structure
After running `loclaude init`:

```
.
├── .claude/
│   └── CLAUDE.md          # Claude Code instructions
├── .loclaude/
│   └── config.json        # Loclaude configuration
├── models/                # Ollama model storage (gitignored)
├── docker-compose.yml     # Container definitions (GPU or CPU mode)
├── mise.toml              # Task runner configuration
└── README.md
```

## Using with mise
The `init` command creates a `mise.toml` with convenient task aliases:

```bash
mise run up # loclaude docker-up
mise run down # loclaude docker-down
mise run claude # loclaude run
mise run pull # loclaude models-pull
mise run doctor # loclaude doctor
```

## FAQ
### Is this really unlimited and free?
Yes! Once you have models downloaded, you can run as many sessions as you want with zero additional cost.
### How do local models compare to Claude?
30B-parameter models such as `qwen3-coder:30b` are roughly comparable to GPT-3.5 and handle most coding tasks reasonably well; larger models do somewhat better. The Claude API still produces stronger results, but local models let you keep working after you hit a usage limit.
### Do I need a GPU?
No, but highly recommended. CPU-only mode works with smaller models at ~10-20 tokens/sec. A GPU (16GB+ VRAM) gives you 50-100 tokens/sec with larger, better models.
### Can I still use the Claude API?
Absolutely! Keep using Claude API for critical tasks, use loclaude for everything else to save money and avoid limits.
## Troubleshooting
### Run the Doctor
```bash
loclaude doctor
```

This verifies:
- Docker and Docker Compose installation
- NVIDIA GPU detection (optional)
- NVIDIA Container Toolkit (optional)
- Claude Code CLI
- Ollama API connectivity
### Container Issues
```bash
# View logs
loclaude docker-logs --follow

# Restart containers
loclaude docker-restart

# Full reset
loclaude docker-down && loclaude docker-up
```

### Claude Code Can't Connect to Ollama
If Claude Code can't connect to Ollama:
1. Verify Ollama is running: `loclaude docker-status`
2. Check the API: `curl http://localhost:11434/api/tags`
3. Verify your config: `loclaude config`
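The same checks as a single copy-pasteable block:

```bash
loclaude docker-status                    # is the Ollama container up?
curl http://localhost:11434/api/tags      # does the API answer?
loclaude config                           # is loclaude pointing at the right URL?
```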
### GPU Not Detected

If you have a GPU but it's not detected:
1. Check NVIDIA drivers: `nvidia-smi`
2. Test Docker GPU access: `docker run --rm --gpus all nvidia/cuda:12.0-base nvidia-smi`
3. Install the NVIDIA Container Toolkit if missing
4. Re-run `loclaude init --gpu` to force GPU mode
### Slow CPU Inference

If inference is slow on CPU:
1. Use smaller, quantized models: `qwen2.5-coder:7b`, `llama3.2:3b`
2. Expect ~10-20 tokens/sec on modern CPUs
3. Consider cloud models via Ollama: `glm-4.7:cloud`

## Getting Help
- Issues/Bugs: GitHub Issues
- Questions: GitHub Discussions
- Documentation: Run `loclaude --help` or check this README
- System Check: Run `loclaude doctor` to diagnose problems

## Development
### Building from Source
```bash
git clone https://github.com/nicholasgalante1997/loclaude.git loclaude
cd loclaude
bun install
bun run build
```

### Running Locally

```bash
# With bun (direct)
bun bin/index.ts --help

# With node (built)
node bin/index.mjs --help
```

### Testing

```bash
# Test both runtimes
bun bin/index.ts doctor
node bin/index.mjs doctor
```

## License

MIT