# Claude Code with Local LLMs
Stop burning through Claude API usage limits. Run Claude Code's powerful agentic workflow with local Ollama models on your own hardware.
> Requires Ollama v0.14.2 or higher
Zero API costs. No rate limits. Complete privacy.


---
- No Rate Limits: Use Claude Code as much as you want
- Privacy: Your code never leaves your machine
- Cost Control: Use your own hardware, pay for electricity not tokens
- Offline Capable: Work without internet (after model download)
- GPU or CPU: Works with NVIDIA GPUs or CPU-only systems
loclaude provides:
- One-command setup for Ollama + Open WebUI containers
- Smart model management with auto-loading
- GPU auto-detection with CPU fallback
- Project scaffolding with Docker configs
```bash
# With npm (requires Node.js 18+)
npm install -g loclaude
```

## How It Compares
| Solution | Cost | Speed | Privacy | Limits |
|----------|------|-------|---------|--------|
| loclaude | Free after setup | Fast (GPU) | 100% local | None |
| Claude API/Web | $20-200+/month | Fast | Cloud-based | Rate limited |
| GitHub Copilot | $10-20/month | Fast | Cloud-based | Context limited |
| Cursor/Codeium | $20+/month | Fast | Cloud-based | Usage limits |
loclaude gives you the power of Ollama with the convenience of a managed setup for Claude Code integration.
## Quick Start (5 Minutes)
```bash
# 1. Install loclaude
npm install -g loclaude

# 2. Install Claude Code (if you haven't already)
npm install -g @anthropic-ai/claude-code

# 3. Set up your project (auto-detects GPU)
loclaude init

# 4. Start the Ollama container
loclaude docker-up

# 5. Pull a model (choose based on your hardware)
loclaude models-pull qwen3-coder:30b   # GPU with 16GB+ VRAM
# OR
loclaude models-pull qwen2.5-coder:7b  # CPU or limited VRAM

# 6. Run Claude Code with your local LLM
loclaude run
```

That's it! You now have unlimited Claude Code sessions with local models.
## Prerequisites
Required:
- Docker with Docker Compose v2
- Claude Code CLI (`npm install -g @anthropic-ai/claude-code`)

Optional (for GPU acceleration):
- NVIDIA GPU with 16GB+ VRAM (RTX 3090, 4090, A5000, etc.)
- NVIDIA Container Toolkit

CPU-only systems work fine! Use the `--no-gpu` flag during init and choose smaller models.

Check your setup:

```bash
loclaude doctor
```

## Features
### Smart Model Management

When you run `loclaude run`, it automatically:
1. Checks whether your selected model is loaded in Ollama
2. If it is not loaded, warms up the model with a 10-minute keep-alive (configurable through environment variables)
3. Shows a `[loaded]` indicator in the model selection list for running models
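If you want to warm a model up by hand, here is a rough sketch against the standard Ollama HTTP API (assuming the default endpoint and model used throughout this README). Sending a generate request with an empty prompt loads the model without producing any output:

```bash
# Load the model and keep it resident for 10 minutes without generating text
curl http://localhost:11434/api/generate -d '{
  "model": "qwen3-coder:30b",
  "prompt": "",
  "keep_alive": "10m"
}'

# List the models currently loaded in memory (what the [loaded] marker reflects)
curl http://localhost:11434/api/ps
```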
### GPU Auto-Detection

`loclaude init` automatically detects NVIDIA GPUs and configures the appropriate Docker setup:
- GPU detected: uses `runtime: nvidia` and CUDA-enabled images
- No GPU: uses a CPU-only configuration with smaller default models
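As a rough illustration (not loclaude's actual implementation), the detection amounts to checks you can run yourself from a shell:

```bash
# Hypothetical sketch of the GPU check: does the NVIDIA driver respond?
if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
  echo "GPU detected: a compose file using runtime: nvidia would be generated"
else
  echo "No GPU: the CPU-only compose configuration would be used"
fi

# Confirm Docker itself can reach the GPU (requires the NVIDIA Container Toolkit)
docker run --rm --gpus all nvidia/cuda:12.0-base nvidia-smi
```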
## Commands

### Run Claude Code

```bash
loclaude run # Interactive model selection
loclaude run -m qwen3-coder:30b # Use specific model
loclaude run -- --help # Pass args to claude
```

### Project Setup

```bash
loclaude init # Auto-detect GPU, scaffold project
loclaude init --gpu # Force GPU mode
loclaude init --no-gpu # Force CPU-only mode
loclaude init --force # Overwrite existing files
loclaude init --no-webui # Skip Open WebUI in compose file
```

### Docker Management

```bash
loclaude docker-up # Start containers (detached)
loclaude docker-up --no-detach # Start in foreground
loclaude docker-down # Stop containers
loclaude docker-status # Show container status
loclaude docker-logs # Show logs
loclaude docker-logs --follow # Follow logs
loclaude docker-restart # Restart containers
```

### Model Management

```bash
loclaude models # List installed models
loclaude models-pull # Pull a model
loclaude models-rm # Remove a model
loclaude models-show # Show model details
loclaude models-run # Run model interactively (ollama CLI)
```
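For example, to grab a CPU-friendly model and inspect it (assuming, as in the Quick Start, that the model subcommands take a model name as an argument):

```bash
# Pull a smaller model, show its details, then list everything installed
loclaude models-pull qwen2.5-coder:7b
loclaude models-show qwen2.5-coder:7b
loclaude models
```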
### Diagnostics

```bash
loclaude doctor # Check prerequisites
loclaude config # Show current configuration
loclaude config-paths # Show config file search paths
```

## Recommended Models
### GPU Models (16GB+ VRAM)
| Model | Size | Speed | Quality | Best For |
|-------|------|-------|---------|----------|
| `qwen3-coder:30b` | ~17 GB | ~50-100 tok/s | Excellent | Most coding tasks, refactoring, debugging |
| `deepseek-coder:33b` | ~18 GB | ~40-80 tok/s | Excellent | Code understanding, complex logic |

Recommendation: Start with `qwen3-coder:30b` for the best balance of speed and quality.

### CPU Models (Limited VRAM)
| Model | Size | Speed | Quality | Best For |
|-------|------|-------|---------|----------|
| `qwen2.5-coder:7b` | ~4 GB | ~10-20 tok/s | Good | Code completion, simple refactoring |
| `deepseek-coder:6.7b` | ~4 GB | ~10-20 tok/s | Good | Understanding existing code |
| `llama3.2:3b` | ~2 GB | ~15-30 tok/s | Fair | Quick edits, file operations |

## Configuration
loclaude supports configuration via files and environment variables.
### Config Files
Config files are loaded in priority order:
1. `./.loclaude/config.json` (project-local)
2. `~/.config/loclaude/config.json` (user global)

Example config:
```json
{
  "ollama": {
    "url": "http://localhost:11434",
    "defaultModel": "qwen3-coder:30b"
  },
  "docker": {
    "composeFile": "./docker-compose.yml",
    "gpu": true
  },
  "claude": {
    "extraArgs": ["--verbose"]
  }
}
```
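To try a project-local override, you could drop a minimal config into `.loclaude/` and confirm loclaude picks it up (a sketch using only keys from the example above):

```bash
# Create a project-local config that only overrides the default model
mkdir -p .loclaude
cat > .loclaude/config.json <<'EOF'
{
  "ollama": { "defaultModel": "qwen2.5-coder:7b" }
}
EOF

# Print the merged configuration to confirm the override is applied
loclaude config
```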
### Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `OLLAMA_URL` | Ollama API endpoint | `http://localhost:11434` |
| `OLLAMA_MODEL` | Default model name | `qwen3-coder:30b` |
| `LOCLAUDE_COMPOSE_FILE` | Path to docker-compose.yml | `./docker-compose.yml` |
| `LOCLAUDE_GPU` | Enable GPU (true/false) | `true` |
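Environment variables are handy for one-off overrides without touching any config file, for example:

```bash
# Use a smaller model for this session only, leaving the configured default alone
OLLAMA_MODEL=qwen2.5-coder:7b loclaude run

# Point loclaude at an Ollama instance running elsewhere (illustrative address)
OLLAMA_URL=http://192.168.1.50:11434 loclaude models
```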
### Configuration Precedence

Configuration is merged in this order (highest priority first):
1. CLI arguments
2. Environment variables
3. Project config (`./.loclaude/config.json`)
4. User config (`~/.config/loclaude/config.json`)
5. Default values
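In practice this means a CLI flag always wins; for example (a hypothetical invocation combining the flag and variable documented above):

```bash
# The -m flag (CLI argument) beats OLLAMA_MODEL (environment variable),
# so this session runs qwen3-coder:30b
OLLAMA_MODEL=llama3.2:3b loclaude run -m qwen3-coder:30b
```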
## Service URLs

When containers are running:
| Service | URL | Description |
|---------|-----|-------------|
| Ollama API | http://localhost:11434 | LLM inference API |
| Open WebUI | | Chat interface |
## Project Structure
After running `loclaude init`:

```
.
├── .claude/
│   └── CLAUDE.md          # Claude Code instructions
├── .loclaude/
│   └── config.json        # Loclaude configuration
├── models/                # Ollama model storage (gitignored)
├── docker-compose.yml     # Container definitions (GPU or CPU mode)
├── mise.toml              # Task runner configuration
└── README.md
```

## Using with mise
The `init` command creates a `mise.toml` with convenient task aliases:

```bash
mise run up # loclaude docker-up
mise run down # loclaude docker-down
mise run claude # loclaude run
mise run pull # loclaude models-pull
mise run doctor # loclaude doctor
```

## FAQ
### Is this really unlimited and free?
Yes! Once you have models downloaded, you can run as many sessions as you want with zero additional cost.
### How do local models compare to Claude?
30B-parameter models such as `qwen3-coder:30b` are roughly comparable to GPT-3.5 and handle most coding tasks reasonably well; larger models do somewhat better. The Claude API still produces stronger results, but local models let you keep working after you hit a usage limit.
### Do I need a GPU?
No, but highly recommended. CPU-only mode works with smaller models at ~10-20 tokens/sec. A GPU (16GB+ VRAM) gives you 50-100 tokens/sec with larger, better models.
### Can I still use the Claude API?
Absolutely! Keep using Claude API for critical tasks, use loclaude for everything else to save money and avoid limits.
## Troubleshooting
### Run the Doctor
```bash
loclaude doctor
```

This verifies:
- Docker and Docker Compose installation
- NVIDIA GPU detection (optional)
- NVIDIA Container Toolkit (optional)
- Claude Code CLI
- Ollama API connectivity
### Container Issues
```bash
# View logs
loclaude docker-logs --follow

# Restart containers
loclaude docker-restart

# Full reset
loclaude docker-down && loclaude docker-up
```

### Claude Code Can't Connect to Ollama
If Claude Code can't connect to Ollama:
1. Verify Ollama is running: `loclaude docker-status`
2. Check the API: `curl http://localhost:11434/api/tags`
3. Verify your config: `loclaude config`
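The same checks as a single copy-pasteable block:

```bash
loclaude docker-status                    # is the Ollama container up?
curl http://localhost:11434/api/tags      # does the API answer?
loclaude config                           # is loclaude pointing at the right URL?
```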
### GPU Not Detected

If you have a GPU but it's not detected:
1. Check NVIDIA drivers: `nvidia-smi`
2. Test Docker GPU access: `docker run --rm --gpus all nvidia/cuda:12.0-base nvidia-smi`
3. Install the NVIDIA Container Toolkit if missing
4. Re-run `loclaude init --gpu` to force GPU mode
### Slow CPU Inference

If inference is slow on CPU:
1. Use smaller, quantized models: `qwen2.5-coder:7b`, `llama3.2:3b`
2. Expect ~10-20 tokens/sec on modern CPUs
3. Consider cloud models via Ollama: `glm-4.7:cloud`

## Getting Help
- Issues/Bugs: GitHub Issues
- Questions: GitHub Discussions
- Documentation: Run `loclaude --help` or check this README
- System Check: Run `loclaude doctor` to diagnose problems

## Development
### Building from Source
```bash
git clone https://github.com/nicholasgalante1997/loclaude.git loclaude
cd loclaude
bun install
bun run build
```

### Running Locally

```bash
# With bun (direct)
bun bin/index.ts --help

# With node (built)
node bin/index.mjs --help
```

### Testing

```bash
# Test both runtimes
bun bin/index.ts doctor
node bin/index.mjs doctor
```

## License

MIT