# gpu-worker

Distributed GPU Inference Worker - Share your idle GPU computing power for LLM inference and image generation.

```bash
npm install gpu-worker
```

## Features
- Easy Setup: Single command installation with automatic Python environment management
- Multiple Engines: Support for native Transformers, vLLM, SGLang backends
- LLM Inference: Run Qwen, Llama, GLM, DeepSeek and other popular models
- Image Generation: Support FLUX, Stable Diffusion XL and more
- Cross-Platform: Works on Windows, Linux, and macOS
- Auto Configuration: Interactive setup wizard
## Quick Start

```bash
# Interactive menu
npx gpu-worker
```
### Global Installation
```bash
npm install -g gpu-worker
gpu-worker configure
gpu-worker start
```
### Using System Python
If your server already has the GPU dependencies installed globally, you can skip both the virtual environment and the automatic dependency installation:
```bash
npx gpu-worker configure --use-system-python
npx gpu-worker start --use-system-python
```

You can also keep a venv but skip dependency installation:
```bash
npx gpu-worker configure --skip-install
```

On first run without a venv, the interactive CLI will prompt you to choose the mode.
## Requirements
- Node.js: >= 16.0.0
- Python: >= 3.9
- GPU: NVIDIA GPU with CUDA 11.8+ (optional, for GPU inference)
- RAM: 16GB+ recommended
- Storage: 50GB+ for model storage
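A quick way to confirm your machine meets these requirements, using only standard tooling (nothing specific to gpu-worker):

```bash
node --version                                          # expect >= 16.0.0
python3 --version                                       # expect >= 3.9
nvidia-smi --query-gpu=name,memory.total --format=csv   # GPU model and VRAM (optional)
df -h .                                                 # free disk space for model storage
```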
CUDA note: the installer uses `nvidia-smi` to detect CUDA and selects the cu124 / cu121 / cu118 build of PyTorch. CUDA 12.2/12.3 use cu121; other versions fall back to CPU builds.
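If you need to override the automatic selection (for example, detection falls back to a CPU build on a machine that actually has CUDA), you can install a matching PyTorch wheel yourself in whichever Python environment the worker uses. The index URLs below are the standard PyTorch ones; cu121 is only an example:

```bash
# Check the CUDA version reported by the driver
nvidia-smi | grep "CUDA Version"

# Example: manually install a cu121 build of PyTorch
# (pick the index that matches your CUDA version: cu118, cu121, or cu124)
pip install torch --index-url https://download.pytorch.org/whl/cu121
```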
## Configuration

The worker can be configured via:
1. Interactive wizard: `gpu-worker configure`
2. Environment variables: Copy `.env.example` to `.env`
3. YAML config file: Edit `config.yaml`

### Configuration Options
| Option | Environment Variable | Description |
|--------|----------------------|-------------|
| Server URL | `GPU_SERVER_URL` | Central server address |
| Worker Name | `GPU_WORKER_NAME` | Display name for this worker |
| Region | `GPU_REGION` | Geographic region (e.g., asia-east) |
| Supported Types | `GPU_SUPPORTED_TYPES` | Task types: `llm`, `image_gen` |
| LLM Model | `GPU_LLM_MODEL` | HuggingFace model ID |
## Supported Models

### LLM Models
| Model | VRAM Required | Model ID |
|-------|---------------|----------|
| Qwen2.5-7B | 16GB | `Qwen/Qwen2.5-7B-Instruct` |
| Llama-3.1-8B | 18GB | `meta-llama/Llama-3.1-8B-Instruct` |
| GLM-4-9B | 20GB | `THUDM/glm-4-9b-chat` |
### Image Generation Models

| Model | VRAM Required | Model ID |
|-------|---------------|----------|
| FLUX.1-schnell | 24GB | `black-forest-labs/FLUX.1-schnell` |
| SDXL | 12GB | `stabilityai/stable-diffusion-xl-base-1.0` |
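Model weights are fetched from Hugging Face by the model IDs listed above. If you prefer to pre-download them, the standard Hugging Face CLI works (this is plain Hugging Face tooling, not a gpu-worker feature); gated models such as Llama 3.1 additionally require logging in with an access token:

```bash
# Optional: pre-fetch weights into the local Hugging Face cache
pip install -U "huggingface_hub[cli]"
huggingface-cli download Qwen/Qwen2.5-7B-Instruct

# Gated models (e.g. meta-llama/Llama-3.1-8B-Instruct) need an access token
huggingface-cli login
```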
## High-Performance Backends

For production use, install optional high-performance backends:
```bash
# SGLang (recommended for high throughput)
pip install "sglang[all]"

# vLLM (alternative)
pip install vllm
```
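After installing either backend, you can confirm it imports in the same Python environment the worker will use (generic Python checks, not gpu-worker commands):

```bash
python3 -c "import sglang; print('SGLang', sglang.__version__)"
python3 -c "import vllm; print('vLLM', vllm.__version__)"
```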
## Architecture

```
┌─────────────────┐ ┌─────────────────┐
│ Central Server │◄────│ GPU Worker │
│ (Scheduler) │ │ (This Package) │
└─────────────────┘ └─────────────────┘
│ │
│ ▼
│ ┌───────────────┐
│ │ GPU/CPU │
│ │ Inference │
└───────────────┤ Engine │
└───────────────┘
```

## License

MIT License - see LICENSE for details.