# GPU Autoscaler

> Maximize GPU cluster utilization, minimize cloud costs
GPU Autoscaler is an open-source Kubernetes-native system that reduces GPU infrastructure costs by 30-50% through intelligent workload packing, multi-tenancy support (MIG/MPS/time-slicing), and cost-optimized autoscaling.
## The Problem

Organizations running GPU workloads on Kubernetes waste 40-60% of GPU capacity and overspend by millions annually due to:
- **Allocation vs. Utilization Gap**: Pods request full GPUs but use less than 50% of their capacity
- **Inefficient Multi-Tenancy**: Small inference jobs monopolize entire GPUs when they could share
- **Suboptimal Autoscaling**: Kubernetes doesn't understand GPU-specific metrics (VRAM, SM utilization)
- **Lack of Cost Attribution**: No per-team or per-experiment cost tracking
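As a concrete illustration of the multi-tenancy point, a small inference pod can request a MIG slice instead of a whole device. This is a sketch: `nvidia.com/mig-1g.5gb` is one resource name the NVIDIA device plugin can expose on MIG-enabled A100s (names vary by GPU model and MIG profile), and the image is a placeholder:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: small-inference
spec:
  containers:
    - name: server
      image: my-inference-image:latest   # placeholder image
      resources:
        limits:
          nvidia.com/mig-1g.5gb: 1   # a 1/7 slice of an A100, not a full GPU
```

Seven such pods can share a single A100 instead of each pinning a whole card.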
## 🚀 Quick Start

### Prerequisites

- Kubernetes 1.25+ cluster with GPU nodes
- NVIDIA GPU drivers (470.x+ recommended)
- NVIDIA device plugin installed
- Helm 3.x
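Each prerequisite can be checked from a workstation with standard kubectl/helm commands (a sketch; exact output and the namespace the device plugin runs in vary by distribution):

```bash
# Kubernetes server version (needs 1.25+)
kubectl version

# Confirm nodes advertise GPU resources via the device plugin
kubectl describe nodes | grep -i "nvidia.com/gpu"

# Helm 3.x
helm version --short
```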
### Install with Helm

```bash
# Add the GPU Autoscaler Helm repository
helm repo add gpu-autoscaler https://gpuautoscaler.github.io/charts
helm repo update

# Install the chart (release name and namespace match the commands below)
helm install gpu-autoscaler gpu-autoscaler/gpu-autoscaler \
  --namespace gpu-autoscaler-system \
  --create-namespace
```

### Access the Grafana Dashboard
```bash
# Port-forward Grafana
kubectl port-forward -n gpu-autoscaler-system svc/gpu-autoscaler-grafana 3000:80

# Open a browser at http://localhost:3000
# Default credentials: admin / (see secret)
```

### Explore with the CLI
```bash
# Install the CLI tool
curl -sSL https://gpuautoscaler.io/install.sh | bash

# Check cluster GPU utilization
gpu-autoscaler status

# Analyze waste and get recommendations
gpu-autoscaler optimize

# View cost breakdown
gpu-autoscaler cost --namespace ml-team --last 7d
```

## 💰 Cost Management
```bash
# Enable cost tracking with cloud provider credentials
helm upgrade gpu-autoscaler gpu-autoscaler/gpu-autoscaler \
  --namespace gpu-autoscaler-system \
  --set cost.enabled=true \
  --set cost.provider=aws \
  --set cost.region=us-west-2

# Configure TimescaleDB for historical data (optional)
helm upgrade gpu-autoscaler gpu-autoscaler/gpu-autoscaler \
  --set cost.timescaledb.enabled=true \
  --set cost.timescaledb.host=timescaledb.default.svc.cluster.local \
  --set cost.timescaledb.database=gpucost

# Create a cost budget with alerts
cat <<EOF | kubectl apply -f -
apiVersion: gpuautoscaler.io/v1alpha1
kind: CostBudget
metadata:
  name: ml-team-monthly-budget
spec:
  scope:
    type: namespace
    namespaceSelector:
      matchLabels:
        team: ml-team
  budget:
    amount: 50000.0 # $50K/month
    period: 30d
  alerts:
    - threshold: 80
      channels:
        - type: slack
          config:
            webhook: https://hooks.slack.com/services/YOUR/WEBHOOK/URL
    - threshold: 100
      channels:
        - type: email
          config:
            to: ml-team-leads@company.com
  enforcement:
    mode: throttle # Options: alert, throttle, block
    gracePeriod: 2h
    throttleConfig:
      maxSpotInstances: 5
      blockOnDemand: true
EOF

# Create cost attribution tracking
cat <<EOF | kubectl apply -f -
apiVersion: gpuautoscaler.io/v1alpha1
kind: CostAttribution
metadata:
  name: ml-team-attribution
spec:
  scope:
    type: namespace
    namespaceSelector:
      matchLabels:
        team: ml-team
  trackBy:
    - namespace
    - label:experiment-id
    - label:user
  reportingPeriod: 24h
EOF

# View cost reports
gpu-autoscaler cost report --format table
gpu-autoscaler cost report --format json > cost-report.json
gpu-autoscaler cost roi --last 30d
```

## 📊 Results
Organizations using GPU Autoscaler report:
- 40-60% cost reduction through better utilization and spot instances
- GPU utilization: 40% → 75%+ through intelligent packing
- 2.5+ jobs per GPU (up from 1.2) through MIG/MPS sharing
- $500K-$2M annual savings on GPU infrastructure
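The utilization jump accounts for most of the savings: work that keeps 100 GPUs busy at 40% average utilization fits on roughly 54 GPUs at 75%. A back-of-envelope sketch (illustrative numbers only, not measured data):

```shell
# Back-of-envelope check of the utilization claim.
# 100 GPUs at 40% average utilization carry 100*40 = 4000 "GPU-percent"
# of real work; at 75% utilization the same work needs ceil(4000/75) GPUs.
before_gpus=100
util_before=40   # percent
util_after=75    # percent

work=$((before_gpus * util_before))
after_gpus=$(( (work + util_after - 1) / util_after ))   # ceiling division

echo "GPUs needed after packing: $after_gpus"   # 54, i.e. ~46% fewer
```

At typical cloud GPU prices, retiring ~46% of a fleet is where the claimed 40-60% cost reduction comes from.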
## 🏗️ Architecture
```
┌─────────────────────────────────────────────────────────┐
│                 GPU Kubernetes Cluster                  │
│   ┌────────────┐    ┌────────────┐    ┌────────────┐    │
│   │  GPU Node  │    │  GPU Node  │    │  GPU Node  │    │
│   │  + DCGM    │    │  + DCGM    │    │  + DCGM    │    │
│   │  + ML Pods │    │  + ML Pods │    │  + ML Pods │    │
│   └────────────┘    └────────────┘    └────────────┘    │
└────────────────────────────┬────────────────────────────┘
                             │ (metrics)
                             ▼
┌─────────────────────────────────────────────────────────┐
│              GPU Autoscaler Control Plane               │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐ │
│  │  Prometheus  │──▶│  Controller  │──▶│  Admission   │ │
│  │  (Metrics)   │   │  (Packing)   │   │   Webhook    │ │
│  └──────────────┘   └──────────────┘   └──────────────┘ │
│         │                  │                  │         │
│         ▼                  ▼                  ▼         │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐ │
│  │   Grafana    │   │  Autoscaler  │   │     Cost     │ │
│  │ (Dashboards) │   │    Engine    │   │  Calculator  │ │
│  └──────────────┘   └──────────────┘   └──────────────┘ │
└─────────────────────────────────────────────────────────┘
```

## 📚 Documentation
- Installation Guide
- Architecture Overview
- Configuration Reference
- User Guide
- Troubleshooting
- API Reference
- Contributing Guide
## 🤝 Contributing
We welcome contributions! See CONTRIBUTING.md for guidelines.
### Development Setup
```bash
# Clone the repository
git clone https://github.com/gpuautoscaler/gpuautoscaler.git
cd gpuautoscaler

# Install dependencies
make install-deps

# Run tests
make test

# Build the controller
make build

# Run locally (requires Kind with GPU support)
make dev-cluster
```

## 📦 Release Guide
GPU Autoscaler uses automated semantic versioning with svu and GoReleaser. Releases are created automatically when commits are pushed to the main branch.

### Commit Conventions
Follow Conventional Commits to trigger the correct version bump:
**Patch Release (v1.0.x)** - Bug fixes and minor changes:

```
fix: resolve memory leak in cost tracker
fix(api): handle nil pointer in budget controller
```

**Minor Release (v1.x.0)** - New features (backward compatible):

```
feat: add support for Azure GPU pricing
feat(cost): implement custom alert templates
```

**Major Release (vx.0.0)** - Breaking changes:

```
feat!: redesign CostBudget API with new fields

fix!: change default enforcement mode to throttle

BREAKING CHANGE: CostBudget.spec.limit renamed to CostBudget.spec.budget.amount
```
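The mapping from commit subject to version bump can be sketched roughly as follows. This is a simplification, and `bump_type` is a hypothetical helper for illustration, not part of svu, whose real logic also inspects commit bodies and footers across the whole range since the last tag:

```shell
# Rough sketch of the conventional-commit -> semver bump mapping.
bump_type() {
  subject="$1"
  case "$subject" in
    *'!'*:*)           echo major ;;  # "feat!:", "fix(api)!:", etc.
    feat:* | 'feat('*) echo minor ;;
    *)                 echo patch ;;  # fix:, chore:, docs:, ...
  esac
}

bump_type "feat: add support for Azure GPU pricing"   # minor
bump_type "fix: resolve memory leak in cost tracker"  # patch
bump_type "feat!: redesign CostBudget API"            # major
```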
### Release Process

1. **Merge PR to main**: all commits to main trigger the release workflow
2. **Automatic tagging**: GitHub Actions calculates the next version with svu, based on commit messages
3. **Build & publish**: GoReleaser creates:
   - a GitHub Release with binaries (Linux, macOS, Windows)
   - a changelog generated from commit messages
   - checksums for all artifacts

### Manual Release
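Before tagging by hand, you can preview which version the automation would choose (assuming svu is installed locally and you are on an up-to-date checkout of main):

```bash
# Print the current and next semantic versions based on tags and commits
svu current
svu next
```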
```bash
# Create and push a tag manually
git tag v1.0.5
git push origin v1.0.5

# The release workflow will trigger automatically
```

### Release Artifacts
Each release includes:
- `gpu-autoscaler-controller` - Kubernetes controller binary (Linux/macOS, amd64/arm64)
- `gpu-autoscaler` - CLI tool (Linux/macOS/Windows, amd64/arm64)

## 📄 License

Apache License 2.0 - see the LICENSE file for details.

## 🙏 Acknowledgments
- NVIDIA for DCGM and GPU technologies (MIG, MPS)
- Kubernetes community for device plugin and scheduler frameworks
- CNCF projects: Prometheus, Grafana, Karpenter
## Community & Support

- Documentation
- Slack Community
- GitHub Issues
- Roadmap
## Project Status

Current Phase: Phase 4 - Cost Management ✅

All four phases are now complete:
- ✅ Phase 1: Observability Foundation
- ✅ Phase 2: Intelligent Packing
- ✅ Phase 3: Autoscaling Engine
- ✅ Phase 4: Cost Management
See our roadmap for future enhancements and planned features.
---
Made with ❤️ by the GPU Autoscaler community