A high-performance LLM proxy router.
`npm install @helicone/ai-gateway`




The fastest, lightest, and easiest-to-integrate AI Gateway on the market.
_Built by the team at Helicone, open-sourced for the community._
Quick Start • Docs • Discord • Website
---
Open-source, lightweight, and built on Rust.
Handle hundreds of models and millions of LLM requests with minimal latency and maximum reliability.
The NGINX of LLMs.
---
1. Set up your `.env` file with your `PROVIDER_API_KEY`s:

```bash
OPENAI_API_KEY=your_openai_key
ANTHROPIC_API_KEY=your_anthropic_key
```

2. Run locally in your terminal:

```bash
npx @helicone/ai-gateway@latest
```

3. Make your requests using any OpenAI SDK:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/ai",
    api_key="placeholder-api-key"  # Gateway handles API keys
)

response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello from Helicone AI Gateway!"}]
)
```
That's it. No new SDKs to learn, no integrations to maintain. Fully-featured and open-sourced.
_For advanced configuration, check out our configuration guide and the providers we support._
---
## Why Helicone AI Gateway?
#### Unified interface
Request any LLM provider using familiar OpenAI syntax. Stop rewriting integrations: use one API for OpenAI, Anthropic, Google, AWS Bedrock, and 20+ more providers.
#### Smart provider selection
Load balance to always hit the fastest, cheapest, or most reliable option. Built-in strategies include latency-based P2C with PeakEWMA, weighted distribution, and cost optimization, all of which stay aware of provider uptime and rate limits.
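For example, here is a minimal sketch of two routing setups in the `config.yaml` format used later in this README. The `latency` strategy and bare `targets` list come from the sample config below; the `weighted` strategy with `provider`/`weight` keys is an illustrative assumption, so check the configuration guide for the exact schema:

```yaml
routers:
  latency-router:
    load-balance:
      chat:
        strategy: latency # P2C + PeakEWMA: route to the currently fastest provider
        targets:
          - openai
          - anthropic

  weighted-router:
    load-balance:
      chat:
        strategy: weighted # assumed keys, for illustration only
        targets:
          - provider: openai
            weight: 0.9 # ~90% of traffic
          - provider: anthropic
            weight: 0.1 # ~10% of traffic
```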
#### Control your spending
Rate limit to prevent runaway costs and usage abuse. Set limits per user, team, or globally, with support for request counts, token usage, and dollar amounts.
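As a sketch, the limiter below reuses the `rate-limit` shape from the sample `config.yaml` further down; the gateway-wide `global` placement is an assumption, so refer to the configuration guide for the exact options:

```yaml
global:
  rate-limit:
    per-api-key: # assumed placement for a gateway-wide limit
      capacity: 500        # burst capacity per API key
      refill-frequency: 1m # i.e., 500 requests per minute

routers:
  your-router-name:
    rate-limit:
      per-api-key: # router-level limit, as in the sample config below
        capacity: 1000
        refill-frequency: 1m
```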
#### Improve performance
Cache responses to reduce costs and latency by up to 95%. Supports Redis and S3 backends with intelligent cache invalidation.
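A minimal caching sketch: the in-memory store and the `directive` string come from the sample `config.yaml` below, while the commented-out Redis keys are assumptions included only to show where a backend swap would go:

```yaml
cache-store:
  in-memory: {} # from the sample config below
  # redis:                                # assumed keys; see the
  #   host-url: "redis://localhost:6379"  # configuration guide

global:
  cache:
    # Standard HTTP cache-control semantics: cache responses for an hour,
    # and allow serving them up to 30 minutes past expiry.
    directive: "max-age=3600, max-stale=1800"
```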
#### Simplified tracing
Monitor performance and debug issues with built-in Helicone integration, plus OpenTelemetry support for logs, metrics, and traces.
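The Helicone side is a one-liner in the sample `config.yaml` below, paired with a `HELICONE_API_KEY` in your `.env`; the `telemetry` block here is only an assumption sketching where OpenTelemetry settings would live, so consult the configuration guide for the real keys:

```yaml
helicone:
  features: all # from the sample config: enables Helicone observability

# Assumed block, for illustration only:
telemetry:
  otlp-endpoint: "http://localhost:4317" # where an OTLP collector might listen
```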
#### One-click deployment
Deploy to your own infrastructure in seconds using our Docker image or prebuilt binary, following our deployment guides.
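For instance, a hypothetical Docker Compose sketch; the image name, port, and mount paths are assumptions, so follow the deployment guides for the exact values:

```yaml
services:
  ai-gateway:
    image: helicone/ai-gateway:latest   # assumed image name
    ports:
      - "8080:8080"                     # the default local port used in this README
    env_file:
      - .env                            # PROVIDER_API_KEYs / HELICONE_API_KEY
    volumes:
      - ./config.yaml:/app/config.yaml  # mount your router config
    command: ["--config", "/app/config.yaml"] # assumed flag, matching the npx usage above
```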
https://github.com/user-attachments/assets/ed3a9bbe-1c4a-47c8-98ec-2bb4ff16be1f
---
## Scalable for production
| Metric | Helicone AI Gateway | Typical Setup |
| ---------------- | ------------------- | ------------- |
| P95 Latency | <10ms | ~60-100ms |
| Memory Usage | ~64MB | ~512MB |
| Requests/sec | ~2,000 | ~500 |
| Binary Size | ~15MB | ~200MB |
| Cold Start | ~100ms | ~2s |
_Note: These are preliminary performance metrics. See benchmarks/README.md for detailed benchmarking methodology and results._
---
## Demo
https://github.com/user-attachments/assets/dd6b6df1-0f5c-43d4-93b6-3cc751efb5e1
---
## How it works
```
┌───────────────────┐      ┌───────────────────┐      ┌───────────────────┐
│     Your App      │─────▶│    Helicone AI    │─────▶│   LLM Providers   │
│                   │      │      Gateway      │      │                   │
│    OpenAI SDK     │      │                   │      │  • OpenAI         │
│  (any language)   │      │  • Load Balance   │      │  • Anthropic      │
│                   │      │  • Rate Limit     │      │  • AWS Bedrock    │
│                   │      │  • Cache          │      │  • Google Vertex  │
│                   │      │  • Trace          │      │  • 20+ more       │
│                   │      │  • Fallbacks      │      │                   │
└───────────────────┘      └───────────────────┘      └───────────────────┘
                                     │
                                     ▼
                           ┌───────────────────┐
                           │     Helicone      │
                           │   Observability   │
                           │                   │
                           │  • Dashboard      │
                           │  • Observability  │
                           │  • Monitoring     │
                           │  • Debugging      │
                           └───────────────────┘
```

---
## Custom configuration
1. Include your `PROVIDER_API_KEY`s in your `.env` file:

```bash
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
HELICONE_API_KEY=sk-...
```

2. Create your `config.yaml` file:

_Note: This is a sample `config.yaml` file. Please refer to our configuration guide for the full list of options, examples, and defaults._

_See our full provider list here._

```yaml
helicone: # Include your HELICONE_API_KEY in your .env file
  features: all

cache-store:
  in-memory: {}

global: # Global settings for all routers
  cache:
    directive: "max-age=3600, max-stale=1800"

routers:
  your-router-name: # Single router configuration
    load-balance:
      chat:
        strategy: latency
        targets:
          - openai
          - anthropic
    rate-limit:
      per-api-key:
        capacity: 1000
        refill-frequency: 1m # 1000 requests per minute
```

3. Run the gateway with your config:

```bash
npx @helicone/ai-gateway@latest --config config.yaml
```

4. Make your requests through your router:
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/router/your-router-name",
    api_key="placeholder-api-key"  # Gateway handles API keys
)

# Route to any LLM provider through the same interface; we handle the rest.
response = client.chat.completions.create(
    model="anthropic/claude-3-5-sonnet",  # Or 100+ other models...
    messages=[{"role": "user", "content": "Hello from Helicone AI Gateway!"}]
)
```

---
## Migration guide
### Python

```diff
from openai import OpenAI

client = OpenAI(
-   api_key=os.getenv("OPENAI_API_KEY")
+   api_key="placeholder-api-key",  # Gateway handles API keys
+   base_url="http://localhost:8080/router/your-router-name"
)

# No other changes needed!
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)
```

### TypeScript
```diff
import { OpenAI } from "openai";

const client = new OpenAI({
-   apiKey: process.env.OPENAI_API_KEY,
+   apiKey: "placeholder-api-key", // Gateway handles API keys
+   baseURL: "http://localhost:8080/router/your-router-name",
});

const response = await client.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [{ role: "user", content: "Hello from Helicone AI Gateway!" }],
});
```
---
- Full Documentation - Complete guides and API reference
- Quickstart Guide - Get up and running in 1 minute
- Advanced Configurations - Configuration reference & examples
- Discord Server - Our community of passionate AI engineers
- GitHub Discussions - Q&A and feature requests
- Twitter - Latest updates and announcements
- Newsletter - Tips and tricks for deploying AI applications
- Report bugs: GitHub Issues
- Enterprise support: Book a discovery call with our team
---
The Helicone AI Gateway is licensed under the Apache License; see the LICENSE file for details.
---
Made with ❤️ by Helicone.