# RelayPlane Local LLM Proxy

Local LLM proxy server for RelayPlane that routes requests through multiple AI providers.
## Features

- **OpenAI-compatible API**: drop-in replacement for the OpenAI SDK
- **Multi-provider routing**: automatically routes to OpenAI, Anthropic, Groq, Together, and OpenRouter
- **Model aliases**: `rp:fast`, `rp:cheap`, `rp:best`, and `rp:balanced` shortcuts
- **Dry-run mode**: test routing without making API calls
- **Usage tracking**: tracks tokens, cost, and latency
- **Spending limits**: configure `limits.daily` and `limits.monthly`, with 429 responses when exceeded
- **Usage warnings**: console and header warnings at 80%, 90%, and 100% of limits
- **Response headers**: `X-RelayPlane-Cost`, `X-RelayPlane-Daily-Usage`, `X-RelayPlane-Monthly-Usage`, `X-RelayPlane-Usage-Warning`
- **Health endpoint**: `GET /health` with uptime, stats, and provider status
## Installation

```bash
npm install @relayplane/proxy
```

Or use via the CLI:

```bash
npm install -g @relayplane/cli
relayplane proxy start
```

Set API keys:

```bash
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
```
## Endpoints

### GET /health
Health check endpoint for monitoring.
```bash
curl http://localhost:8787/health
```

Response:

```json
{
  "status": "ok",
  "uptime": 3600,
  "version": "1.1.0",
  "providers": {
    "openai": "configured",
    "anthropic": "configured",
    "groq": "not_configured",
    "together": "not_configured",
    "openrouter": "not_configured"
  },
  "requestsHandled": 150,
  "requestsSuccessful": 148,
  "requestsFailed": 2,
  "dailyCost": 1.25,
  "dailyLimit": 10.00,
  "monthlyCost": 25.50,
  "monthlyLimit": 100.00,
  "usage": {
    "inputTokens": 50000,
    "outputTokens": 25000,
    "totalCost": 1.25
  }
}
```
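This payload is easy to consume from a monitoring script. A minimal sketch using only the Python standard library (the helper names and the default port are illustrative; the field names follow the sample response above):

```python
import json
import urllib.request

def parse_health(data):
    """Summarize a /health payload: (is_ok, providers missing API keys)."""
    unconfigured = [
        name for name, state in data.get("providers", {}).items()
        if state != "configured"
    ]
    return data.get("status") == "ok", unconfigured

def check_health(base_url="http://localhost:8787"):
    """Fetch /health from a running proxy and summarize it."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
        return parse_health(json.load(resp))
```

A cron job could, for example, alert when `check_health` reports a non-ok status or an expected provider shows up as unconfigured.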
### GET /v1/models

List available models, including aliases.

```bash
curl http://localhost:8787/v1/models
```
### POST /v1/chat/completions

OpenAI-compatible chat completions.

```bash
curl http://localhost:8787/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "rp:best",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
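The same call works from any HTTP client, not just the OpenAI SDK. A hedged sketch with the Python standard library (helper names are illustrative) that also reads the per-request cost the proxy reports in its `X-RelayPlane-Cost` response header:

```python
import json
import urllib.request

def build_request(prompt, model="rp:best", base_url="http://localhost:8787"):
    """Build a urllib Request for the proxy's chat completions endpoint."""
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def chat(prompt, **kwargs):
    """Send the request; return the assistant reply and the cost header."""
    with urllib.request.urlopen(build_request(prompt, **kwargs), timeout=60) as resp:
        body = json.load(resp)
        cost = resp.headers.get("X-RelayPlane-Cost")
    return body["choices"][0]["message"]["content"], cost
```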
## Model Aliases
| Alias | Resolves To | Provider | Use Case |
|-------|-------------|----------|----------|
| `rp:fast` | llama-3.1-8b-instant | Groq | Lowest latency |
| `rp:cheap` | llama-3.1-8b-instant | Groq | Lowest cost |
| `rp:best` | claude-3-5-sonnet-20241022 | Anthropic | Highest quality |
| `rp:balanced` | gpt-4o-mini | OpenAI | Good balance |

## Dry-Run Mode
Test routing logic without making API calls:
```bash
curl http://localhost:8787/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Dry-Run: true" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
Response:
```json
{
  "dry_run": true,
  "routing": {
    "model": "gpt-4o",
    "provider": "openai",
    "endpoint": "https://api.openai.com/v1/chat/completions"
  },
  "estimate": {
    "inputTokens": 10,
    "expectedOutputTokens": 500,
    "estimatedCost": 0.0125,
    "currency": "USD"
  },
  "limits": {
    "daily": 10.00,
    "dailyUsed": 1.25,
    "dailyRemaining": 8.75,
    "monthly": 100.00,
    "monthlyUsed": 25.50,
    "monthlyRemaining": 74.50
  }
}
```
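A natural use of dry-run mode is a preflight budget check: send the request once with `X-Dry-Run: true`, and only issue the real call if the estimate fits the remaining limits. A minimal sketch over the response shape above (`fits_budget` is an illustrative helper):

```python
def fits_budget(dry_run_response):
    """Return True when the estimated cost of the real call would stay
    within both the daily and monthly remaining budget."""
    cost = dry_run_response["estimate"]["estimatedCost"]
    limits = dry_run_response["limits"]
    return cost <= limits["dailyRemaining"] and cost <= limits["monthlyRemaining"]
```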
## Response Headers
The proxy adds usage information to response headers:
| Header | Description |
|--------|-------------|
| `X-RelayPlane-Cost` | Cost of this request |
| `X-RelayPlane-Latency` | Request latency in ms |
| `X-RelayPlane-Daily-Usage` | Daily usage (e.g., "1.25/10.00") |
| `X-RelayPlane-Monthly-Usage` | Monthly usage (e.g., "25.50/100.00") |
| `X-RelayPlane-Usage-Warning` | Warning when approaching limits (80%, 90%, 100%) |

Example warning header:
```
X-RelayPlane-Usage-Warning: ⚠️ You've used $8.50 of your $10 daily limit
```

Console warnings are also logged when approaching limits:
```
⚠️ Daily spending at 80%: $8.00 / $10
⚠️ Daily spending at 90%: $9.00 / $10
⚠️ DAILY LIMIT REACHED: $10.00 / $10 (100%)
```
## Spending Limits
Configure limits in `~/.relayplane/config.json`:

```json
{
  "limits": {
    "daily": 10.00,
    "monthly": 100.00
  }
}
```
When limits are reached, the proxy returns HTTP `429 Too Many Requests`:

```json
{
  "error": {
    "message": "Daily spending limit reached ($10.00 / $10.00)",
    "code": "spending_limit_exceeded",
    "type": "rate_limit_error"
  }
}
```
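A client can treat this like any other rate-limit error and wait for the reset before retrying; the proxy sends the seconds until reset in a `Retry-After` header. A small decision helper (a sketch; the one-hour cap is an arbitrary choice):

```python
def retry_delay(status, headers, max_wait=3600):
    """Return seconds to wait before retrying a spending-limited request,
    or None when the request should not be retried automatically."""
    if status != 429:
        return None  # not a limit error; let the caller handle it
    wait = int(headers.get("Retry-After", "0"))
    # Only wait when the reset is reasonably close; a daily reset many
    # hours away is better surfaced to the user than silently slept on.
    return wait if 0 < wait <= max_wait else None
```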
Headers included with the 429 response:

- `Retry-After: 86400` (seconds until daily reset)
- `X-RelayPlane-Daily-Usage: 10.00/10.00`

## Usage Tracking
Usage is logged to `~/.relayplane/usage.jsonl`:

```jsonl
{"timestamp":"2024-01-15T12:00:00Z","model":"gpt-4o","provider":"openai","inputTokens":100,"outputTokens":50,"cost":0.00125,"latencyMs":1500,"success":true}
```
Daily totals are tracked in `~/.relayplane/daily-usage.json`:

```json
{
  "date": "2024-01-15",
  "cost": 1.25,
  "requests": 50
}
```
Monthly totals are tracked in `~/.relayplane/monthly-usage.json`:

```json
{
  "month": "2024-01",
  "cost": 25.50,
  "requests": 1200
}
```
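Because the log is plain JSON Lines, ad-hoc reporting takes only a few lines of scripting. A sketch that recomputes per-day totals from `usage.jsonl` records (field names follow the sample entry above); the results should agree with `daily-usage.json`:

```python
import json
from collections import defaultdict

def daily_totals(jsonl_lines):
    """Aggregate per-day cost and request counts from usage.jsonl records."""
    totals = defaultdict(lambda: {"cost": 0.0, "requests": 0})
    for line in jsonl_lines:
        rec = json.loads(line)
        day = rec["timestamp"][:10]  # "YYYY-MM-DD" prefix of the ISO timestamp
        totals[day]["cost"] += rec["cost"]
        totals[day]["requests"] += 1
    return dict(totals)
```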
## Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `RELAYPLANE_PROXY_PORT` | 8787 | Port to listen on |
| `RELAYPLANE_PROXY_HOST` | 127.0.0.1 | Host to bind to |
| `RELAYPLANE_CONFIG_DIR` | ~/.relayplane | Config directory |
| `OPENAI_API_KEY` | - | OpenAI API key |
| `ANTHROPIC_API_KEY` | - | Anthropic API key |
| `GROQ_API_KEY` | - | Groq API key |
| `TOGETHER_API_KEY` | - | Together AI API key |
| `OPENROUTER_API_KEY` | - | OpenRouter API key |

## Provider Detection
Models are automatically routed to the correct provider:
| Pattern | Provider |
|---------|----------|
| `gpt-*`, `o1-*` | OpenAI |
| `claude-*` | Anthropic |
| `llama-*`, `mixtral-*` | Groq |
| `meta-llama/*`, `mistralai/*` | Together |
| Contains `/` | OpenRouter |

## Using with OpenAI SDK
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8787/v1",
    api_key="not-needed"  # API keys are configured on the proxy
)

response = client.chat.completions.create(
    model="rp:best",  # Uses Claude 3.5 Sonnet
    messages=[{"role": "user", "content": "Hello!"}]
)
```

## License

MIT