# RelayPlane Local LLM Proxy

Local LLM proxy server for RelayPlane that routes requests through multiple AI providers.
## Features

- **OpenAI-compatible API**: drop-in replacement for the OpenAI SDK
- **Multi-provider routing**: automatically routes to OpenAI, Anthropic, Groq, Together, and OpenRouter
- **Model aliases**: `rp:fast`, `rp:cheap`, `rp:best`, and `rp:balanced` shortcuts
- **Dry-run mode**: test routing without making API calls
- **Usage tracking**: tracks tokens, cost, and latency
- **Spending limits**: configure `limits.daily` and `limits.monthly`, with 429 responses when exceeded
- **Usage warnings**: console and header warnings at 80%, 90%, and 100% of limits
- **Response headers**: `X-RelayPlane-Cost`, `X-RelayPlane-Daily-Usage`, `X-RelayPlane-Monthly-Usage`, `X-RelayPlane-Usage-Warning`
- **Health endpoint**: `GET /health` with uptime, stats, and provider status
## Installation

```bash
npm install @relayplane/proxy
```

Or use via the CLI:

```bash
npm install -g @relayplane/cli
relayplane proxy start
```

Set API keys:

```bash
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
```
## Endpoints

### GET /health
Health check endpoint for monitoring.
```bash
curl http://localhost:8787/health
```

Response:

```json
{
  "status": "ok",
  "uptime": 3600,
  "version": "1.1.0",
  "providers": {
    "openai": "configured",
    "anthropic": "configured",
    "groq": "not_configured",
    "together": "not_configured",
    "openrouter": "not_configured"
  },
  "requestsHandled": 150,
  "requestsSuccessful": 148,
  "requestsFailed": 2,
  "dailyCost": 1.25,
  "dailyLimit": 10.00,
  "monthlyCost": 25.50,
  "monthlyLimit": 100.00,
  "usage": {
    "inputTokens": 50000,
    "outputTokens": 25000,
    "totalCost": 1.25
  }
}
```
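This payload is easy to consume from a monitoring script. A minimal sketch using only the Python standard library (the helper names and the default port are illustrative; the field names follow the sample response above):

```python
import json
import urllib.request

def parse_health(data):
    """Summarize a /health payload: (is_ok, providers missing API keys)."""
    unconfigured = [
        name for name, state in data.get("providers", {}).items()
        if state != "configured"
    ]
    return data.get("status") == "ok", unconfigured

def check_health(base_url="http://localhost:8787"):
    """Fetch /health from a running proxy and summarize it."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
        return parse_health(json.load(resp))
```

A cron job could, for example, alert when `check_health` reports a non-ok status or an expected provider shows up as unconfigured.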
### GET /v1/models

List available models, including aliases.

```bash
curl http://localhost:8787/v1/models
```
### POST /v1/chat/completions

OpenAI-compatible chat completions.

```bash
curl http://localhost:8787/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "rp:best",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
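The same call works from any HTTP client, not just the OpenAI SDK. A hedged sketch with the Python standard library (helper names are illustrative) that also reads the per-request cost the proxy reports in its `X-RelayPlane-Cost` response header:

```python
import json
import urllib.request

def build_request(prompt, model="rp:best", base_url="http://localhost:8787"):
    """Build a urllib Request for the proxy's chat completions endpoint."""
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def chat(prompt, **kwargs):
    """Send the request; return the assistant reply and the cost header."""
    with urllib.request.urlopen(build_request(prompt, **kwargs), timeout=60) as resp:
        body = json.load(resp)
        cost = resp.headers.get("X-RelayPlane-Cost")
    return body["choices"][0]["message"]["content"], cost
```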
## Model Aliases
| Alias | Resolves To | Provider | Use Case |
|-------|-------------|----------|----------|
| `rp:fast` | llama-3.1-8b-instant | Groq | Lowest latency |
| `rp:cheap` | llama-3.1-8b-instant | Groq | Lowest cost |
| `rp:best` | claude-3-5-sonnet-20241022 | Anthropic | Highest quality |
| `rp:balanced` | gpt-4o-mini | OpenAI | Good balance |

## Dry-Run Mode
Test routing logic without making API calls:
```bash
curl http://localhost:8787/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Dry-Run: true" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
Response:
```json
{
  "dry_run": true,
  "routing": {
    "model": "gpt-4o",
    "provider": "openai",
    "endpoint": "https://api.openai.com/v1/chat/completions"
  },
  "estimate": {
    "inputTokens": 10,
    "expectedOutputTokens": 500,
    "estimatedCost": 0.0125,
    "currency": "USD"
  },
  "limits": {
    "daily": 10.00,
    "dailyUsed": 1.25,
    "dailyRemaining": 8.75,
    "monthly": 100.00,
    "monthlyUsed": 25.50,
    "monthlyRemaining": 74.50
  }
}
```
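A natural use of dry-run mode is a preflight budget check: send the request once with `X-Dry-Run: true`, and only issue the real call if the estimate fits the remaining limits. A minimal sketch over the response shape above (`fits_budget` is an illustrative helper):

```python
def fits_budget(dry_run_response):
    """Return True when the estimated cost of the real call would stay
    within both the daily and monthly remaining budget."""
    cost = dry_run_response["estimate"]["estimatedCost"]
    limits = dry_run_response["limits"]
    return cost <= limits["dailyRemaining"] and cost <= limits["monthlyRemaining"]
```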
## Response Headers
The proxy adds usage information to response headers:
| Header | Description |
|--------|-------------|
| `X-RelayPlane-Cost` | Cost of this request |
| `X-RelayPlane-Latency` | Request latency in ms |
| `X-RelayPlane-Daily-Usage` | Daily usage (e.g., "1.25/10.00") |
| `X-RelayPlane-Monthly-Usage` | Monthly usage (e.g., "25.50/100.00") |
| `X-RelayPlane-Usage-Warning` | Warning when approaching limits (80%, 90%, 100%) |

Example warning header:
```
X-RelayPlane-Usage-Warning: ⚠️ You've used $8.50 of your $10 daily limit
```

Console warnings are also logged when approaching limits:
```
⚠️ Daily spending at 80%: $8.00 / $10
⚠️ Daily spending at 90%: $9.00 / $10
⚠️ DAILY LIMIT REACHED: $10.00 / $10 (100%)
```
## Spending Limits
Configure limits in `~/.relayplane/config.json`:

```json
{
  "limits": {
    "daily": 10.00,
    "monthly": 100.00
  }
}
```
When limits are reached, the proxy returns HTTP `429 Too Many Requests`:

```json
{
  "error": {
    "message": "Daily spending limit reached ($10.00 / $10.00)",
    "code": "spending_limit_exceeded",
    "type": "rate_limit_error"
  }
}
```
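A client can treat this like any other rate-limit error and wait for the reset before retrying; the proxy sends the seconds until reset in a `Retry-After` header. A small decision helper (a sketch; the one-hour cap is an arbitrary choice):

```python
def retry_delay(status, headers, max_wait=3600):
    """Return seconds to wait before retrying a spending-limited request,
    or None when the request should not be retried automatically."""
    if status != 429:
        return None  # not a limit error; let the caller handle it
    wait = int(headers.get("Retry-After", "0"))
    # Only wait when the reset is reasonably close; a daily reset many
    # hours away is better surfaced to the user than silently slept on.
    return wait if 0 < wait <= max_wait else None
```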
Headers included with the 429 response:

- `Retry-After: 86400` (seconds until daily reset)
- `X-RelayPlane-Daily-Usage: 10.00/10.00`

## Usage Tracking
Usage is logged to `~/.relayplane/usage.jsonl`:

```jsonl
{"timestamp":"2024-01-15T12:00:00Z","model":"gpt-4o","provider":"openai","inputTokens":100,"outputTokens":50,"cost":0.00125,"latencyMs":1500,"success":true}
```
Daily totals are tracked in `~/.relayplane/daily-usage.json`:

```json
{
  "date": "2024-01-15",
  "cost": 1.25,
  "requests": 50
}
```
Monthly totals are tracked in `~/.relayplane/monthly-usage.json`:

```json
{
  "month": "2024-01",
  "cost": 25.50,
  "requests": 1200
}
```
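Because the log is plain JSON Lines, ad-hoc reporting takes only a few lines of scripting. A sketch that recomputes per-day totals from `usage.jsonl` records (field names follow the sample entry above); the results should agree with `daily-usage.json`:

```python
import json
from collections import defaultdict

def daily_totals(jsonl_lines):
    """Aggregate per-day cost and request counts from usage.jsonl records."""
    totals = defaultdict(lambda: {"cost": 0.0, "requests": 0})
    for line in jsonl_lines:
        rec = json.loads(line)
        day = rec["timestamp"][:10]  # "YYYY-MM-DD" prefix of the ISO timestamp
        totals[day]["cost"] += rec["cost"]
        totals[day]["requests"] += 1
    return dict(totals)
```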
## Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `RELAYPLANE_PROXY_PORT` | 8787 | Port to listen on |
| `RELAYPLANE_PROXY_HOST` | 127.0.0.1 | Host to bind to |
| `RELAYPLANE_CONFIG_DIR` | ~/.relayplane | Config directory |
| `OPENAI_API_KEY` | - | OpenAI API key |
| `ANTHROPIC_API_KEY` | - | Anthropic API key |
| `GROQ_API_KEY` | - | Groq API key |
| `TOGETHER_API_KEY` | - | Together AI API key |
| `OPENROUTER_API_KEY` | - | OpenRouter API key |

## Provider Detection
Models are automatically routed to the correct provider:
| Pattern | Provider |
|---------|----------|
| `gpt-*`, `o1-*` | OpenAI |
| `claude-*` | Anthropic |
| `llama-*`, `mixtral-*` | Groq |
| `meta-llama/*`, `mistralai/*` | Together |
| Contains `/` | OpenRouter |

## Using with OpenAI SDK
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8787/v1",
    api_key="not-needed"  # API keys are configured on the proxy
)

response = client.chat.completions.create(
    model="rp:best",  # Uses Claude 3.5 Sonnet
    messages=[{"role": "user", "content": "Hello!"}]
)
```

## License

MIT