# Compress Light Reach

AI cost management SDK with intelligent model routing, prompt compression, and real-time token tracking.



Compress Light Reach is a Node.js/TypeScript SDK that provides intelligent model routing and prompt compression for LLM applications, reducing token usage and costs while maintaining quality.
## Features

- Intelligent Model Routing: Automatically selects the optimal model based on quality requirements (HLE) and available provider keys
- Token-aware Compression: Replaces repeated substrings with shorter placeholders using a fast greedy algorithm
- Lossless: Perfect decompression guaranteed
- Output Compression: Optional model output compression support
- Cloud API: Uses Light Reach's cloud service for compression and routing
- Multi-provider Support: OpenAI, Anthropic, Google, DeepSeek, Moonshot
- TypeScript: Full TypeScript support with type definitions
- BYOK: Provider API keys managed securely in dashboard (never passed through SDK)
## Installation

```bash
npm install compress-lightreach
```

or

```bash
yarn add compress-lightreach
```
## Quick Start

The SDK provides intelligent model routing and targets POST /api/v2/complete.

- Authenticate with your LightReach API key (env var PCOMPRESLR_API_KEY or LIGHTREACH_API_KEY)
- Manage provider keys (OpenAI/Anthropic/Google/etc.) in the dashboard (BYOK)
- The system automatically selects the optimal model based on your requirements
```typescript
import { PcompresslrAPIClient } from 'compress-lightreach';

const client = new PcompresslrAPIClient("your-lightreach-api-key");

const result = await client.complete({
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Explain quantum computing in simple terms.' },
  ],
  desired_hle: 30, // Quality ceiling (0-100). Current SOTA is ~40%.
});

console.log(result.decompressed_response);
console.log(`Selected: ${result.routing_info?.selected_model}`);
console.log(`Token savings: ${result.compression_stats.token_savings}`);
```
## OpenAI-Compatible API

LightReach also exposes a strict OpenAI-compatible surface (including streaming SSE), so you can use standard OpenAI tooling without changing your app.

- Cursor base URL: https://compress.lightreach.io/v1/cursor
- Generic OpenAI-compatible base URL: https://compress.lightreach.io/v1
- Endpoints: GET /models, POST /chat/completions
- Model id: lightreach
Example (cURL):
```bash
curl -sS https://compress.lightreach.io/v1/chat/completions \
  -H "Authorization: Bearer lr_your_lightreach_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lightreach",
    "messages": [{"role":"user","content":"Say hello"}],
    "stream": true
  }'
```
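Because the surface is strictly OpenAI-compatible, the official openai npm package can talk to it directly. A minimal streaming sketch, assuming openai v4 and the base URL and model id listed above:

```typescript
import OpenAI from 'openai';

// Standard OpenAI client pointed at LightReach's compatible surface.
// Note: the API key is your LightReach key, not a provider key (BYOK).
const openai = new OpenAI({
  apiKey: process.env.LIGHTREACH_API_KEY,
  baseURL: 'https://compress.lightreach.io/v1',
});

const stream = await openai.chat.completions.create({
  model: 'lightreach',
  messages: [{ role: 'user', content: 'Say hello' }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}
```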
## Output Compression

Optionally ask the model to return compressed output; the SDK decompresses it for you:

```typescript
const result = await client.complete({
  messages: [{ role: 'user', content: 'Generate a long report...' }],
  desired_hle: 25,
  compress_output: true,
});

console.log(result.decompressed_response);
```
## Intelligent Model Routing

The system automatically selects the optimal model based on quality requirements and your available provider keys:

```typescript
import { PcompresslrAPIClient } from 'compress-lightreach';

const client = new PcompresslrAPIClient("your-lightreach-api-key");

// Cross-provider optimization: the system picks the cheapest model meeting your quality bar
const result = await client.complete({
  messages: [{ role: 'user', content: 'Explain quantum computing' }],
  desired_hle: 30, // Quality ceiling (0-100). Current SOTA is ~40%.
});

// Check what was selected
console.log(result.routing_info?.selected_model);          // e.g., "gpt-4o-mini"
console.log(result.routing_info?.selected_provider);       // e.g., "openai"
console.log(result.routing_info?.model_hle);               // e.g., 32.5
console.log(result.routing_info?.model_price_per_million); // e.g., 0.15
```
Optionally constrain to a specific provider:
```typescript
// Only use OpenAI models, but pick the cheapest one meeting HLE 35
const result = await client.complete({
  messages: [{ role: 'user', content: 'Write a poem' }],
  llm_provider: 'openai', // Optional: constrain to one provider
  desired_hle: 35,
});
```
## HLE Ceilings

Admins can set quality ceilings via the dashboard (global or per-tag) to control costs. Your desired_hle is a preference; if it exceeds an admin-set ceiling, the request is silently clamped to the ceiling and proceeds.
```typescript
// Admin set a global HLE ceiling of 30%.
// Requesting above the ceiling will be clamped to 30 (no error).
const clamped = await client.complete({
  messages: [{ role: 'user', content: 'Process payment' }],
  desired_hle: 35, // Will be clamped down to 30
  tags: { env: 'production' },
});

// Check whether your HLE was lowered by an admin ceiling
if (clamped.routing_info?.hle_clamped) {
  console.log(
    `HLE lowered from ${clamped.routing_info.requested_hle} ` +
    `to ${clamped.routing_info.effective_hle} ` +
    `by ${clamped.routing_info.hle_source}-level ceiling`
  );
}

// Correct usage: request within the ceiling
const result = await client.complete({
  messages: [{ role: 'user', content: 'Process payment' }],
  desired_hle: 25, // OK: below the ceiling of 30
  tags: { env: 'production' },
});
```
## Compression Configuration

Configure per-role compression settings:

```typescript
import { PcompresslrAPIClient } from 'compress-lightreach';

const client = new PcompresslrAPIClient("your-lightreach-api-key");

const result = await client.complete({
  messages: [{ role: 'user', content: 'Hello!' }],
  desired_hle: 30,
  compress: true,
  compress_output: false,
  compression_config: {
    compress_system: false,
    compress_user: true,
    compress_assistant: false,
    compress_only_last_n_user: 1,
  },
  temperature: 0.7,
  max_tokens: 1000,
  tags: { env: 'production' },
});

console.log(result.decompressed_response);
console.log(`Model used: ${result.routing_info?.selected_model}`);
```
## Compression-Only API

Compress and decompress text directly, without an LLM call:

```typescript
import { PcompresslrAPIClient } from 'compress-lightreach';

const client = new PcompresslrAPIClient("your-lightreach-api-key");

// Compress text without making an LLM call
const compressed = await client.compress(
  "Your text with repeated content here...",
  "gpt-4",        // Model for tokenization
  { env: 'dev' }  // Optional tags
);

console.log(compressed.llm_format);
console.log(`Compression ratio: ${compressed.compression_ratio}`);

// Decompress later
const decompressed = await client.decompress(compressed.llm_format);
console.log(decompressed.decompressed);
```
```bash
# Set your API key
export PCOMPRESLR_API_KEY=your-api-key
```

## API Reference
### PcompresslrAPIClient
Main API client for intelligent model routing and compression.
#### Constructor
```typescript
new PcompresslrAPIClient(apiKey?: string, apiUrl?: string, timeout?: number)
```

Parameters:

- apiKey (string, optional): LightReach API key. Falls back to LIGHTREACH_API_KEY or PCOMPRESLR_API_KEY env vars.
- apiUrl (string, optional): Override base API URL. Falls back to PCOMPRESLR_API_URL env var. Default: https://api.compress.lightreach.io
- timeout (number, optional): Request timeout in milliseconds. Default: 900000 (15 minutes)
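For illustration, the optional parameters in use; the override values below are arbitrary examples, not recommendations:

```typescript
import { PcompresslrAPIClient } from 'compress-lightreach';

// Zero-argument form: the key is read from PCOMPRESLR_API_KEY / LIGHTREACH_API_KEY
const client = new PcompresslrAPIClient();

// Explicit key, explicit base URL (the default, shown for clarity), and a 60s timeout
const custom = new PcompresslrAPIClient(
  'your-lightreach-api-key',
  'https://api.compress.lightreach.io',
  60_000,
);
```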
#### Methods

##### `complete(request: CompleteV2Request): Promise<CompleteResponse>`

Messages-first completion with intelligent routing (POST /api/v2/complete).

Request Parameters (`CompleteV2Request`):

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| messages | Message[] | required | Conversation history with role and content |
| llm_provider | 'openai' \| 'anthropic' \| 'google' \| 'deepseek' \| 'moonshot' | — | Optional provider constraint. Omit for cross-provider optimization |
| desired_hle | number | — | Quality ceiling (0-100). If above an admin ceiling, it is clamped down |
| compress | boolean | true | Whether to compress messages |
| compress_output | boolean | false | Whether to request compressed output from LLM |
| compression_config | object | — | Per-role compression settings (see below) |
| temperature | number | — | LLM temperature parameter |
| max_tokens | number | — | Maximum tokens to generate |
| tags | Record<string, string> | — | Tags for cost attribution and tag-level HLE ceilings |
| max_history_messages | number | — | Limit conversation history length |
compression_config options:

```typescript
{
  compress_system?: boolean;                  // default: false
  compress_user?: boolean;                    // default: true
  compress_assistant?: boolean;               // default: false
  compress_only_last_n_user?: number | null;  // default: 1
}
```

Response (`CompleteResponse`):

```typescript
{
  decompressed_response: string; // Final decompressed LLM response
  compression_stats: {
    compression_enabled: boolean;
    original_tokens: number;
    compressed_tokens: number;
    token_savings: number;
    compression_ratio: number;
    token_count_exact?: boolean;
    token_count_source?: string;
    token_accounting_note?: string;
    processing_time_ms?: number;
  };
  llm_stats: {
    provider?: string;
    model?: string;
    input_tokens: number;
    output_tokens: number;
    total_tokens: number;
    finish_reason?: string | null;
  };
  routing_info?: {
    selected_model: string;    // Model chosen by system
    selected_provider: string; // Provider chosen by system
    selected_model_id: string;
    model_hle: number;         // HLE score of selected model
    model_price_per_million: number;
    requested_hle: number | null;
    effective_hle: number | null; // Effective HLE after admin ceilings
    hle_source: 'request' | 'tag' | 'global' | 'none';
    hle_clamped: boolean; // true if an admin ceiling lowered your desired_hle
  };
  warnings?: string[];
  // Convenience aliases
  text?: string;               // Alias for decompressed_response
  tokens_saved?: number;       // Alias for compression_stats.token_savings
  tokens_used?: number;        // Alias for llm_stats.total_tokens
  compression_ratio?: number;  // Alias for compression_stats.compression_ratio
}
```
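The convenience aliases keep logging terse. A small sketch, with the client constructed as in Quick Start:

```typescript
const result = await client.complete({
  messages: [{ role: 'user', content: 'Summarize our Q3 notes...' }],
  desired_hle: 30,
});

console.log(result.text);               // alias for decompressed_response
console.log(result.tokens_saved);       // alias for compression_stats.token_savings
console.log(result.tokens_used);        // alias for llm_stats.total_tokens
console.log(result.compression_ratio);  // alias for compression_stats.compression_ratio
```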
##### `compress(prompt, model?, tags?): Promise<CompressResponse>`

Compression-only (POST /api/v1/compress). Also supports a legacy call shape: `compress(prompt, model, algorithm, tags?)` (only "greedy" is supported).

Parameters:
- prompt (string, required): Text to compress
- model (string, optional): Model for tokenization. Default: 'gpt-4'
- algorithm ("greedy", optional): Legacy-only parameter. Only "greedy" is supported.
- tags (Record<string, string>, optional): Tags for attribution

Response (`CompressResponse`):

```typescript
{
  compressed: string;
  dictionary: Record<string, string>;
  llm_format: string;
  compression_ratio: number;
  original_size: number;
  compressed_size: number;
  processing_time_ms: number;
  algorithm: string;
}
```

##### `decompress(llmFormat): Promise<DecompressResponse>`

Decompress an LLM-formatted compressed prompt (POST /api/v1/decompress).

Parameters:

- llmFormat (string, required): The llm_format string from a compress response

Response (`DecompressResponse`):

```typescript
{
  decompressed: string;
  processing_time_ms: number;
}
```

##### `healthCheck(): Promise<{ status: string; version?: string }>`

Check API health status (GET /health).

Response:

```typescript
{
  status: string;
  version?: string;
}
```
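A quick connectivity check before sending traffic; the exact status string is service-defined, so "ok" below is an assumption:

```typescript
const health = await client.healthCheck();
console.log(health.status); // e.g., "ok" (assumption: exact value is service-defined)
if (health.version) {
  console.log(`API version: ${health.version}`);
}
```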
### Types
```typescript
type MessageRole = 'system' | 'developer' | 'user' | 'assistant';

interface Message {
  role: MessageRole;
  content: string;
}
```

### Environment Variables
| Variable | Description |
|----------|-------------|
| PCOMPRESLR_API_KEY | Your LightReach API key (primary) |
| LIGHTREACH_API_KEY | Your LightReach API key (alternative) |
| PCOMPRESLR_API_URL | Override the API base URL (advanced/testing) |

### Exceptions
| Exception | Description |
|-----------|-------------|
| PcompresslrAPIError | Base exception class |
| APIKeyError | Invalid or missing API key |
| RateLimitError | Rate limit exceeded |
| APIRequestError | General API errors (including routing failures) |

```typescript
import { APIKeyError, RateLimitError, APIRequestError } from 'compress-lightreach';

try {
  const result = await client.complete({ messages: [...] });
} catch (error) {
  if (error instanceof APIKeyError) {
    console.error('Invalid API key');
  } else if (error instanceof RateLimitError) {
    console.error('Rate limited, please retry later');
  } else if (error instanceof APIRequestError) {
    console.error('API error:', error.message);
  }
}
```
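RateLimitError lends itself to a simple retry loop. A minimal sketch; the attempt count and backoff delays are illustrative choices, not library defaults:

```typescript
import { PcompresslrAPIClient, RateLimitError } from 'compress-lightreach';

const client = new PcompresslrAPIClient('your-lightreach-api-key');

// Retry a completion on rate limits with exponential backoff.
async function completeWithRetry(
  request: Parameters<typeof client.complete>[0],
  attempts = 3,
) {
  for (let i = 0; i < attempts; i++) {
    try {
      return await client.complete(request);
    } catch (error) {
      if (error instanceof RateLimitError && i < attempts - 1) {
        // Wait 1s, 2s, 4s, ... before the next attempt
        await new Promise((resolve) => setTimeout(resolve, 1000 * 2 ** i));
        continue;
      }
      throw error; // final attempt or a non-rate-limit error: propagate
    }
  }
  throw new Error('unreachable');
}
```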
## How It Works

Compress Light Reach uses a fast greedy algorithm to identify repeated substrings in your prompts and replace them with shorter placeholders (a simplified sketch follows the list below). The library:
1. Identifies repeated substrings using efficient suffix array algorithms
2. Calculates token savings for each potential replacement
3. Selects optimal replacements that reduce total token count
4. Intelligently routes to the best model based on your quality requirements
5. Formats the result for easy LLM consumption
6. Provides perfect decompression
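For intuition, here is a hedged, character-based sketch of steps 1-3. It is not the service's actual implementation: the real pipeline scores candidates by token savings and uses suffix-array indexing, while this brute-force version counts fixed-length substrings and estimates savings by character length. The function names and the § placeholder format are illustrative.

```typescript
// Simplified, character-based illustration of the greedy replacement loop.
function greedyCompress(
  text: string,
  minLen = 12,
): { compressed: string; dictionary: Record<string, string> } {
  const dictionary: Record<string, string> = {};
  let compressed = text;
  let nextId = 0;

  for (;;) {
    // 1. Count occurrences of every fixed-length candidate substring.
    const counts = new Map<string, number>();
    for (let i = 0; i + minLen <= compressed.length; i++) {
      const candidate = compressed.slice(i, i + minLen);
      counts.set(candidate, (counts.get(candidate) ?? 0) + 1);
    }

    // 2. Pick the candidate with the best estimated savings:
    //    per-occurrence savings minus the cost of the dictionary entry.
    const placeholder = `\u00A7${nextId}\u00A7`; // e.g., "§0§"
    let best: string | null = null;
    let bestSavings = 0;
    for (const [candidate, n] of counts) {
      const savings =
        n * (candidate.length - placeholder.length) -
        (candidate.length + placeholder.length);
      if (n > 1 && savings > bestSavings) {
        best = candidate;
        bestSavings = savings;
      }
    }
    if (best === null) break; // no replacement saves anything

    // 3. Replace every occurrence with the short placeholder.
    dictionary[placeholder] = best;
    compressed = compressed.split(best).join(placeholder);
    nextId++;
  }

  return { compressed, dictionary };
}

// Lossless decompression: expand placeholders in reverse insertion order
// so placeholders nested inside later dictionary entries resolve correctly.
function greedyDecompress(
  compressed: string,
  dictionary: Record<string, string>,
): string {
  let out = compressed;
  for (const [placeholder, original] of Object.entries(dictionary).reverse()) {
    out = out.split(placeholder).join(original);
  }
  return out;
}
```

Because every replacement is recorded in the dictionary and expanded on the way back, the round trip returns the input exactly, mirroring the SDK's perfect-decompression guarantee.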
## Examples
### Basic Completion
```typescript
import { PcompresslrAPIClient } from 'compress-lightreach';

const client = new PcompresslrAPIClient("your-lightreach-api-key");

const prompt = "Your text with repeated content here...";

const result = await client.complete({
  messages: [{ role: "user", content: prompt }],
  desired_hle: 30,
});

console.log(result.decompressed_response);
console.log(`Model used: ${result.routing_info?.selected_model}`);
console.log(`Token savings: ${result.compression_stats.token_savings} tokens`);
console.log(`Compression ratio: ${(result.compression_stats.compression_ratio * 100).toFixed(2)}%`);
```
### Output Compression
```typescript
import { PcompresslrAPIClient } from 'compress-lightreach';

const client = new PcompresslrAPIClient("your-lightreach-api-key");

const result = await client.complete({
  messages: [{ role: "user", content: "Generate a long report with repeated sections..." }],
  desired_hle: 35,
  compress_output: true,
});

console.log(result.decompressed_response);
```

### Multi-turn Conversation
```typescript
import { PcompresslrAPIClient } from 'compress-lightreach';

const client = new PcompresslrAPIClient("your-lightreach-api-key");

const result = await client.complete({
  messages: [
    { role: "system", content: "You are a helpful coding assistant." },
    { role: "user", content: "How do I read a file in Python?" },
    { role: "assistant", content: "You can use open() with a context manager..." },
    { role: "user", content: "How about writing to a file?" },
  ],
  desired_hle: 30,
  compression_config: {
    compress_system: false,
    compress_user: true,
    compress_assistant: false,
    compress_only_last_n_user: 2, // Only compress the last 2 user messages
  },
});
```

## Getting an API Key
To use Compress Light Reach, you need an API key from compress.lightreach.io.
1. Visit compress.lightreach.io
2. Sign up for an account
3. Get your API key from the dashboard
4. Set it as an environment variable:
   ```bash
   export PCOMPRESLR_API_KEY=your-key
   ```

BYOK model: Provider keys (OpenAI/Anthropic/Google/etc.) are managed in the dashboard and never passed through this SDK. The SDK only uses your LightReach API key for authentication with the service.
## Requirements

- Node.js 14.0.0 or higher
- TypeScript 5.3.0+ (for TypeScript projects)

## License

MIT License - see LICENSE file for details.
## Support

- Documentation: compress.lightreach.io/docs
- Issues: GitHub Issues
- Email: jonathankt@lightreach.io

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.