Unified LLM API with automatic model discovery and provider configuration
Unified LLM API with automatic model discovery, provider configuration, token and cost tracking, and simple context persistence and hand-off to other models mid-session.
Note: This library only includes models that support tool calling (function calling), as this is essential for agentic workflows.
- Supported Providers
- Installation
- Quick Start
- Tools
- Defining Tools
- Handling Tool Calls
- Streaming Tool Calls with Partial JSON
- Validating Tool Arguments
- Complete Event Reference
- Image Input
- Thinking/Reasoning
- Unified Interface
- Provider-Specific Options
- Streaming Thinking Content
- Stop Reasons
- Error Handling
- Aborting Requests
- Continuing After Abort
- APIs, Models, and Providers
- Providers and Models
- Querying Providers and Models
- Custom Models
- OpenAI Compatibility Settings
- Type Safety
- Cross-Provider Handoffs
- Context Serialization
- Browser Usage
- Environment Variables
- Checking Environment Variables
- OAuth Providers
- Vertex AI (ADC)
- CLI Login
- Programmatic OAuth
- Login Flow Example
- Using OAuth Tokens
- Provider Notes
- License
- OpenAI
- Azure OpenAI (Responses)
- OpenAI Codex (ChatGPT Plus/Pro subscription, requires OAuth, see below)
- Anthropic
- Google
- Vertex AI (Gemini via Vertex AI)
- Mistral
- Groq
- Cerebras
- xAI
- OpenRouter
- Vercel AI Gateway
- MiniMax
- GitHub Copilot (requires OAuth, see below)
- Google Gemini CLI (requires OAuth, see below)
- Antigravity (requires OAuth, see below)
- Amazon Bedrock
- Kimi For Coding (Moonshot AI, uses Anthropic-compatible API)
- Any OpenAI-compatible API: Ollama, vLLM, LM Studio, etc.
```bash
npm install @mariozechner/pi-ai
```
TypeBox exports are re-exported from @mariozechner/pi-ai: Type, Static, and TSchema.
```typescript
import { Type, getModel, stream, complete, Context, Tool, StringEnum } from '@mariozechner/pi-ai';
// Fully typed with auto-complete support for both providers and models
const model = getModel('openai', 'gpt-4o-mini');
// Define tools with TypeBox schemas for type safety and validation
const tools: Tool[] = [{
name: 'get_time',
description: 'Get the current time',
parameters: Type.Object({
timezone: Type.Optional(Type.String({ description: 'Optional timezone (e.g., America/New_York)' }))
})
}];
// Build a conversation context (easily serializable and transferable between models)
const context: Context = {
systemPrompt: 'You are a helpful assistant.',
messages: [{ role: 'user', content: 'What time is it?' }],
tools
};
// Option 1: Streaming with all event types
const s = stream(model, context);
for await (const event of s) {
switch (event.type) {
case 'start':
console.log(`Starting with ${event.partial.model}`);
break;
case 'text_start':
console.log('\n[Text started]');
break;
case 'text_delta':
process.stdout.write(event.delta);
break;
case 'text_end':
console.log('\n[Text ended]');
break;
case 'thinking_start':
console.log('[Model is thinking...]');
break;
case 'thinking_delta':
process.stdout.write(event.delta);
break;
case 'thinking_end':
console.log('[Thinking complete]');
break;
case 'toolcall_start':
console.log(`\n[Tool call started: index ${event.contentIndex}]`);
break;
case 'toolcall_delta': {
// Partial tool arguments are being streamed
const partialCall = event.partial.content[event.contentIndex];
if (partialCall.type === 'toolCall') {
console.log(`[Streaming args for ${partialCall.name}]`);
}
break;
}
case 'toolcall_end':
console.log(`\nTool called: ${event.toolCall.name}`);
console.log(`Arguments: ${JSON.stringify(event.toolCall.arguments)}`);
break;
case 'done':
console.log(`\nFinished: ${event.reason}`);
break;
case 'error':
console.error(`Error: ${event.error}`);
break;
}
}
// Get the final message after streaming, add it to the context
const finalMessage = await s.result();
context.messages.push(finalMessage);
// Handle tool calls if any
const toolCalls = finalMessage.content.filter(b => b.type === 'toolCall');
for (const call of toolCalls) {
// Execute the tool
const result = call.name === 'get_time'
? new Date().toLocaleString('en-US', {
timeZone: call.arguments.timezone || 'UTC',
dateStyle: 'full',
timeStyle: 'long'
})
: 'Unknown tool';
// Add tool result to context (supports text and images)
context.messages.push({
role: 'toolResult',
toolCallId: call.id,
toolName: call.name,
content: [{ type: 'text', text: result }],
isError: false,
timestamp: Date.now()
});
}
// Continue if there were tool calls
if (toolCalls.length > 0) {
const continuation = await complete(model, context);
context.messages.push(continuation);
console.log('After tool execution:', continuation.content);
}
console.log(`Total tokens: ${finalMessage.usage.input} in, ${finalMessage.usage.output} out`);
console.log(`Cost: $${finalMessage.usage.cost.total.toFixed(4)}`);
// Option 2: Get complete response without streaming
const response = await complete(model, context);
for (const block of response.content) {
if (block.type === 'text') {
console.log(block.text);
} else if (block.type === 'toolCall') {
console.log(`Tool: ${block.name}(${JSON.stringify(block.arguments)})`);
}
}
```
Tools enable LLMs to interact with external systems. This library uses TypeBox schemas for type-safe tool definitions with automatic validation using AJV. TypeBox schemas can be serialized and deserialized as plain JSON, making them ideal for distributed systems.
```typescript
import { Type, Tool, StringEnum } from '@mariozechner/pi-ai';
// Define tool parameters with TypeBox
const weatherTool: Tool = {
name: 'get_weather',
description: 'Get current weather for a location',
parameters: Type.Object({
location: Type.String({ description: 'City name or coordinates' }),
units: StringEnum(['celsius', 'fahrenheit'], { default: 'celsius' })
})
};
// Note: For Google API compatibility, use StringEnum helper instead of Type.Enum
// Type.Enum generates anyOf/const patterns that Google doesn't support
const bookMeetingTool: Tool = {
name: 'book_meeting',
description: 'Schedule a meeting',
parameters: Type.Object({
title: Type.String({ minLength: 1 }),
startTime: Type.String({ format: 'date-time' }),
endTime: Type.String({ format: 'date-time' }),
attendees: Type.Array(Type.String({ format: 'email' }), { minItems: 1 })
})
};
```
Tool results use content blocks and can include both text and images:
```typescript
import { readFileSync } from 'fs';
const context: Context = {
messages: [{ role: 'user', content: 'What is the weather in London?' }],
tools: [weatherTool]
};
const response = await complete(model, context);
// Check for tool calls in the response
for (const block of response.content) {
if (block.type === 'toolCall') {
// Execute your tool with the arguments
// See "Validating Tool Arguments" section for validation
const result = await executeWeatherApi(block.arguments);
// Add tool result with text content
context.messages.push({
role: 'toolResult',
toolCallId: block.id,
toolName: block.name,
content: [{ type: 'text', text: JSON.stringify(result) }],
isError: false,
timestamp: Date.now()
});
}
}
// Tool results can also include images (for vision-capable models)
const imageBuffer = readFileSync('chart.png');
context.messages.push({
role: 'toolResult',
toolCallId: 'tool_xyz',
toolName: 'generate_chart',
content: [
{ type: 'text', text: 'Generated chart showing temperature trends' },
{ type: 'image', data: imageBuffer.toString('base64'), mimeType: 'image/png' }
],
isError: false,
timestamp: Date.now()
});
```
During streaming, tool call arguments are progressively parsed as they arrive. This enables real-time UI updates before the complete arguments are available:
```typescript
const s = stream(model, context);
for await (const event of s) {
if (event.type === 'toolcall_delta') {
const toolCall = event.partial.content[event.contentIndex];
// toolCall.arguments contains partially parsed JSON during streaming
// This allows for progressive UI updates
if (toolCall.type === 'toolCall' && toolCall.arguments) {
// BE DEFENSIVE: arguments may be incomplete
// Example: Show file path being written even before content is complete
if (toolCall.name === 'write_file' && toolCall.arguments.path) {
console.log(`Writing to: ${toolCall.arguments.path}`);
// Content might be partial or missing
if (toolCall.arguments.content) {
console.log(`Content preview: ${toolCall.arguments.content.substring(0, 100)}...`);
}
}
}
}
if (event.type === 'toolcall_end') {
// Here toolCall.arguments is complete (but not yet validated)
const toolCall = event.toolCall;
console.log(`Tool completed: ${toolCall.name}`, toolCall.arguments);
}
}
```
Important notes about partial tool arguments:
- During `toolcall_delta` events, `arguments` contains the best-effort parse of the partial JSON
- Fields may be missing or incomplete - always check for existence before use
- String values may be truncated mid-word
- Arrays may be incomplete
- Nested objects may be partially populated
- At minimum, `arguments` will be an empty object `{}`, never `undefined`
- The Google provider does not support function call streaming. Instead, you will receive a single `toolcall_delta` event with the full arguments.
When using agentLoop, tool arguments are automatically validated against your TypeBox schemas before execution. If validation fails, the error is returned to the model as a tool result, allowing it to retry.
When implementing your own tool execution loop with stream() or complete(), use validateToolCall to validate arguments before passing them to your tools:
```typescript
import { stream, validateToolCall, Tool } from '@mariozechner/pi-ai';
const tools: Tool[] = [weatherTool, calculatorTool];
const s = stream(model, { messages, tools });
for await (const event of s) {
if (event.type === 'toolcall_end') {
const toolCall = event.toolCall;
try {
// Validate arguments against the tool's schema (throws on invalid args)
const validatedArgs = validateToolCall(tools, toolCall);
const result = await executeMyTool(toolCall.name, validatedArgs);
// ... add tool result to context
} catch (error) {
// Validation failed - return error as tool result so model can retry
context.messages.push({
role: 'toolResult',
toolCallId: toolCall.id,
toolName: toolCall.name,
content: [{ type: 'text', text: error.message }],
isError: true,
timestamp: Date.now()
});
}
}
}
```
All streaming events emitted during assistant message generation:
| Event Type | Description | Key Properties |
|------------|-------------|----------------|
| `start` | Stream begins | `partial`: Initial assistant message structure |
| `text_start` | Text block starts | `contentIndex`: Position in content array |
| `text_delta` | Text chunk received | `delta`: New text, `contentIndex`: Position |
| `text_end` | Text block complete | `content`: Full text, `contentIndex`: Position |
| `thinking_start` | Thinking block starts | `contentIndex`: Position in content array |
| `thinking_delta` | Thinking chunk received | `delta`: New text, `contentIndex`: Position |
| `thinking_end` | Thinking block complete | `content`: Full thinking, `contentIndex`: Position |
| `toolcall_start` | Tool call begins | `contentIndex`: Position in content array |
| `toolcall_delta` | Tool arguments streaming | `delta`: JSON chunk, `partial.content[contentIndex].arguments`: Partial parsed args |
| `toolcall_end` | Tool call complete | `toolCall`: Complete validated tool call with id, name, arguments |
| `done` | Stream complete | `reason`: Stop reason ("stop", "length", "toolUse"), `message`: Final assistant message |
| `error` | Error occurred | `reason`: Error type ("error" or "aborted"), `error`: AssistantMessage with partial content |
Models with vision capabilities can process images. You can check if a model supports images via the input property. If you pass images to a non-vision model, they are silently ignored.
```typescript
import { readFileSync } from 'fs';
import { getModel, complete } from '@mariozechner/pi-ai';
const model = getModel('openai', 'gpt-4o-mini');
// Check if model supports images
if (model.input.includes('image')) {
console.log('Model supports vision');
}
const imageBuffer = readFileSync('image.png');
const base64Image = imageBuffer.toString('base64');
const response = await complete(model, {
messages: [{
role: 'user',
content: [
{ type: 'text', text: 'What is in this image?' },
{ type: 'image', data: base64Image, mimeType: 'image/png' }
]
}]
});
// Access the response
for (const block of response.content) {
if (block.type === 'text') {
console.log(block.text);
}
}
```
Many models support thinking/reasoning capabilities where they can show their internal thought process. You can check if a model supports reasoning via the reasoning property. If you pass reasoning options to a non-reasoning model, they are silently ignored.
```typescript
import { getModel, streamSimple, completeSimple } from '@mariozechner/pi-ai';
// Many models across providers support thinking/reasoning
const model = getModel('anthropic', 'claude-sonnet-4-20250514');
// or getModel('openai', 'gpt-5-mini');
// or getModel('google', 'gemini-2.5-flash');
// or getModel('xai', 'grok-code-fast-1');
// or getModel('groq', 'openai/gpt-oss-20b');
// or getModel('cerebras', 'gpt-oss-120b');
// or getModel('openrouter', 'z-ai/glm-4.5v');
// Check if model supports reasoning
if (model.reasoning) {
console.log('Model supports reasoning/thinking');
}
// Use the simplified reasoning option
const response = await completeSimple(model, {
messages: [{ role: 'user', content: 'Solve: 2x + 5 = 13' }]
}, {
reasoning: 'medium' // 'minimal' | 'low' | 'medium' | 'high' | 'xhigh' (xhigh maps to high on non-OpenAI providers)
});
// Access thinking and text blocks
for (const block of response.content) {
if (block.type === 'thinking') {
console.log('Thinking:', block.thinking);
} else if (block.type === 'text') {
console.log('Response:', block.text);
}
}
```
For fine-grained control, use the provider-specific options:
```typescript
import { getModel, complete } from '@mariozechner/pi-ai';
// OpenAI Reasoning (o1, o3, gpt-5)
const openaiModel = getModel('openai', 'gpt-5-mini');
await complete(openaiModel, context, {
reasoningEffort: 'medium',
reasoningSummary: 'detailed' // OpenAI Responses API only
});
// Anthropic Thinking (Claude Sonnet 4)
const anthropicModel = getModel('anthropic', 'claude-sonnet-4-20250514');
await complete(anthropicModel, context, {
thinkingEnabled: true,
thinkingBudgetTokens: 8192 // Optional token limit
});
// Google Gemini Thinking
const googleModel = getModel('google', 'gemini-2.5-flash');
await complete(googleModel, context, {
thinking: {
enabled: true,
budgetTokens: 8192 // -1 for dynamic, 0 to disable
}
});
```
When streaming, thinking content is delivered through specific events:
```typescript
const s = streamSimple(model, context, { reasoning: 'high' });
for await (const event of s) {
switch (event.type) {
case 'thinking_start':
console.log('[Model started thinking]');
break;
case 'thinking_delta':
process.stdout.write(event.delta); // Stream thinking content
break;
case 'thinking_end':
console.log('\n[Thinking complete]');
break;
}
}
```
Every AssistantMessage includes a stopReason field that indicates how the generation ended:
- "stop" - Normal completion, the model finished its response"length"
- - Output hit the maximum token limit"toolUse"
- - Model is calling tools and expects tool results"error"
- - An error occurred during generation"aborted"
- - Request was cancelled via abort signal
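For example, you can branch on the stop reason after a call to `complete()`. This is a minimal sketch using only the fields described above; `model` and `context` are assumed to be set up as in the Quick Start:

```typescript
const message = await complete(model, context);

switch (message.stopReason) {
    case 'toolUse':
        // Execute the requested tools, push toolResult messages, then call complete() again
        break;
    case 'length':
        // Output was truncated at the max token limit - consider continuing the conversation
        break;
    case 'error':
    case 'aborted':
        console.error('Generation did not finish:', message.errorMessage);
        break;
    case 'stop':
    default:
        // Normal completion
        break;
}
```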
When a request ends with an error (including aborts and tool call validation errors), the streaming API emits an error event:
```typescript
// In streaming
for await (const event of stream) {
if (event.type === 'error') {
// event.reason is either "error" or "aborted"
// event.error is the AssistantMessage with partial content
console.error(`Error (${event.reason}):`, event.error.errorMessage);
console.log('Partial content:', event.error.content);
}
}
// The final message will have the error details
const message = await stream.result();
if (message.stopReason === 'error' || message.stopReason === 'aborted') {
console.error('Request failed:', message.errorMessage);
// message.content contains any partial content received before the error
// message.usage contains partial token counts and costs
}
```
The abort signal allows you to cancel in-progress requests. Aborted requests have stopReason === 'aborted':
```typescript
import { getModel, stream } from '@mariozechner/pi-ai';
const model = getModel('openai', 'gpt-4o-mini');
const controller = new AbortController();
// Abort after 2 seconds
setTimeout(() => controller.abort(), 2000);
const s = stream(model, {
messages: [{ role: 'user', content: 'Write a long story' }]
}, {
signal: controller.signal
});
for await (const event of s) {
if (event.type === 'text_delta') {
process.stdout.write(event.delta);
} else if (event.type === 'error') {
// event.reason tells you if it was "error" or "aborted"
console.log(`${event.reason === 'aborted' ? 'Aborted' : 'Error'}:`, event.error.errorMessage);
}
}
// Get results (may be partial if aborted)
const response = await s.result();
if (response.stopReason === 'aborted') {
console.log('Request was aborted:', response.errorMessage);
console.log('Partial content received:', response.content);
console.log('Tokens used:', response.usage);
}
```
Aborted messages can be added to the conversation context and continued in subsequent requests:
```typescript
const context = {
messages: [
{ role: 'user', content: 'Explain quantum computing in detail' }
]
};
// First request gets aborted after 2 seconds
const controller1 = new AbortController();
setTimeout(() => controller1.abort(), 2000);
const partial = await complete(model, context, { signal: controller1.signal });
// Add the partial response to context
context.messages.push(partial);
context.messages.push({ role: 'user', content: 'Please continue' });
// Continue the conversation
const continuation = await complete(model, context);
```
Use the onPayload callback to inspect the request payload sent to the provider. This is useful for debugging request formatting issues or provider validation errors.
```typescript
const response = await complete(model, context, {
onPayload: (payload) => {
console.log('Provider payload:', JSON.stringify(payload, null, 2));
}
});
```
The callback is supported by `stream`, `complete`, `streamSimple`, and `completeSimple`.
The library uses a registry of API implementations. Built-in APIs include:
- `anthropic-messages`: Anthropic Messages API (`streamAnthropic`, `AnthropicOptions`)
- `google-generative-ai`: Google Generative AI API (`streamGoogle`, `GoogleOptions`)
- `google-gemini-cli`: Google Cloud Code Assist API (`streamGoogleGeminiCli`, `GoogleGeminiCliOptions`)
- `google-vertex`: Google Vertex AI API (`streamGoogleVertex`, `GoogleVertexOptions`)
- `openai-completions`: OpenAI Chat Completions API (`streamOpenAICompletions`, `OpenAICompletionsOptions`)
- `openai-responses`: OpenAI Responses API (`streamOpenAIResponses`, `OpenAIResponsesOptions`)
- `openai-codex-responses`: OpenAI Codex Responses API (`streamOpenAICodexResponses`, `OpenAICodexResponsesOptions`)
- `azure-openai-responses`: Azure OpenAI Responses API (`streamAzureOpenAIResponses`, `AzureOpenAIResponsesOptions`)
- `bedrock-converse-stream`: Amazon Bedrock Converse API (`streamBedrock`, `BedrockOptions`)
A provider offers models through a specific API. For example:
- Anthropic models use the `anthropic-messages` API
- Google models use the `google-generative-ai` API
- OpenAI models use the `openai-responses` API
- Mistral, xAI, Cerebras, Groq, etc. models use the `openai-completions` API (OpenAI-compatible)
```typescript
import { getProviders, getModels, getModel } from '@mariozechner/pi-ai';
// Get all available providers
const providers = getProviders();
console.log(providers); // ['openai', 'anthropic', 'google', 'xai', 'groq', ...]
// Get all models from a provider (fully typed)
const anthropicModels = getModels('anthropic');
for (const model of anthropicModels) {
console.log(`${model.id}: ${model.name}`);
console.log(`  API: ${model.api}`); // 'anthropic-messages'
console.log(`  Context: ${model.contextWindow} tokens`);
console.log(`  Vision: ${model.input.includes('image')}`);
console.log(`  Reasoning: ${model.reasoning}`);
}
// Get a specific model (both provider and model ID are auto-completed in IDEs)
const model = getModel('openai', 'gpt-4o-mini');
console.log(`Using ${model.name} via ${model.api} API`);
```
You can create custom models for local inference servers or custom endpoints:
```typescript
import { Model, stream } from '@mariozechner/pi-ai';
// Example: Ollama using OpenAI-compatible API
const ollamaModel: Model<'openai-completions'> = {
id: 'llama-3.1-8b',
name: 'Llama 3.1 8B (Ollama)',
api: 'openai-completions',
provider: 'ollama',
baseUrl: 'http://localhost:11434/v1',
reasoning: false,
input: ['text'],
cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
contextWindow: 128000,
maxTokens: 32000
};
// Example: LiteLLM proxy with explicit compat settings
const litellmModel: Model<'openai-completions'> = {
id: 'gpt-4o',
name: 'GPT-4o (via LiteLLM)',
api: 'openai-completions',
provider: 'litellm',
baseUrl: 'http://localhost:4000/v1',
reasoning: false,
input: ['text', 'image'],
cost: { input: 2.5, output: 10, cacheRead: 0, cacheWrite: 0 },
contextWindow: 128000,
maxTokens: 16384,
compat: {
supportsStore: false, // LiteLLM doesn't support the store field
}
};
// Example: Custom endpoint with headers (bypassing Cloudflare bot detection)
const proxyModel: Model<'anthropic-messages'> = {
id: 'claude-sonnet-4',
name: 'Claude Sonnet 4 (Proxied)',
api: 'anthropic-messages',
provider: 'custom-proxy',
baseUrl: 'https://proxy.example.com/v1',
reasoning: true,
input: ['text', 'image'],
cost: { input: 3, output: 15, cacheRead: 0.3, cacheWrite: 3.75 },
contextWindow: 200000,
maxTokens: 8192,
headers: {
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36',
'X-Custom-Auth': 'bearer-token-here'
}
};
// Use the custom model
const response = await stream(ollamaModel, context, {
apiKey: 'dummy' // Ollama doesn't need a real key
});
```
The openai-completions API is implemented by many providers with minor differences. By default, the library auto-detects compatibility settings based on baseUrl for known providers (Cerebras, xAI, Mistral, Chutes, etc.). For custom proxies or unknown endpoints, you can override these settings via the compat field. For openai-responses models, the compat field only supports Responses-specific flags.
```typescript
interface OpenAICompletionsCompat {
supportsStore?: boolean; // Whether provider supports the store field (default: true)
supportsDeveloperRole?: boolean; // Whether provider supports the developer role vs system (default: true)
supportsReasoningEffort?: boolean; // Whether provider supports reasoning_effort (default: true)
supportsUsageInStreaming?: boolean; // Whether provider supports stream_options: { include_usage: true } (default: true)
supportsStrictMode?: boolean; // Whether provider supports strict in tool definitions (default: true)
maxTokensField?: 'max_completion_tokens' | 'max_tokens'; // Which field name to use (default: max_completion_tokens)
requiresToolResultName?: boolean; // Whether tool results require the name field (default: false)
requiresAssistantAfterToolResult?: boolean; // Whether tool results must be followed by an assistant message (default: false)
requiresThinkingAsText?: boolean; // Whether thinking blocks must be converted to text (default: false)
requiresMistralToolIds?: boolean; // Whether tool call IDs must be normalized to Mistral format (default: false)
thinkingFormat?: 'openai' | 'zai' | 'qwen'; // Format for reasoning param: 'openai' uses reasoning_effort, 'zai' uses thinking: { type: "enabled" }, 'qwen' uses enable_thinking: boolean (default: openai)
openRouterRouting?: OpenRouterRouting; // OpenRouter routing preferences (default: {})
vercelGatewayRouting?: VercelGatewayRouting; // Vercel AI Gateway routing preferences (default: {})
}
interface OpenAIResponsesCompat {
// Reserved for future use
}
```
If compat is not set, the library falls back to URL-based detection. If compat is partially set, unspecified fields use the detected defaults. This is useful for:
- LiteLLM proxies: May not support store field
- Custom inference servers: May use non-standard field names
- Self-hosted endpoints: May have different feature support
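For example, a self-hosted endpoint might need several overrides at once. The sketch below defines a hypothetical Qwen model served by vLLM; the model id, base URL, and token limits are placeholders, and the compat flags are the ones documented above:

```typescript
import { Model, complete } from '@mariozechner/pi-ai';

const qwenModel: Model<'openai-completions'> = {
    id: 'qwen2.5-72b-instruct',           // placeholder model id
    name: 'Qwen 2.5 72B (self-hosted)',
    api: 'openai-completions',
    provider: 'vllm',
    baseUrl: 'http://localhost:8000/v1',  // placeholder endpoint
    reasoning: true,
    input: ['text'],
    cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
    contextWindow: 131072,
    maxTokens: 8192,
    compat: {
        maxTokensField: 'max_tokens',     // server rejects max_completion_tokens
        supportsStrictMode: false,        // no strict tool schemas
        thinkingFormat: 'qwen'            // reasoning toggled via enable_thinking
    }
};

const response = await complete(qwenModel, context, { apiKey: 'dummy' });
```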
Models are typed by their API, which keeps the model metadata accurate. Provider-specific option types are enforced when you call the provider functions directly. The generic stream and complete functions accept StreamOptions with additional provider fields.
```typescript
import { streamAnthropic, type AnthropicOptions } from '@mariozechner/pi-ai';
// TypeScript knows this is an Anthropic model
const claude = getModel('anthropic', 'claude-sonnet-4-20250514');
const options: AnthropicOptions = {
thinkingEnabled: true,
thinkingBudgetTokens: 2048
};
await streamAnthropic(claude, context, options);
```
The library supports seamless handoffs between different LLM providers within the same conversation. This allows you to switch models mid-conversation while preserving context, including thinking blocks, tool calls, and tool results.
When messages from one provider are sent to a different provider, the library automatically transforms them for compatibility:
- User and tool result messages are passed through unchanged
- Assistant messages from the same provider/API are preserved as-is
- Assistant messages from different providers have their thinking blocks converted to tagged text
- Tool calls and regular text are preserved unchanged
```typescript
import { getModel, complete, Context } from '@mariozechner/pi-ai';
// Start with Claude
const claude = getModel('anthropic', 'claude-sonnet-4-20250514');
const context: Context = {
messages: []
};
context.messages.push({ role: 'user', content: 'What is 25 * 18?' });
const claudeResponse = await complete(claude, context, {
thinkingEnabled: true
});
context.messages.push(claudeResponse);
// Switch to GPT-5 - it will see Claude's thinking as tagged text
const gpt5 = getModel('openai', 'gpt-5-mini');
context.messages.push({ role: 'user', content: 'Is that calculation correct?' });
const gptResponse = await complete(gpt5, context);
context.messages.push(gptResponse);
// Switch to Gemini
const gemini = getModel('google', 'gemini-2.5-flash');
context.messages.push({ role: 'user', content: 'What was the original question?' });
const geminiResponse = await complete(gemini, context);
```
All providers can handle messages from other providers, including:
- Text content
- Tool calls and tool results (including images in tool results)
- Thinking/reasoning blocks (transformed to tagged text for cross-provider compatibility)
- Aborted messages with partial content
This enables flexible workflows where you can:
- Start with a fast model for initial responses
- Switch to a more capable model for complex reasoning
- Use specialized models for specific tasks
- Maintain conversation continuity across provider outages
The Context object can be easily serialized and deserialized using standard JSON methods, making it simple to persist conversations, implement chat history, or transfer contexts between services:
```typescript
import { Context, getModel, complete } from '@mariozechner/pi-ai';
// Create and use a context
const context: Context = {
systemPrompt: 'You are a helpful assistant.',
messages: [
{ role: 'user', content: 'What is TypeScript?' }
]
};
const model = getModel('openai', 'gpt-4o-mini');
const response = await complete(model, context);
context.messages.push(response);
// Serialize the entire context
const serialized = JSON.stringify(context);
console.log('Serialized context size:', serialized.length, 'bytes');
// Save to database, localStorage, file, etc.
localStorage.setItem('conversation', serialized);
// Later: deserialize and continue the conversation
const restored: Context = JSON.parse(localStorage.getItem('conversation')!);
restored.messages.push({ role: 'user', content: 'Tell me more about its type system' });
// Continue with any model
const newModel = getModel('anthropic', 'claude-3-5-haiku-20241022');
const continuation = await complete(newModel, restored);
```
> Note: If the context contains images (encoded as base64 as shown in the Image Input section), those will also be serialized.
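If serialized contexts grow too large because of embedded images, one option is to drop image blocks before persisting. A sketch, assuming the content block shapes shown in this README and the `context` from the example above:

```typescript
// Drop base64 image blocks before persisting; text, tool calls, and tool results are kept.
// (Cast to any[] because content is a union of block-array types.)
const slim: Context = {
    ...context,
    messages: context.messages.map((message) =>
        Array.isArray(message.content)
            ? { ...message, content: (message.content as any[]).filter((block) => block.type !== 'image') }
            : message
    )
};
localStorage.setItem('conversation', JSON.stringify(slim));
```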
The library supports browser environments. You must pass the API key explicitly since environment variables are not available in browsers:
```typescript
import { getModel, complete } from '@mariozechner/pi-ai';
// API key must be passed explicitly in browser
const model = getModel('anthropic', 'claude-3-5-haiku-20241022');
const response = await complete(model, {
messages: [{ role: 'user', content: 'Hello!' }]
}, {
apiKey: 'your-api-key'
});
```
> Security Warning: Exposing API keys in frontend code is dangerous. Anyone can extract and abuse your keys. Only use this approach for internal tools or demos. For production applications, use a backend proxy that keeps your API keys secure.
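A minimal sketch of that proxy pattern, assuming a Node backend (the `/chat` route and payload shape are illustrative, not part of the library): the browser POSTs a `Context`, and the server calls the provider with a key from its own environment.

```typescript
// server.ts - minimal proxy sketch using Node's built-in http module
import { createServer } from 'node:http';
import { getModel, complete, type Context } from '@mariozechner/pi-ai';

const model = getModel('anthropic', 'claude-3-5-haiku-20241022');

createServer((req, res) => {
    if (req.method !== 'POST' || req.url !== '/chat') {
        res.writeHead(404).end();
        return;
    }
    let body = '';
    req.on('data', (chunk) => (body += chunk));
    req.on('end', async () => {
        const context: Context = JSON.parse(body);       // sent by the browser
        const message = await complete(model, context);  // ANTHROPIC_API_KEY stays on the server
        res.writeHead(200, { 'Content-Type': 'application/json' });
        res.end(JSON.stringify(message));
    });
}).listen(3000);
```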
In Node.js environments, you can set environment variables to avoid passing API keys:
| Provider | Environment Variable(s) |
|----------|------------------------|
| OpenAI | `OPENAI_API_KEY` |
| Azure OpenAI | `AZURE_OPENAI_API_KEY` + `AZURE_OPENAI_BASE_URL` or `AZURE_OPENAI_RESOURCE_NAME` (optional `AZURE_OPENAI_API_VERSION`, `AZURE_OPENAI_DEPLOYMENT_NAME_MAP` like `model=deployment,model2=deployment2`) |
| Anthropic | `ANTHROPIC_API_KEY` or `ANTHROPIC_OAUTH_TOKEN` |
| Google | `GEMINI_API_KEY` |
| Vertex AI | `GOOGLE_CLOUD_PROJECT` (or `GCLOUD_PROJECT`) + `GOOGLE_CLOUD_LOCATION` + ADC |
| Mistral | `MISTRAL_API_KEY` |
| Groq | `GROQ_API_KEY` |
| Cerebras | `CEREBRAS_API_KEY` |
| xAI | `XAI_API_KEY` |
| OpenRouter | `OPENROUTER_API_KEY` |
| Vercel AI Gateway | `AI_GATEWAY_API_KEY` |
| zAI | `ZAI_API_KEY` |
| MiniMax | `MINIMAX_API_KEY` |
| Kimi For Coding | `KIMI_API_KEY` |
| GitHub Copilot | `COPILOT_GITHUB_TOKEN` or `GH_TOKEN` or `GITHUB_TOKEN` |
When set, the library automatically uses these keys:
```typescript
// Uses OPENAI_API_KEY from environment
const model = getModel('openai', 'gpt-4o-mini');
const response = await complete(model, context);
// Or override with explicit key
const response = await complete(model, context, {
apiKey: 'sk-different-key'
});
```
#### Antigravity Version Override
Set PI_AI_ANTIGRAVITY_VERSION to override the Antigravity User-Agent version when Google updates their requirements:
```bash
export PI_AI_ANTIGRAVITY_VERSION="1.23.0"
```
#### Cache Retention
Set PI_CACHE_RETENTION=long to extend prompt cache retention:
| Provider | Default | With PI_CACHE_RETENTION=long |
|----------|---------|-------------------------------|
| Anthropic | 5 minutes | 1 hour |
| OpenAI | in-memory | 24 hours |
This only affects direct API calls to api.anthropic.com and api.openai.com. Proxies and other providers are unaffected.
> Note: Extended cache retention may increase costs for Anthropic (cache writes are charged at a higher rate). OpenAI's 24h retention has no additional cost.
```typescript
import { getEnvApiKey } from '@mariozechner/pi-ai';
// Check if an API key is set in environment variables
const key = getEnvApiKey('openai'); // checks OPENAI_API_KEY
```
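Combined with `getProviders()`, this can be used to report which providers have keys configured. A small sketch:

```typescript
import { getProviders, getEnvApiKey } from '@mariozechner/pi-ai';

for (const provider of getProviders()) {
    const key = getEnvApiKey(provider);
    console.log(`${provider}: ${key ? 'configured' : 'no API key in environment'}`);
}
```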
Several providers require OAuth authentication instead of static API keys:
- Anthropic (Claude Pro/Max subscription)
- OpenAI Codex (ChatGPT Plus/Pro subscription, access to GPT-5.x Codex models)
- GitHub Copilot (Copilot subscription)
- Google Gemini CLI (Gemini 2.0/2.5 via Google Cloud Code Assist; free tier or paid subscription)
- Antigravity (Free Gemini 3, Claude, GPT-OSS via Google Cloud)
For paid Cloud Code Assist subscriptions, set GOOGLE_CLOUD_PROJECT or GOOGLE_CLOUD_PROJECT_ID to your project ID.
Vertex AI models use Application Default Credentials (ADC):
- Local development: Run `gcloud auth application-default login`
- CI/Production: Set `GOOGLE_APPLICATION_CREDENTIALS` to point to a service account JSON key file
Also set GOOGLE_CLOUD_PROJECT (or GCLOUD_PROJECT) and GOOGLE_CLOUD_LOCATION. You can also pass project/location in the call options.
Example:
```bash
# Local (uses your user credentials)
gcloud auth application-default login
export GOOGLE_CLOUD_PROJECT="my-project"
export GOOGLE_CLOUD_LOCATION="us-central1"
```

```typescript
import { getModel, complete } from '@mariozechner/pi-ai';

(async () => {
const model = getModel('google-vertex', 'gemini-2.5-flash');
const response = await complete(model, {
messages: [{ role: 'user', content: 'Hello from Vertex AI' }]
});
for (const block of response.content) {
if (block.type === 'text') console.log(block.text);
}
})().catch(console.error);
```
Official docs: Application Default Credentials
### CLI Login
The quickest way to authenticate:
```bash
npx @mariozechner/pi-ai login # interactive provider selection
npx @mariozechner/pi-ai login anthropic # login to specific provider
npx @mariozechner/pi-ai list # list available providers
```
Credentials are saved to `auth.json` in the current directory.

### Programmatic OAuth
The library provides login and token refresh functions. Credential storage is the caller's responsibility.
```typescript
import {
// Login functions (return credentials, do not store)
loginAnthropic,
loginOpenAICodex,
loginGitHubCopilot,
loginGeminiCli,
loginAntigravity,
// Token management
refreshOAuthToken, // (provider, credentials) => new credentials
getOAuthApiKey, // (provider, credentialsMap) => { newCredentials, apiKey } | null
// Types
type OAuthProvider, // 'anthropic' | 'openai-codex' | 'github-copilot' | 'google-gemini-cli' | 'google-antigravity'
type OAuthCredentials,
} from '@mariozechner/pi-ai';
```

### Login Flow Example
```typescript
import { loginGitHubCopilot } from '@mariozechner/pi-ai';
import { writeFileSync } from 'fs';

const credentials = await loginGitHubCopilot({
onAuth: (url, instructions) => {
console.log(`Open: ${url}`);
if (instructions) console.log(instructions);
},
onPrompt: async (prompt) => {
return await getUserInput(prompt.message);
},
onProgress: (message) => console.log(message)
});

// Store credentials yourself
const auth = { 'github-copilot': { type: 'oauth', ...credentials } };
writeFileSync('auth.json', JSON.stringify(auth, null, 2));
```

### Using OAuth Tokens
Use `getOAuthApiKey()` to get an API key, automatically refreshing if expired:
```typescript
import { getModel, complete, getOAuthApiKey } from '@mariozechner/pi-ai';
import { readFileSync, writeFileSync } from 'fs';

// Load your stored credentials
const auth = JSON.parse(readFileSync('auth.json', 'utf-8'));
// Get API key (refreshes if expired)
const result = await getOAuthApiKey('github-copilot', auth);
if (!result) throw new Error('Not logged in');
// Save refreshed credentials
auth['github-copilot'] = { type: 'oauth', ...result.newCredentials };
writeFileSync('auth.json', JSON.stringify(auth, null, 2));
// Use the API key
const model = getModel('github-copilot', 'gpt-4o');
const response = await complete(model, {
messages: [{ role: 'user', content: 'Hello!' }]
}, { apiKey: result.apiKey });
```

### Provider Notes
OpenAI Codex: Requires a ChatGPT Plus or Pro subscription. Provides access to GPT-5.x Codex models with extended context windows and reasoning capabilities. The library automatically handles session-based prompt caching when `sessionId` is provided in stream options.

Azure OpenAI (Responses): Uses the Responses API only. Set `AZURE_OPENAI_API_KEY` and either `AZURE_OPENAI_BASE_URL` or `AZURE_OPENAI_RESOURCE_NAME`. Use `AZURE_OPENAI_API_VERSION` (defaults to `v1`) to override the API version if needed. Deployment names are treated as model IDs by default; override with `azureDeploymentName` or `AZURE_OPENAI_DEPLOYMENT_NAME_MAP` using comma-separated `model-id=deployment` pairs (for example `gpt-4o-mini=my-deployment,gpt-4o=prod`). Legacy deployment-based URLs are intentionally unsupported.

GitHub Copilot: If you get a "The requested model is not supported" error, enable the model manually in VS Code: open Copilot Chat, click the model selector, select the model (warning icon), and click "Enable".

Google Gemini CLI / Antigravity: These use Google Cloud OAuth. The `apiKey` returned by `getOAuthApiKey()` is a JSON string containing both the token and project ID, which the library handles automatically.
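For the OpenAI Codex prompt-caching note above, a sketch of passing a session id in stream options; the provider and model ids here are placeholders, so list the real ones with `getProviders()` and `getModels()`:

```typescript
import { getModel, stream } from '@mariozechner/pi-ai';
import { randomUUID } from 'node:crypto';

// Reuse one session id per conversation so the Codex backend can cache the prompt prefix
const sessionId = randomUUID();
const model = getModel('openai-codex', 'gpt-5-codex'); // placeholder ids - check getModels()

const s = stream(model, context, { sessionId });
for await (const event of s) {
    if (event.type === 'text_delta') process.stdout.write(event.delta);
}
```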
## Development

### Adding a New Provider
Adding a new LLM provider requires changes across multiple files. This checklist covers all necessary steps:
#### 1. Core Types (`src/types.ts`)
- Add the API identifier to `KnownApi` (for example `"bedrock-converse-stream"`)
- Create an options interface extending `StreamOptions` (for example `BedrockOptions`)
- Add the provider name to `KnownProvider` (for example `"amazon-bedrock"`)

#### 2. Provider Implementation (`src/providers/`)
Create a new provider file (for example `amazon-bedrock.ts`) that exports:
- `stream` function returning `AssistantMessageEventStream`
- `streamSimple` for `SimpleStreamOptions` mapping
- Provider-specific options interface
- Message conversion functions to transform `Context` to provider format
- Tool conversion if the provider supports tools
- Response parsing to emit standardized events (text, tool_call, thinking, usage, stop)

#### 3. API Registry Integration (`src/providers/register-builtins.ts`)
- Register the API with `registerApiProvider()`
- Add credential detection in `env-api-keys.ts` for the new provider
- Ensure `streamSimple` handles auth lookup via `getEnvApiKey()` or provider-specific auth

#### 4. Model Generation (`scripts/generate-models.ts`)
- Add logic to fetch and parse models from the provider's source (e.g., models.dev API)
- Map provider model data to the standardized `Model` interface
- Handle provider-specific quirks (pricing format, capability flags, model ID transformations)

#### 5. Tests (`test/`)
Create or update test files to cover the new provider:
- `stream.test.ts` - Basic streaming and tool use
- `tokens.test.ts` - Token usage reporting
- `abort.test.ts` - Request cancellation
- `empty.test.ts` - Empty message handling
- `context-overflow.test.ts` - Context limit errors
- `image-limits.test.ts` - Image support (if applicable)
- `unicode-surrogate.test.ts` - Unicode handling
- `tool-call-without-result.test.ts` - Orphaned tool calls
- `image-tool-result.test.ts` - Images in tool results
- `total-tokens.test.ts` - Token counting accuracy
- `cross-provider-handoff.test.ts` - Cross-provider context replay

For `cross-provider-handoff.test.ts`, add at least one provider/model pair. If the provider exposes multiple model families (for example GPT and Claude), add at least one pair per family.

For providers with non-standard auth (AWS, Google Vertex), create a utility like `bedrock-utils.ts` with credential detection helpers.

#### 6. Coding Agent Integration (`../coding-agent/`)
Update `src/core/model-resolver.ts`:
- Add a default model ID for the provider in `DEFAULT_MODELS`

Update `src/cli/args.ts`:
- Add environment variable documentation in the help text

Update `README.md`:
- Add the provider to the providers section with setup instructions
#### 7. Documentation
Update `packages/ai/README.md`:
- Add to the Supported Providers table
- Document any provider-specific options or authentication requirements
- Add environment variable to the Environment Variables section
#### 8. Changelog
Add an entry to `packages/ai/CHANGELOG.md` under `## [Unreleased]`:

```markdown
### Added

- Added support for [Provider Name] provider (#PR by @author)
```

## License

MIT