Sammy Agent Core: AI voice agent with screen capture, memory management, and live API integration
npm install @sammy-labs/sammy-threeCore AI voice agent functionality with screen capture, memory management, and live API integration. For an example implementaiton see this Github Repo.
``bash`
npm install @sammy-labs/sammy-three
For UI components, also install:
`bash`
npm install @sammy-labs/sammy-three-ui-kit
@sammy-labs/sammy-three is a comprehensive AI voice agent package that provides:
- Real-time voice conversation with Google's Gemini Live API
- Advanced audio processing with noise suppression and noise gate
- Screen capture (render-based or video-based) with worker optimization
- Memory management with semantic search and context injection
- Interactive guides for walkthrough experiences
- Tool system with built-in tools and MCP integration
- Observability with comprehensive event tracking and analytics
- Worker-based architecture for optimal performance
- Audio debugging with built-in stutter analyzer
1. Quick Start
2. Configuration Reference
3. Authentication
4. Screen Capture
5. Audio Processing
6. Memory Management
7. Interactive Guides
8. Custom Tools
9. MCP Integration
10. Observability
11. Error Handling
12. Performance Optimization
13. Advanced Features
14. Environment Variables
15. Troubleshooting
16. API Reference
`bash`
npm install @sammy-labs/sammy-threeor
pnpm add @sammy-labs/sammy-threeor
yarn add @sammy-labs/sammy-three
`tsx
import {
SammyAgentProvider,
useSammyAgentContext,
SammyApiClient,
createMemoryServices,
createCoreServices,
} from '@sammy-labs/sammy-three';
import '@sammy-labs/sammy-three/styles.css';
// 1. Create your authentication hook
const useAuth = () => {
return {
token: 'your-jwt-token',
baseUrl: 'https://your-api-url.com',
onTokenExpired: async () => {
// Handle token refresh
await refreshYourToken();
},
};
};
// 2. Wrap your app
function App() {
const auth = useAuth();
if (!auth.token) {
return
return (
auth: auth,
captureMethod: 'render', // or 'video'
debugLogs: true,
model: 'models/gemini-2.5-flash-preview-native-audio-dialog',
// Audio processing (NEW)
audioConfig: {
noiseSuppression: {
enabled: true,
enhancementLevel: 'medium',
},
noiseGate: {
enabled: true,
threshold: 0.04,
},
},
}}
onError={(error) => console.error('Agent error:', error)}
onTokenExpired={auth.onTokenExpired}
>
);
}
// 3. Use in components
function ChatComponent() {
const {
startAgent,
stopAgent,
sendMessage,
toggleMuted,
agentStatus,
activeSession,
agentVolume,
userVolume,
} = useSammyAgentContext();
const handleStart = async () => {
const success = await startAgent({
agentMode: 'user', // 'admin', 'user', or 'sammy'
sammyThreeOrganisationFeatureId: 'your-org-id', // optional
guideId: 'guide-123', // optional - for guided experiences
});
if (success) {
console.log('Agent started successfully');
}
};
return (
Configuration Reference
$3
`tsx
interface SammyAgentConfigProps {
// REQUIRED: Authentication configuration
auth: {
token: string; // Your JWT token
baseUrl?: string; // API base URL (defaults to production)
onTokenExpired?: () => void; // Token refresh callback
}; // Screen capture method
captureMethod?: 'render' | 'video'; // Default: 'render'
// AI model to use
model?: string; // Default: 'models/gemini-2.5-flash-preview-native-audio-dialog'
// Enable debug logging
debugLogs?: boolean; // Default: false
// Voice settings
defaultVoice?: string; // Default: 'aoede'
// Capture configuration
captureConfig?: {
frameRate?: number; // For video capture
quality?: number; // JPEG quality (0.0-1.0)
enableAudioAdaptation?: boolean; // Reduce capture during audio
};
// Audio processing configuration (NEW)
audioConfig?: {
noiseSuppression?: {
enabled?: boolean;
enhancementLevel?: 'light' | 'medium' | 'aggressive';
accessKey?: string; // For Koala noise suppression
modelPath?: string; // For Koala model
};
noiseGate?: {
enabled?: boolean;
threshold?: number; // 0-1, default 0.04
attackTime?: number; // ms, default 30
holdTime?: number; // ms, default 400
releaseTime?: number; // ms, default 150
};
environmentPreset?: 'office' | 'home' | 'noisy' | 'studio' | 'custom';
};
// Observability configuration
observability?: ObservabilityConfig;
// Audio performance debugging
debugAudioPerformance?: boolean;
// MCP (Model Context Protocol) configuration (NEW)
mcp?: MCPConfig;
// Guides configuration (NEW)
guides?: {
enabled?: boolean;
autoStartFromURL?: boolean;
queryParamName?: string; // Default: 'walkthrough'
};
}
`Canvas Ref Requirements
⚠️ IMPORTANT: Regardless of which capture method you use (
'render' or 'video'), you MUST include a canvas element in your component tree:`tsx
import { useRef } from 'react';
import { SammyAgentProvider } from '@sammy-labs/sammy-three';function App() {
// Required: Canvas ref for all capture methods
const canvasRef = useRef(null);
return (
config={{
captureMethod: 'render', // or 'video' - both need canvas
// ... other config
}}
>
{/ REQUIRED: Hidden canvas for processing /}
ref={canvasRef}
style={{ display: 'none' }}
/>
);
}
`$3
Both capture methods use the canvas element for:
- Frame processing: Scaling and optimizing captured frames
- Base64 encoding: Converting images for AI processing
- Performance optimization: Reusing canvas context for efficiency
- Fallback rendering: Backup processing when primary capture fails
Common Setup Error:
`tsx
// ❌ Wrong: Missing canvas
// ✅ Correct: Canvas included
`Audio Processing
The package includes advanced audio processing capabilities for cleaner voice input:
$3
Two noise suppression modes are available:
1. Browser Native (Default)
- Uses browser's built-in echo cancellation and noise suppression
- No additional configuration needed
2. Koala AI (Advanced)
- Requires Koala access key and model
- Superior noise removal for professional use
`tsx
const config = {
audioConfig: {
noiseSuppression: {
enabled: true,
enhancementLevel: 'medium', // 'light', 'medium', 'aggressive'
// For Koala (optional)
accessKey: process.env.KOALA_ACCESS_KEY,
modelPath: '/models/koala_model.pv',
},
},
};
`$3
Software-based noise gate to filter out background noise:
`tsx
const config = {
audioConfig: {
noiseGate: {
enabled: true,
threshold: 0.04, // Volume threshold (0-1)
attackTime: 30, // How fast gate opens (ms)
holdTime: 400, // Hold open during pauses (ms)
releaseTime: 150, // How fast gate closes (ms)
},
},
};
`$3
Pre-configured settings for different environments:
`tsx
const config = {
audioConfig: {
environmentPreset: 'home', // 'office', 'home', 'noisy', 'studio', 'custom'
},
};// Presets automatically configure:
// - office: Quiet office (low threshold, light suppression)
// - home: Home environment (balanced settings)
// - noisy: Cafe/public space (aggressive filtering)
// - studio: Professional recording (minimal processing)
`$3
Built-in analyzer for debugging audio issues:
`tsx
// Enable audio debugging
const config = {
debugAudioPerformance: true,
};// In browser console:
audioStutterAnalyzer.setDebugMode(true); // Enable collection
audioStutterAnalyzer.getAnalysis(); // View analysis
audioStutterAnalyzer.clear(); // Clear buffer
// The analyzer tracks:
// - Audio underruns (stuttering)
// - Long DOM captures during audio
// - Buffer statistics
// - Correlation between renders and stutters
`Interactive Guides
The package includes a complete guided walkthrough system for interactive tutorials:
$3
`tsx
import { useGuidesManager } from '@sammy-labs/sammy-three';function GuidedExperience() {
const { startAgent } = useSammyAgentContext();
// Initialize guides manager
const guides = useGuidesManager({
enabled: true,
authConfig: {
token: 'your-jwt-token',
baseUrl: 'https://your-api.com',
},
autoStartFromURL: true, // Auto-start from ?walkthrough=guide-id
onStartAgent: startAgent,
onWalkthroughStart: (guideId) => {
console.log('Starting walkthrough:', guideId);
},
});
if (!guides) return null;
return (
{/ Display available guides /}
{guides.userGuides.map(guide => (
key={guide.guideId}
onClick={() => guides.startWalkthrough(guide.guideId)}
>
{guide.title} {guide.isCompleted && '✓'}
))} {/ Current guide info /}
{guides.currentGuide && (
{guides.currentGuide.title}
{guides.currentGuide.description}
)}
);
}
`$3
Guides can be triggered via URL parameters:
`
https://yourapp.com?walkthrough=onboarding-guide-v1
`$3
`tsx
// Manual guide start
const success = await guides.startWalkthrough('guide-id');// Check for guide in URL
const hasGuide = await guides.checkForGuideInUrl();
// Mark as completed (automatic when conversation starts)
guides.markGuideAsCompleted('guide-id');
// Refresh guide data
guides.refreshGuide();
guides.refreshUserGuides();
// Clean up URL params after use
guides.cleanupUrlParams();
`Custom Tools
Extend agent capabilities with custom tools:
$3
`tsx
import { ToolDefinition, ToolCategories } from '@sammy-labs/sammy-three';const customTool: ToolDefinition = {
declaration: {
name: 'sendEmail',
description: 'Send an email to a user',
parameters: {
type: 'object',
properties: {
to: { type: 'string', description: 'Email recipient' },
subject: { type: 'string', description: 'Email subject' },
body: { type: 'string', description: 'Email body' },
},
required: ['to', 'subject', 'body'],
},
},
category: ToolCategories.ACTION,
handler: async (functionCall, context) => {
const { to, subject, body } = functionCall.args;
try {
// Your email sending logic
await sendEmail({ to, subject, body });
// Emit events for state management
context.emit('email:sent', { to, subject });
return {
success: true,
message:
Email sent to ${to},
};
} catch (error) {
return {
success: false,
error: error.message,
};
}
},
};// Use with provider
tools={[customTool]}
config={config}
>
{children}
`$3
The package includes several built-in tools:
1. End Session Tool - Gracefully end the conversation
2. Get Context Tool - Retrieve contextual information
3. Escalate Tool - Escalate to human support
`tsx
// Escalation is automatic when AI can't help
// The escalate tool is called by the AI when needed
// It will mark the conversation for human review
`MCP Integration
Support for Model Context Protocol (MCP) enables dynamic tool discovery:
`tsx
const config = {
mcp: {
enabled: true,
debug: true,
servers: [
{
name: 'hubspot',
type: 'sse',
sse: {
url: 'https://mcp-server.example.com/sse',
apiKey: process.env.MCP_API_KEY,
},
autoConnect: true,
toolPrefix: 'crm', // Tools will be prefixed: crm_create_contact
},
],
autoReconnect: true,
maxReconnectAttempts: 3,
},
};// MCP tools are automatically discovered and registered
// They appear alongside your custom tools
`Observability Configuration
Enable comprehensive event tracking and analytics:
`tsx
const config = {
observability: {
enabled: true,
logToConsole: true, // Log events to console for debugging
// Worker mode for performance
useWorker: true,
workerConfig: {
batchSize: 50,
batchIntervalMs: 5000, // 5 seconds
},
// Data inclusion controls
includeSystemPrompt: false, // Exclude sensitive prompts
includeAudioData: false, // Exclude raw audio (large)
includeImageData: true, // Include screen captures
// Event filtering (reduce noise)
disableEventTypes: [
'audio.send',
'audio.receive',
],
// Custom metadata
metadata: {
environment: 'production',
version: '1.0.0',
userId: 'user-123',
}, // Audio aggregation
audioAggregation: {
flushIntervalMs: 30000, // 30 seconds
onFlush: async (data) => {
// Handle flushed audio data
await uploadToS3(data);
},
},
// Custom event callback
callback: async (event) => {
// Send to your analytics service
await sendToAnalytics(event);
},
},
};
`Performance Optimization
$3
The system automatically captures fresh visual context at conversation boundaries:
`tsx
// Automatic critical renders happen at:
// - User starts/stops speaking
// - Agent starts/stops speaking
// - Conversation turns complete
// - User interrupts agent// This ensures the agent always has current visual context
// even during audio playback when captures are throttled
`$3
Screen capture automatically adapts during audio playback:
`tsx
const config = {
captureConfig: {
enableAudioAdaptation: true, // Default: true
quality: 0.8,
},
};// Normal mode: High-frequency captures (100ms-2s)
// Audio mode: Reduced captures (500ms-5s)
// Critical renders: Always execute at conversation boundaries
`$3
Multiple workers handle heavy processing:
`tsx
// Workers are automatic - no configuration needed
// - Canvas processing worker (image encoding)
// - Observability worker (API batching)
// - DOM capture workers (screenshot generation)// All workers use Data URLs for CSP compatibility
// Fallback to main thread if workers unavailable
`Advanced Features
$3
Automatic context tracking and updates:
`tsx
import { useContextUpdater } from '@sammy-labs/sammy-three';function MyComponent() {
// Automatically track page changes
useContextUpdater({
trackPageChanges: true,
updateInterval: 5000,
includeMetadata: true,
});
return
Content tracked automatically;
}// Manual context updates
import { ContextStateManager } from '@sammy-labs/sammy-three';
const contextManager = new ContextStateManager();
contextManager.updatePageMetadata({
url: window.location.href,
title: document.title,
});
contextManager.updateUserPreferences({
name: 'John Doe',
email: 'john@example.com',
});
`$3
Better permission handling:
`tsx
import { useMicrophonePermission } from '@sammy-labs/sammy-three';function MicCheck() {
const {
permission,
isChecking,
checkPermission,
requestPermission
} = useMicrophonePermission();
if (permission === 'denied') {
return
Microphone access denied. Please enable in settings.;
} if (permission === 'prompt') {
return (
);
}
return
Microphone ready!;
}
`$3
For advanced use cases:
`tsx
import {
SammyAgentCore,
SammyApiClient,
createMemoryServices,
ObservabilityManager,
} from '@sammy-labs/sammy-three';// Create services
const apiClient = new SammyApiClient({
token: 'your-jwt-token',
baseUrl: 'https://your-api.com',
});
const memoryService = createMemoryServices(apiClient);
// Create agent core directly
const agentCore = new SammyAgentCore({
config: {
auth: { token: 'your-jwt-token' },
captureMethod: 'render',
debugLogs: true,
},
tools: [/ your tools /],
callbacks: {
onError: (error) => console.error(error),
onTurnComplete: (summary) => console.log(summary),
},
services: {
memoryService,
},
});
// Start agent
await agentCore.start({
agentMode: 'user',
sammyThreeOrganisationFeatureId: 'org-123',
});
// Access observability
const trace = agentCore.getObservabilityTrace();
const stats = agentCore.getObservabilityStatistics();
`Environment Variables
Common environment variables for configuration:
`bash
API Configuration
NEXT_PUBLIC_SAMMY_API_BASE_URL=https://your-api.com
NEXT_PUBLIC_APP_VERSION=1.0.0Feature Flags
NEXT_PUBLIC_DISABLE_WORKER_MODE=falseAudio Processing (optional)
NEXT_PUBLIC_KOALA_ACCESS_KEY=your-key
NEXT_PUBLIC_KOALA_MODEL_PATH=/models/koala.pvMCP Integration (optional)
NEXT_PUBLIC_MCP_API_KEY=your-mcp-keyDebug Settings (development only)
NEXT_PUBLIC_DEBUG_AUDIO=true
NEXT_PUBLIC_DEBUG_OBSERVABILITY=true
`Error Handling
$3
`tsx
config={config}
onError={(error) => {
console.error('Agent error:', error);
// Handle different error types
if (error.message.includes('token')) {
// Handle authentication errors
redirectToLogin();
} else if (error.message.includes('microphone')) {
// Handle microphone permission errors
showMicrophonePermissionDialog();
} else {
// Handle other errors
showErrorToast(error.message);
}
}}
onTokenExpired={() => {
// Handle token expiration
refreshAuthToken();
}}
>
{children}
`$3
`tsx
function ChatComponent() {
const { error, agentStatus, startAgent } = useSammyAgentContext(); const handleStart = async () => {
try {
const success = await startAgent({
agentMode: 'user',
});
if (!success) {
console.error('Failed to start agent');
}
} catch (error) {
console.error('Start agent error:', error);
}
};
if (error) {
return (
Agent Error: {error.message}
);
} return (
Status: {agentStatus}
);
}
`Troubleshooting
$3
1. "Token expired" errors
`tsx
// Ensure onTokenExpired is implemented
const config = {
auth: {
token: jwtToken,
onTokenExpired: async () => {
await refreshToken();
},
},
};
`2. Microphone permission denied
`tsx
// Use the microphone permission hook
const { permission, requestPermission } = useMicrophonePermission();
if (permission === 'denied') {
// Show instructions to enable in browser settings
}
`3. Audio stuttering
`tsx
// Enable audio debugging
audioStutterAnalyzer.setDebugMode(true);
audioStutterAnalyzer.getAnalysis();
// Reduce capture quality
const config = {
captureConfig: {
quality: 0.7,
enableAudioAdaptation: true,
},
};
`4. Noise issues
`tsx
// Try different environment presets
const config = {
audioConfig: {
environmentPreset: 'noisy', // More aggressive filtering
},
};
`5. Worker not loading
`tsx
// Disable workers if needed
const config = {
observability: {
useWorker: false, // Fallback to main thread
},
};
`$3
Enable comprehensive debugging:
`tsx
const config = {
debugLogs: true,
debugAudioPerformance: true,
observability: {
enabled: true,
logToConsole: true,
},
mcp: {
debug: true,
},
};// In browser console:
audioStutterAnalyzer.setDebugMode(true);
`API Reference
$3
`tsx
// Main Components
export { SammyAgentProvider, useSammyAgentContext }// Configuration
export { getAgentConfig }
// Hooks
export {
useScreenCapture,
useRenderCapture,
useVideoCapture,
useContextUpdater,
useMicrophonePermission,
useGuides,
useGuidesManager,
useGuidesQueryParams,
}
// Core Classes
export {
SammyAgentCore,
SammyApiClient,
ScreenCaptureManager,
ObservabilityManager,
ToolManager,
MCPManager,
ContextStateManager,
ContextMemoryManager,
}
// Audio Classes
export {
AudioRecorder,
AudioStreamer,
audioStutterAnalyzer,
}
// Services
export {
createMemoryServices,
createCoreServices,
type MemoryService,
type CoreService,
}
// Types
export type {
SammyAgentConfig,
SammyAgentCallbacks,
AgentSession,
ObservabilityConfig,
ToolDefinition,
MemoryEntry,
TraceEvent,
MCPConfig,
AudioConfig,
NoiseGateConfig,
NoiseSuppressionConfig,
GuidesState,
}
// Utilities
export {
saveTextToFile,
audioContext,
setupAudioFlushOnUnload,
}
`$3
`tsx
interface SammyAgentProviderProps {
children: ReactNode;
config: SammyAgentConfigProps;
tools?: ToolDefinition[];
autoStartFromURL?: boolean;
// Callbacks
onError?: (error: unknown) => void;
onTokenExpired?: () => void;
onTurnComplete?: (summary: { agent: string; user: string }) => void;
onConnectionStateChange?: (connected: boolean) => void;
onMemoryUpdate?: (result: unknown) => void;
onToolCall?: (tool: LiveServerToolCall) => void;
onWalkthroughStart?: (walkthroughGuideId: string) => void;
}
`$3
`tsx
interface SammyAgentContextType {
// State
activeSession: AgentSession | null;
agentStatus: 'connected' | 'connecting' | 'disconnected' | 'disconnecting';
agentVolume: number; // 0-1
userVolume: number; // 0-1
config: SammyAgentConfig;
error: Error | null;
muted: boolean;
screenCapture: ScreenCapture; // Methods
startAgent: (options: StartOptions) => Promise;
stopAgent: () => void;
sendMessage: (message: string) => void;
toggleMuted: () => void;
// Start Options
interface StartOptions {
agentMode: 'user' | 'admin' | 'sammy';
sammyThreeOrganisationFeatureId?: string;
guideId?: string; // For guided experiences
}
}
`Best Practices
1. Always handle authentication properly
- Implement token refresh logic
- Handle token expiration gracefully
- Use environment variables for API URLs
2. Configure audio for your environment
- Use environment presets for quick setup
- Test noise gate threshold for your use case
- Enable noise suppression in noisy environments
3. Enable worker mode for production
- Better performance for high-frequency events
- Prevents main thread blocking
- Automatic fallback if workers fail
4. Use appropriate capture methods
-
render for web applications (recommended)
- video for full screen capture needs
- Enable audio adaptation to prevent stuttering5. Implement proper error handling
- Provider-level error boundaries
- Component-level error states
- User-friendly error messages
6. Optimize for performance
- Enable audio-aware capture
- Use appropriate quality settings
- Filter unnecessary observability events
- Use worker mode for heavy operations
7. Monitor and debug
- Enable debug logs in development
- Use observability for production monitoring
- Use audio stutter analyzer for performance issues
- Check MCP debug logs for integration issues
Migration Guide
$3
If migrating from older versions:
1. Update imports:
`tsx
// Old
import { SammyAgent } from '@sammy-labs/sammy-three-legacy';
// New
import { SammyAgentProvider, useSammyAgentContext } from '@sammy-labs/sammy-three';
`2. Update configuration:
`tsx
// Old
const agent = new SammyAgent({ apiKey: 'key' });
// New
`3. Update methods:
`tsx
// Old
agent.connect();
// New
const { startAgent } = useSammyAgentContext();
await startAgent({ agentMode: 'user' });
``4. New features to adopt:
- Audio processing with noise suppression
- Interactive guides system
- MCP tool integration
- Worker-based observability
- Critical DOM renders