Sammy Agent Core

Core AI voice agent functionality with screen capture, memory management, and live API integration. For an example implementaiton see this Github Repo.

Installation

``bash npm install @sammy-labs/sammy-three`

For UI components, also install:

`bash npm install @sammy-labs/sammy-three-ui-kit`

`Overview`

@sammy-labs/sammy-threeis a comprehensive AI voice agent package that provides: - Real-time voice conversation with Google's Gemini Live API - Advanced audio processing with noise suppression and noise gate - Screen capture (render-based or video-based) with worker optimization - Memory management with semantic search and context injection - Interactive guides for walkthrough experiences - Tool system with built-in tools and MCP integration - Observability with comprehensive event tracking and analytics - Worker-based architecture for optimal performance - Audio debugging with built-in stutter analyzer

`Table of Contents`

1. Quick Start 2. Configuration Reference 3. Authentication 4. Screen Capture 5. Audio Processing 6. Memory Management 7. Interactive Guides 8. Custom Tools 9. MCP Integration 10. Observability 11. Error Handling 12. Performance Optimization 13. Advanced Features 14. Environment Variables 15. Troubleshooting 16. API Reference

`Quick Start`

`$3`

`bash npm install @sammy-labs/sammy-three

`or`


pnpm add @sammy-labs/sammy-three
or

yarn add @sammy-labs/sammy-three

$3

`tsx import { SammyAgentProvider, useSammyAgentContext, SammyApiClient, createMemoryServices, createCoreServices, } from '@sammy-labs/sammy-three'; import '@sammy-labs/sammy-three/styles.css';

// 1. Create your authentication hook const useAuth = () => { return { token: 'your-jwt-token', baseUrl: 'https://your-api-url.com', onTokenExpired: async () => { // Handle token refresh await refreshYourToken(); }, }; };

// 2. Wrap your app function App() { const auth = useAuth();

if (!auth.token) { return

Loading authentication...

;
  }
  return (
          config={{
        auth: auth,
        captureMethod: 'render', // or 'video'
        debugLogs: true,
        model: 'models/gemini-2.5-flash-preview-native-audio-dialog',
        // Audio processing (NEW)
        audioConfig: {
          noiseSuppression: {
            enabled: true,
            enhancementLevel: 'medium',
          },
          noiseGate: {
            enabled: true,
            threshold: 0.04,
          },
        },
      }}
      onError={(error) => console.error('Agent error:', error)}
      onTokenExpired={auth.onTokenExpired}
    >
      
    
  );
}
// 3. Use in components
function ChatComponent() {
  const {
    startAgent,
    stopAgent,
    sendMessage,
    toggleMuted,
    agentStatus,
    activeSession,
    agentVolume,
    userVolume,
  } = useSammyAgentContext();
  const handleStart = async () => {
    const success = await startAgent({
      agentMode: 'user', // 'admin', 'user', or 'sammy'
      sammyThreeOrganisationFeatureId: 'your-org-id', // optional
      guideId: 'guide-123', // optional - for guided experiences
    });
    if (success) {
      console.log('Agent started successfully');
    }
  };
  return (
    

      
      
      
      
      
      
      
      
Status: {agentStatus}

      Agent Volume: {Math.round(agentVolume * 100)}%

      User Volume: {Math.round(userVolume * 100)}%

    

  );
}


Configuration Reference
$3

`tsx interface SammyAgentConfigProps { // REQUIRED: Authentication configuration auth: { token: string; // Your JWT token baseUrl?: string; // API base URL (defaults to production) onTokenExpired?: () => void; // Token refresh callback };

// Screen capture method captureMethod?: 'render' | 'video'; // Default: 'render'

// AI model to use model?: string; // Default: 'models/gemini-2.5-flash-preview-native-audio-dialog'

// Enable debug logging debugLogs?: boolean; // Default: false

// Voice settings defaultVoice?: string; // Default: 'aoede'

// Capture configuration captureConfig?: { frameRate?: number; // For video capture quality?: number; // JPEG quality (0.0-1.0) enableAudioAdaptation?: boolean; // Reduce capture during audio };

// Audio processing configuration (NEW) audioConfig?: { noiseSuppression?: { enabled?: boolean; enhancementLevel?: 'light' | 'medium' | 'aggressive'; accessKey?: string; // For Koala noise suppression modelPath?: string; // For Koala model }; noiseGate?: { enabled?: boolean; threshold?: number; // 0-1, default 0.04 attackTime?: number; // ms, default 30 holdTime?: number; // ms, default 400 releaseTime?: number; // ms, default 150 }; environmentPreset?: 'office' | 'home' | 'noisy' | 'studio' | 'custom'; };

// Observability configuration observability?: ObservabilityConfig;

// Audio performance debugging debugAudioPerformance?: boolean;

// MCP (Model Context Protocol) configuration (NEW) mcp?: MCPConfig;

// Guides configuration (NEW) guides?: { enabled?: boolean; autoStartFromURL?: boolean; queryParamName?: string; // Default: 'walkthrough' }; }`

`Canvas Ref Requirements`

⚠️ IMPORTANT: Regardless of which capture method you use ('render' or 'video'), you MUST include a canvas element in your component tree:

`tsx import { useRef } from 'react'; import { SammyAgentProvider } from '@sammy-labs/sammy-three';

function App() { // Required: Canvas ref for all capture methods const canvasRef = useRef(null);

return ( config={{ captureMethod: 'render', // or 'video' - both need canvas // ... other config }} > {/ REQUIRED: Hidden canvas for processing /}`

`$3`

Both capture methods use the canvas element for: - Frame processing: Scaling and optimizing captured frames - Base64 encoding: Converting images for AI processing - Performance optimization: Reusing canvas context for efficiency - Fallback rendering: Backup processing when primary capture fails

Common Setup Error:`tsx // ❌ Wrong: Missing canvas

// ✅ Correct: Canvas included`

`Audio Processing`

The package includes advanced audio processing capabilities for cleaner voice input:

`$3`

Two noise suppression modes are available:

1. Browser Native (Default) - Uses browser's built-in echo cancellation and noise suppression - No additional configuration needed

2. Koala AI (Advanced) - Requires Koala access key and model - Superior noise removal for professional use

`tsx const config = { audioConfig: { noiseSuppression: { enabled: true, enhancementLevel: 'medium', // 'light', 'medium', 'aggressive' // For Koala (optional) accessKey: process.env.KOALA_ACCESS_KEY, modelPath: '/models/koala_model.pv', }, }, };`

`$3`

Software-based noise gate to filter out background noise:

`tsx const config = { audioConfig: { noiseGate: { enabled: true, threshold: 0.04, // Volume threshold (0-1) attackTime: 30, // How fast gate opens (ms) holdTime: 400, // Hold open during pauses (ms) releaseTime: 150, // How fast gate closes (ms) }, }, };`

`$3`

Pre-configured settings for different environments:

`tsx const config = { audioConfig: { environmentPreset: 'home', // 'office', 'home', 'noisy', 'studio', 'custom' }, };

// Presets automatically configure: // - office: Quiet office (low threshold, light suppression) // - home: Home environment (balanced settings) // - noisy: Cafe/public space (aggressive filtering) // - studio: Professional recording (minimal processing)`

`$3`

Built-in analyzer for debugging audio issues:

`tsx // Enable audio debugging const config = { debugAudioPerformance: true, };

// In browser console: audioStutterAnalyzer.setDebugMode(true); // Enable collection audioStutterAnalyzer.getAnalysis(); // View analysis audioStutterAnalyzer.clear(); // Clear buffer

// The analyzer tracks: // - Audio underruns (stuttering) // - Long DOM captures during audio // - Buffer statistics // - Correlation between renders and stutters`

`Interactive Guides`

The package includes a complete guided walkthrough system for interactive tutorials:

`$3`

`tsx import { useGuidesManager } from '@sammy-labs/sammy-three';

function GuidedExperience() { const { startAgent } = useSammyAgentContext(); // Initialize guides manager const guides = useGuidesManager({ enabled: true, authConfig: { token: 'your-jwt-token', baseUrl: 'https://your-api.com', }, autoStartFromURL: true, // Auto-start from ?walkthrough=guide-id onStartAgent: startAgent, onWalkthroughStart: (guideId) => { console.log('Starting walkthrough:', guideId); }, });

if (!guides) return null;

return (


      {/ Display available guides /}
      {guides.userGuides.map(guide => (
        
      ))}
      {/ Current guide info /}
      {guides.currentGuide && (
        

          {guides.currentGuide.title}

          {guides.currentGuide.description}

        

      )}


  );
}


$3
Guides can be triggered via URL parameters:

`https://yourapp.com?walkthrough=onboarding-guide-v1`

`$3`

`tsx // Manual guide start const success = await guides.startWalkthrough('guide-id');

// Check for guide in URL const hasGuide = await guides.checkForGuideInUrl();

// Mark as completed (automatic when conversation starts) guides.markGuideAsCompleted('guide-id');

// Refresh guide data guides.refreshGuide(); guides.refreshUserGuides();

// Clean up URL params after use guides.cleanupUrlParams();`

`Custom Tools`

Extend agent capabilities with custom tools:

`$3`

`tsx import { ToolDefinition, ToolCategories } from '@sammy-labs/sammy-three';

const customTool: ToolDefinition = { declaration: { name: 'sendEmail', description: 'Send an email to a user', parameters: { type: 'object', properties: { to: { type: 'string', description: 'Email recipient' }, subject: { type: 'string', description: 'Email subject' }, body: { type: 'string', description: 'Email body' }, }, required: ['to', 'subject', 'body'], }, }, category: ToolCategories.ACTION, handler: async (functionCall, context) => { const { to, subject, body } = functionCall.args; try { // Your email sending logic await sendEmail({ to, subject, body }); // Emit events for state management context.emit('email:sent', { to, subject }); return { success: true, message:Email sent to ${to}, }; } catch (error) { return { success: false, error: error.message, }; } }, };

// Use with provider tools={[customTool]} config={config} > {children}`

`$3`

The package includes several built-in tools:

1. End Session Tool - Gracefully end the conversation 2. Get Context Tool - Retrieve contextual information 3. Escalate Tool - Escalate to human support

`tsx // Escalation is automatic when AI can't help // The escalate tool is called by the AI when needed // It will mark the conversation for human review`

`MCP Integration`

Support for Model Context Protocol (MCP) enables dynamic tool discovery:

`tsx const config = { mcp: { enabled: true, debug: true, servers: [ { name: 'hubspot', type: 'sse', sse: { url: 'https://mcp-server.example.com/sse', apiKey: process.env.MCP_API_KEY, }, autoConnect: true, toolPrefix: 'crm', // Tools will be prefixed: crm_create_contact }, ], autoReconnect: true, maxReconnectAttempts: 3, }, };

// MCP tools are automatically discovered and registered // They appear alongside your custom tools`

`Observability Configuration`

Enable comprehensive event tracking and analytics:

`tsx const config = { observability: { enabled: true, logToConsole: true, // Log events to console for debugging // Worker mode for performance useWorker: true, workerConfig: { batchSize: 50, batchIntervalMs: 5000, // 5 seconds }, // Data inclusion controls includeSystemPrompt: false, // Exclude sensitive prompts includeAudioData: false, // Exclude raw audio (large) includeImageData: true, // Include screen captures // Event filtering (reduce noise) disableEventTypes: [ 'audio.send', 'audio.receive', ], // Custom metadata metadata: { environment: 'production', version: '1.0.0', userId: 'user-123', },

// Audio aggregation audioAggregation: { flushIntervalMs: 30000, // 30 seconds onFlush: async (data) => { // Handle flushed audio data await uploadToS3(data); }, },

// Custom event callback callback: async (event) => { // Send to your analytics service await sendToAnalytics(event); }, }, };`

`Performance Optimization`

`$3`

The system automatically captures fresh visual context at conversation boundaries:

`tsx // Automatic critical renders happen at: // - User starts/stops speaking // - Agent starts/stops speaking // - Conversation turns complete // - User interrupts agent

// This ensures the agent always has current visual context // even during audio playback when captures are throttled`

`$3`

Screen capture automatically adapts during audio playback:

`tsx const config = { captureConfig: { enableAudioAdaptation: true, // Default: true quality: 0.8, }, };

// Normal mode: High-frequency captures (100ms-2s) // Audio mode: Reduced captures (500ms-5s) // Critical renders: Always execute at conversation boundaries`

`$3`

Multiple workers handle heavy processing:

`tsx // Workers are automatic - no configuration needed // - Canvas processing worker (image encoding) // - Observability worker (API batching) // - DOM capture workers (screenshot generation)

// All workers use Data URLs for CSP compatibility // Fallback to main thread if workers unavailable`

`Advanced Features`

`$3`

Automatic context tracking and updates:

`tsx import { useContextUpdater } from '@sammy-labs/sammy-three';

function MyComponent() { // Automatically track page changes useContextUpdater({ trackPageChanges: true, updateInterval: 5000, includeMetadata: true, });

return

Content tracked automatically

;
}
// Manual context updates
import { ContextStateManager } from '@sammy-labs/sammy-three';

const contextManager = new ContextStateManager(); contextManager.updatePageMetadata({ url: window.location.href, title: document.title, }); contextManager.updateUserPreferences({ name: 'John Doe', email: 'john@example.com', });`

`$3`

Better permission handling:

`tsx import { useMicrophonePermission } from '@sammy-labs/sammy-three';

function MicCheck() { const { permission, isChecking, checkPermission, requestPermission } = useMicrophonePermission();

if (permission === 'denied') { return

Microphone access denied. Please enable in settings.

;
  }
  if (permission === 'prompt') {
    return (
      
    );
  }
  return 
Microphone ready!
;
}


$3
For advanced use cases:

`tsx import { SammyAgentCore, SammyApiClient, createMemoryServices, ObservabilityManager, } from '@sammy-labs/sammy-three';

// Create services const apiClient = new SammyApiClient({ token: 'your-jwt-token', baseUrl: 'https://your-api.com', });

const memoryService = createMemoryServices(apiClient);

// Create agent core directly const agentCore = new SammyAgentCore({ config: { auth: { token: 'your-jwt-token' }, captureMethod: 'render', debugLogs: true, }, tools: [/ your tools /], callbacks: { onError: (error) => console.error(error), onTurnComplete: (summary) => console.log(summary), }, services: { memoryService, }, });

// Start agent await agentCore.start({ agentMode: 'user', sammyThreeOrganisationFeatureId: 'org-123', });

// Access observability const trace = agentCore.getObservabilityTrace(); const stats = agentCore.getObservabilityStatistics();`

`Environment Variables`

Common environment variables for configuration:

`bash

`API Configuration`


NEXT_PUBLIC_SAMMY_API_BASE_URL=https://your-api.com
NEXT_PUBLIC_APP_VERSION=1.0.0
Feature Flags

NEXT_PUBLIC_DISABLE_WORKER_MODE=false
Audio Processing (optional)

NEXT_PUBLIC_KOALA_ACCESS_KEY=your-key
NEXT_PUBLIC_KOALA_MODEL_PATH=/models/koala.pv
MCP Integration (optional)

NEXT_PUBLIC_MCP_API_KEY=your-mcp-key
Debug Settings (development only)

NEXT_PUBLIC_DEBUG_AUDIO=true
NEXT_PUBLIC_DEBUG_OBSERVABILITY=true


Error Handling
$3

`tsx config={config} onError={(error) => { console.error('Agent error:', error); // Handle different error types if (error.message.includes('token')) { // Handle authentication errors redirectToLogin(); } else if (error.message.includes('microphone')) { // Handle microphone permission errors showMicrophonePermissionDialog(); } else { // Handle other errors showErrorToast(error.message); } }} onTokenExpired={() => { // Handle token expiration refreshAuthToken(); }} > {children}`

`$3`

`tsx function ChatComponent() { const { error, agentStatus, startAgent } = useSammyAgentContext();

const handleStart = async () => { try { const success = await startAgent({ agentMode: 'user', }); if (!success) { console.error('Failed to start agent'); } } catch (error) { console.error('Start agent error:', error); } };

if (error) { return (


        Agent Error: {error.message}


    );
  }
  return (
    

      
      Status: {agentStatus}

    

  );
}


Troubleshooting
$3

1. "Token expired" errors`tsx // Ensure onTokenExpired is implemented const config = { auth: { token: jwtToken, onTokenExpired: async () => { await refreshToken(); }, }, };`

2. Microphone permission denied`tsx // Use the microphone permission hook const { permission, requestPermission } = useMicrophonePermission(); if (permission === 'denied') { // Show instructions to enable in browser settings }`

3. Audio stuttering`tsx // Enable audio debugging audioStutterAnalyzer.setDebugMode(true); audioStutterAnalyzer.getAnalysis(); // Reduce capture quality const config = { captureConfig: { quality: 0.7, enableAudioAdaptation: true, }, };`

4. Noise issues`tsx // Try different environment presets const config = { audioConfig: { environmentPreset: 'noisy', // More aggressive filtering }, };`

5. Worker not loading`tsx // Disable workers if needed const config = { observability: { useWorker: false, // Fallback to main thread }, };`

`$3`

Enable comprehensive debugging:

`tsx const config = { debugLogs: true, debugAudioPerformance: true, observability: { enabled: true, logToConsole: true, }, mcp: { debug: true, }, };

// In browser console: audioStutterAnalyzer.setDebugMode(true);`

`API Reference`

`$3`

`tsx // Main Components export { SammyAgentProvider, useSammyAgentContext }

// Configuration export { getAgentConfig }

// Hooks export { useScreenCapture, useRenderCapture, useVideoCapture, useContextUpdater, useMicrophonePermission, useGuides, useGuidesManager, useGuidesQueryParams, }

// Core Classes export { SammyAgentCore, SammyApiClient, ScreenCaptureManager, ObservabilityManager, ToolManager, MCPManager, ContextStateManager, ContextMemoryManager, }

// Audio Classes export { AudioRecorder, AudioStreamer, audioStutterAnalyzer, }

// Services export { createMemoryServices, createCoreServices, type MemoryService, type CoreService, }

// Types export type { SammyAgentConfig, SammyAgentCallbacks, AgentSession, ObservabilityConfig, ToolDefinition, MemoryEntry, TraceEvent, MCPConfig, AudioConfig, NoiseGateConfig, NoiseSuppressionConfig, GuidesState, }

// Utilities export { saveTextToFile, audioContext, setupAudioFlushOnUnload, }`

`$3`

`tsx interface SammyAgentProviderProps { children: ReactNode; config: SammyAgentConfigProps; tools?: ToolDefinition[]; autoStartFromURL?: boolean; // Callbacks onError?: (error: unknown) => void; onTokenExpired?: () => void; onTurnComplete?: (summary: { agent: string; user: string }) => void; onConnectionStateChange?: (connected: boolean) => void; onMemoryUpdate?: (result: unknown) => void; onToolCall?: (tool: LiveServerToolCall) => void; onWalkthroughStart?: (walkthroughGuideId: string) => void; }`

`$3`

`tsx interface SammyAgentContextType { // State activeSession: AgentSession | null; agentStatus: 'connected' | 'connecting' | 'disconnected' | 'disconnecting'; agentVolume: number; // 0-1 userVolume: number; // 0-1 config: SammyAgentConfig; error: Error | null; muted: boolean; screenCapture: ScreenCapture;

// Methods startAgent: (options: StartOptions) => Promise; stopAgent: () => void; sendMessage: (message: string) => void; toggleMuted: () => void; // Start Options interface StartOptions { agentMode: 'user' | 'admin' | 'sammy'; sammyThreeOrganisationFeatureId?: string; guideId?: string; // For guided experiences } }`

`Best Practices`

1. Always handle authentication properly - Implement token refresh logic - Handle token expiration gracefully - Use environment variables for API URLs

2. Configure audio for your environment - Use environment presets for quick setup - Test noise gate threshold for your use case - Enable noise suppression in noisy environments

3. Enable worker mode for production - Better performance for high-frequency events - Prevents main thread blocking - Automatic fallback if workers fail

4. Use appropriate capture methods -renderfor web applications (recommended) -videofor full screen capture needs - Enable audio adaptation to prevent stuttering

5. Implement proper error handling - Provider-level error boundaries - Component-level error states - User-friendly error messages

6. Optimize for performance - Enable audio-aware capture - Use appropriate quality settings - Filter unnecessary observability events - Use worker mode for heavy operations

7. Monitor and debug - Enable debug logs in development - Use observability for production monitoring - Use audio stutter analyzer for performance issues - Check MCP debug logs for integration issues

`Migration Guide`

`$3`

If migrating from older versions:

1. Update imports:`tsx // Old import { SammyAgent } from '@sammy-labs/sammy-three-legacy'; // New import { SammyAgentProvider, useSammyAgentContext } from '@sammy-labs/sammy-three';`

2. Update configuration:`tsx // Old const agent = new SammyAgent({ apiKey: 'key' }); // New`

3. Update methods:`tsx // Old agent.connect(); // New const { startAgent } = useSammyAgentContext(); await startAgent({ agentMode: 'user' });``

4. New features to adopt:
- Audio processing with noise suppression
- Interactive guides system
- MCP tool integration
- Worker-based observability
- Critical DOM renders

Sammy Agent Core

Core AI voice agent functionality with screen capture, memory management, and live API integration. For an example implementaiton see this Github Repo.

Installation

``bash npm install @sammy-labs/sammy-three`

For UI components, also install:

`bash npm install @sammy-labs/sammy-three-ui-kit`

`Overview`

`Table of Contents`

`Quick Start`

`$3`

`bash npm install @sammy-labs/sammy-three

`or`


pnpm add @sammy-labs/sammy-three
or

yarn add @sammy-labs/sammy-three

$3

`tsx import { SammyAgentProvider, useSammyAgentContext, SammyApiClient, createMemoryServices, createCoreServices, } from '@sammy-labs/sammy-three'; import '@sammy-labs/sammy-three/styles.css';

// 2. Wrap your app function App() { const auth = useAuth();

if (!auth.token) { return

Loading authentication...

;
  }
  return (
          config={{
        auth: auth,
        captureMethod: 'render', // or 'video'
        debugLogs: true,
        model: 'models/gemini-2.5-flash-preview-native-audio-dialog',
        // Audio processing (NEW)
        audioConfig: {
          noiseSuppression: {
            enabled: true,
            enhancementLevel: 'medium',
          },
          noiseGate: {
            enabled: true,
            threshold: 0.04,
          },
        },
      }}
      onError={(error) => console.error('Agent error:', error)}
      onTokenExpired={auth.onTokenExpired}
    >
      
    
  );
}
// 3. Use in components
function ChatComponent() {
  const {
    startAgent,
    stopAgent,
    sendMessage,
    toggleMuted,
    agentStatus,
    activeSession,
    agentVolume,
    userVolume,
  } = useSammyAgentContext();
  const handleStart = async () => {
    const success = await startAgent({
      agentMode: 'user', // 'admin', 'user', or 'sammy'
      sammyThreeOrganisationFeatureId: 'your-org-id', // optional
      guideId: 'guide-123', // optional - for guided experiences
    });
    if (success) {
      console.log('Agent started successfully');
    }
  };
  return (
    

      
      
      
      
      
      
      
      
Status: {agentStatus}

      Agent Volume: {Math.round(agentVolume * 100)}%

      User Volume: {Math.round(userVolume * 100)}%

    

  );
}


Configuration Reference
$3