# Sarvam Conv AI SDK

TypeScript SDK for building real-time voice-to-voice and text-based conversational AI applications across multiple platforms.

## Features

- Real-time voice-to-voice conversations in the browser
- Text-based chat with streaming responses
- Automatic microphone capture and speaker playback
- Multi-language support (10 Indian languages plus English)
- WebSocket-based real-time communication
- Cross-platform: Browser, React Native, and Node.js support

## Installation

### Browser

```bash
npm install sarvam-conv-ai-sdk
```

### React Native

```bash
npm install sarvam-conv-ai-sdk
npm install react-native-audio-api
```

### Node.js

```bash
npm install sarvam-conv-ai-sdk ws
```

## Platform-Specific Imports

⚠️ Important: Always use platform-specific imports to avoid bundling errors and reduce bundle size.

The SDK provides platform-optimized entry points:

### Browser

```typescript
// ✅ Always use the /browser entry point for web applications
import { ConversationAgent, BrowserAudioInterface } from 'sarvam-conv-ai-sdk/browser';
```

Why? The browser entry point excludes React Native dependencies, preventing bundler errors like `Cannot resolve 'react-native'`.

### React Native

```typescript
// ✅ Always use the /react-native entry point for React Native apps
import { ConversationAgent, RNAudioInterface } from 'sarvam-conv-ai-sdk/react-native';
```

Why? The React Native entry point includes native module support for iOS and Android.

### Node.js

```typescript
// Use the default entry point for Node.js
import { ConversationAgent } from 'sarvam-conv-ai-sdk';
```

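## Quick Start

### Browser (React)

```typescript
import React, { useRef, useState } from 'react';
import {
  ConversationAgent,
  BrowserAudioInterface,
  InteractionType,
  type ServerTextMsgType,
} from 'sarvam-conv-ai-sdk/browser';

function VoiceChat() {
  const [isConnected, setIsConnected] = useState(false);
  const [transcript, setTranscript] = useState('');
  const agentRef = useRef<ConversationAgent | null>(null);

  const startConversation = async () => {
    try {
      // Assumed options: these mirror the Node.js example and the
      // constructor table in the API Reference. Replace the placeholders
      // with your own values.
      const agent = new ConversationAgent({
        apiKey: 'your_api_key',
        config: {
          org_id: 'your_org_id',
          workspace_id: 'your_workspace_id',
          app_id: 'your_app_id',
          user_identifier: 'user@example.com',
          user_identifier_type: 'email',
          interaction_type: InteractionType.CALL,
          sample_rate: 16000,
        },
        audioInterface: new BrowserAudioInterface(),
        // Append each streamed text chunk to the transcript.
        textCallback: async (msg: ServerTextMsgType) => {
          setTranscript((prev) => prev + msg.text);
        },
        startCallback: async () => setIsConnected(true),
        endCallback: async () => setIsConnected(false),
      });

      agentRef.current = agent;
      await agent.start();
      await agent.waitForConnect(10);
    } catch (error) {
      console.error('Error:', error);
    }
  };

  const stopConversation = async () => {
    if (agentRef.current) {
      await agentRef.current.stop();
      agentRef.current = null;
    }
  };

  // The markup and button labels below are placeholders.
  return (
    <div>
      <h1>Voice Chat</h1>
      {!isConnected ? (
        <button onClick={startConversation}>Start Conversation</button>
      ) : (
        <button onClick={stopConversation}>End Conversation</button>
      )}
      <p>Transcript: {transcript}</p>
    </div>
  );
}

export default VoiceChat;
```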

### Node.js

```javascript
const { ConversationAgent, InteractionType } = require('sarvam-conv-ai-sdk');

async function main() {
  const agent = new ConversationAgent({
    apiKey: 'your_api_key',
    config: {
      org_id: 'your_org_id',
      workspace_id: 'your_workspace_id',
      app_id: 'your_app_id',
      user_identifier: 'user@example.com',
      user_identifier_type: 'email',
      interaction_type: InteractionType.TEXT,
      sample_rate: 16000,
    },
    textCallback: async (msg) => {
      console.log('Agent:', msg.text);
    },
    startCallback: async () => {
      console.log('Conversation started!');
    },
  });

  await agent.start();
  const connected = await agent.waitForConnect(10);
  if (connected) {
    await agent.sendText('Hello, how are you?');
    await agent.waitForDisconnect();
  }
}

main().catch(console.error);
```

## API Reference

### ConversationAgent

The main class for managing conversational AI sessions.

#### Constructor Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `apiKey` | `string` | Yes | API key for authentication |
| `config` | `InteractionConfig` | Yes | Interaction configuration |
| `platform` | `'browser' \| 'node'` | No | Platform type (auto-detected) |
| `audioInterface` | `AsyncAudioInterface` | No | Audio interface for voice interactions |
| `textCallback` | `(msg: ServerTextMsgType) => Promise<void>` | No | Receives streaming text chunks |
| `audioCallback` | `(msg: ServerAudioChunkMsg) => Promise<void>` | No | Receives audio chunks |
| `eventCallback` | `(event: ServerEventBase) => Promise<void>` | No | Receives events |
| `startCallback` | `() => Promise<void>` | No | Called when conversation starts |
| `endCallback` | `() => Promise<void>` | No | Called when conversation ends |
| `baseUrl` | `string` | No | Override base URL |

#### Methods

- `async start()` - Start the conversation session
- `async stop()` - Stop the conversation and clean up
- `async waitForConnect(timeout?)` - Wait for connection (returns boolean)
- `async waitForDisconnect()` - Wait until disconnected
- `isConnected()` - Check connection status
- `getInteractionId()` - Get current interaction ID
- `async sendAudio(audioData)` - Send raw audio (voice mode only)
- `async sendText(text)` - Send text message (text mode only)
- `getAgentType()` - Get agent type (`'voice'` or `'text'`)

### InteractionConfig

#### Required Fields

| Field | Type | Description |
| --- | --- | --- |
| `user_identifier_type` | `string` | One of: `'custom'`, `'email'`, `'phone_number'`, `'unknown'` |
| `user_identifier` | `string` | User identifier value |
| `org_id` | `string` | Your organization ID |
| `workspace_id` | `string` | Your workspace ID |
| `app_id` | `string` | The target application ID |
| `interaction_type` | `InteractionType` | `InteractionType.CALL` or `InteractionType.TEXT` |
| `sample_rate` | `number` | Audio sample rate: 8000, 16000, or 22000 |

#### Optional Fields

| Field | Type | Description |
| --- | --- | --- |
| `version` | `number` | App version (uses latest if not provided) |
| `agent_variables` | `Record` | Key-value pairs for agent context |
| `initial_language_name` | `SarvamToolLanguageName` | Starting language |
| `initial_state_name` | `string` | Starting state name |
| `initial_bot_message` | `string` | First message from agent |
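The constraints in the required-fields table can be checked before a session is opened. The following is a minimal sketch and not part of the SDK: `RequiredConfigFields` and `validateConfig` are hypothetical names, and `interaction_type` is left out because the values behind the `InteractionType` enum are not shown here.

```typescript
// Hypothetical local mirror of the required fields listed above.
// This is NOT the SDK's InteractionConfig type.
interface RequiredConfigFields {
  user_identifier_type: string;
  user_identifier: string;
  org_id: string;
  workspace_id: string;
  app_id: string;
  sample_rate: number;
}

// Allowed values as documented in the required-fields table.
const IDENTIFIER_TYPES = ['custom', 'email', 'phone_number', 'unknown'];
const SAMPLE_RATES = [8000, 16000, 22000];

// Returns a list of human-readable problems; an empty list means the
// documented constraints are satisfied.
function validateConfig(config: RequiredConfigFields): string[] {
  const problems: string[] = [];
  if (IDENTIFIER_TYPES.indexOf(config.user_identifier_type) === -1) {
    problems.push(
      `user_identifier_type must be one of: ${IDENTIFIER_TYPES.join(', ')}`
    );
  }
  if (SAMPLE_RATES.indexOf(config.sample_rate) === -1) {
    problems.push(`sample_rate must be one of: ${SAMPLE_RATES.join(', ')}`);
  }
  const idFields = ['user_identifier', 'org_id', 'workspace_id', 'app_id'] as const;
  for (const field of idFields) {
    if (config[field].trim() === '') {
      problems.push(`${field} must not be empty`);
    }
  }
  return problems;
}
```

Running a check like this before `agent.start()` surfaces a mistyped sample rate or an empty ID locally instead of as a connection failure.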

### BrowserAudioInterface

Handles microphone capture and speaker playback in browser environments.

```typescript
// Import from the /browser entry point, as recommended for web applications
import { BrowserAudioInterface } from 'sarvam-conv-ai-sdk/browser';

const audioInterface = new BrowserAudioInterface();
```

Features:

- Automatic microphone access and audio capture
- Real-time audio streaming at 16 kHz
- Automatic speaker playback
- Handles user interruptions

Requirements:

- HTTPS connection (required for microphone access)
- Modern browser with Web Audio API support
- User permission for microphone access
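To get a feel for the data volume behind 16 kHz streaming, the arithmetic can be sketched as below. This assumes the 16-bit mono PCM (LINEAR16) format mentioned under Troubleshooting. `pcmBytes` is a hypothetical helper, and the 20 ms chunk length is only an illustrative figure, not a value the SDK is known to use.

```typescript
// Bytes needed to hold `durationMs` milliseconds of 16-bit mono PCM
// sampled at `sampleRate` Hz.
function pcmBytes(sampleRate: number, durationMs: number): number {
  const bytesPerSample = 2; // 16-bit samples, one channel
  const sampleCount = Math.round((sampleRate * durationMs) / 1000);
  return sampleCount * bytesPerSample;
}

console.log(pcmBytes(16000, 1000)); // 32000 bytes for one second at 16 kHz
console.log(pcmBytes(16000, 20));   // 640 bytes for a 20 ms chunk at 16 kHz
console.log(pcmBytes(8000, 20));    // 320 bytes for a 20 ms chunk at 8 kHz
```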

## Event Handling

### textCallback

Receives streaming text chunks from the agent:

```typescript
textCallback: async (msg: ServerTextMsgType) => {
  console.log('Agent says:', msg.text);
}
```
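Because text arrives in chunks, an application usually needs to assemble them into a full message. The sketch below assumes each chunk carries only new text, which this document does not state explicitly. `TextChunk` and `createTranscriptCollector` are hypothetical names, and `TextChunk` is a local stand-in for `ServerTextMsgType` that models only the `text` field.

```typescript
// Minimal stand-in for the SDK's ServerTextMsgType; only `text` is modeled,
// since that is the one field shown in this document.
interface TextChunk {
  text: string;
}

// Builds a callback that appends each streamed chunk to a running transcript,
// plus a getter for reading the accumulated text.
function createTranscriptCollector() {
  let transcript = '';
  return {
    textCallback: async (msg: TextChunk): Promise<void> => {
      transcript += msg.text;
    },
    getTranscript: (): string => transcript,
  };
}
```

The returned `textCallback` has the same shape as the callback above, so it could be passed in the constructor options while `getTranscript()` is read from UI code.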

### eventCallback

Receives various events during conversation:

```typescript
eventCallback: async (event: ServerEventBase) => {
  switch (event.type) {
    case 'server.action.interaction_connected':
      console.log('Connected');
      break;
    case 'server.event.user_interrupt':
      console.log('User interrupted');
      break;
    case 'server.action.interaction_end':
      console.log('Conversation ended');
      break;
    case 'server.event.user_speech_start':
      console.log('User started speaking');
      break;
    case 'server.event.user_speech_end':
      console.log('User stopped speaking');
      break;
  }
}
```
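As the number of handled events grows, a lookup table can be easier to maintain than a `switch`. This is one possible sketch that uses only the event type strings shown above. `EventLike`, `EVENT_LABELS`, and `describeEvent` are hypothetical names, and `EventLike` is a local stand-in for `ServerEventBase` that assumes nothing beyond a `type` field.

```typescript
// Minimal stand-in for the SDK's ServerEventBase; only `type` is assumed.
interface EventLike {
  type: string;
}

// Event type strings taken from the eventCallback example above.
const EVENT_LABELS: Record<string, string> = {
  'server.action.interaction_connected': 'Connected',
  'server.event.user_interrupt': 'User interrupted',
  'server.action.interaction_end': 'Conversation ended',
  'server.event.user_speech_start': 'User started speaking',
  'server.event.user_speech_end': 'User stopped speaking',
};

// Maps an event to a display label, falling back to a generic message for
// event types that are not in the table.
function describeEvent(event: EventLike): string {
  if (Object.prototype.hasOwnProperty.call(EVENT_LABELS, event.type)) {
    return EVENT_LABELS[event.type];
  }
  return `Unhandled event: ${event.type}`;
}
```

Inside an `eventCallback`, this reduces the body to `console.log(describeEvent(event))`, and unknown event types are reported rather than silently ignored.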

## Supported Languages

The SDK supports 10 Indian languages plus English:

```typescript
import { SarvamToolLanguageName } from 'sarvam-conv-ai-sdk';

// Available: BENGALI, GUJARATI, KANNADA, MALAYALAM, TAMIL,
// TELUGU, PUNJABI, ODIA, MARATHI, HINDI, ENGLISH

const config = {
  initial_language_name: SarvamToolLanguageName.HINDI,
};
```

## Best Practices

**Resource Cleanup:** Always clean up resources when the component unmounts.

```typescript
useEffect(() => {
  return () => {
    // Returning a block keeps the cleanup function's return value void.
    agentRef.current?.stop().catch(console.error);
  };
}, []);
```

**Connection Timeout:** Specify a timeout when waiting for a connection.

```typescript
const connected = await agent.waitForConnect(10); // 10 seconds
if (!connected) console.error('Connection timeout');
```

**Error Handling:** Wrap agent operations in try-catch blocks.

```typescript
try {
  await agent.start();
  await agent.waitForConnect(10);
} catch (error) {
  console.error('Error:', error);
  await agent.stop();
}
```

**Secure API Keys:** Use environment variables or a backend proxy. Variables exposed through `import.meta.env.VITE_*` are embedded in the client bundle, so a backend proxy is the safer choice for production.

```typescript
// Use environment variables
const apiKey = import.meta.env.VITE_SARVAM_API_KEY;

// Or use a backend proxy
const agent = new ConversationAgent({ baseUrl: '/api/proxy/' });
```
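On the server side, such as a Node.js process behind the proxy, it helps to fail fast when the key is missing rather than start a session with an empty value. The helper below is a sketch under that assumption. `requireEnv` is a hypothetical function, and `SARVAM_API_KEY` is a placeholder variable name, not one defined by the SDK.

```typescript
// Reads a required setting from an environment-like object and throws a
// descriptive error when it is missing or blank.
function requireEnv(
  env: Record<string, string | undefined>,
  name: string
): string {
  const value = env[name];
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// In Node.js this would typically be called as:
//   const apiKey = requireEnv(process.env, 'SARVAM_API_KEY');
// where SARVAM_API_KEY is a placeholder name chosen for this example.
```

Taking the environment as a parameter instead of reading `process.env` directly keeps the helper easy to test and usable outside Node.js.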

## Examples

- Web Example - See `examples/web` for a complete React + TypeScript application
- Node.js Example - See `examples/nodejs/simple-text-chat.js` for a command-line text chat

## Troubleshooting

**Microphone Not Working:** Ensure the page is served over HTTPS, check browser permissions, and verify the microphone is not in use by another app.

**Connection Timeout:** Check network connectivity, verify the API key is valid, and ensure the `app_id` exists and has a committed version.

**Audio Quality Issues:** Verify the sample rate matches the configuration (8000, 16000, or 22000) and ensure the audio format is LINEAR16 (16-bit PCM mono).
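`BrowserAudioInterface` handles capture on its own, so a conversion like the one below only matters when audio from another source is passed to `sendAudio`. Web Audio style buffers hold 32-bit floats in the range [-1, 1], while LINEAR16 expects 16-bit signed integers. `floatTo16BitPCM` is a hypothetical helper, and the exact argument type `sendAudio` accepts is not specified in this document, so treat this as an illustration of the format only.

```typescript
// Converts float samples in [-1, 1] to 16-bit signed PCM (LINEAR16).
// Out-of-range input is clamped rather than allowed to wrap around.
function floatTo16BitPCM(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    // Negative values scale to -32768, positive values to 32767.
    out[i] = clamped < 0
      ? Math.round(clamped * 0x8000)
      : Math.round(clamped * 0x7fff);
  }
  return out;
}
```

For mono audio the resulting `Int16Array` holds one sample per element, and its underlying `buffer` contains the raw bytes in the platform's byte order.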

## License

MIT
