Recognition Service TypeScript/Node.js Client SDK

TypeScript SDK for real-time speech recognition via WebSocket.

Installation (a Node 18 compatible build is also published as @volley/recognition-client-sdk-node18):

```bash
npm install @volley/recognition-client-sdk
```

Quick start (builder pattern, recommended):

```typescript
import {
  createClientWithBuilder,
  RecognitionProvider,
  DeepgramModel,
  STAGES
} from '@volley/recognition-client-sdk';

// Create client with builder pattern (recommended)
const client = createClientWithBuilder(builder =>
  builder
    .stage(STAGES.STAGING) // ✨ Simple environment selection using enum
    .provider(RecognitionProvider.DEEPGRAM)
    .model(DeepgramModel.NOVA_2)
    .onTranscript(result => {
      console.log('Final:', result.finalTranscript);
      console.log('Interim:', result.pendingTranscript);
    })
    .onError(error => console.error(error))
);

// Stream audio
await client.connect();
client.sendAudio(pcm16AudioChunk); // Call repeatedly with audio chunks
await client.stopRecording();      // Wait for final transcript

// Check the actual URL being used
console.log('Connected to:', client.getUrl());
```

Alternative: configure the client directly, without the builder:

```typescript
import {
  RealTimeTwoWayWebSocketRecognitionClient,
  RecognitionProvider,
  DeepgramModel,
  Language,
  STAGES
} from '@volley/recognition-client-sdk';

const client = new RealTimeTwoWayWebSocketRecognitionClient({
  stage: STAGES.STAGING, // ✨ Recommended: Use STAGES enum for type safety
  asrRequestConfig: {
    provider: RecognitionProvider.DEEPGRAM,
    model: DeepgramModel.NOVA_2,
    language: Language.ENGLISH_US
  },
  onTranscript: (result) => console.log(result),
  onError: (error) => console.error(error)
});

// Check the actual URL being used
console.log('Connected to:', client.getUrl());
```
Recommended: Use stage parameter with STAGES enum for automatic environment configuration:
```typescript
import {
  RecognitionProvider,
  DeepgramModel,
  Language,
  STAGES
} from '@volley/recognition-client-sdk';

builder
  .stage(STAGES.STAGING)                  // STAGES.LOCAL | STAGES.DEV | STAGES.STAGING | STAGES.PRODUCTION
  .provider(RecognitionProvider.DEEPGRAM) // DEEPGRAM, GOOGLE
  .model(DeepgramModel.NOVA_2)            // Provider-specific model enum
  .language(Language.ENGLISH_US)          // Language enum
  .interimResults(true)                   // Enable partial transcripts
```
Available Stages and URLs:
| Stage | Enum | WebSocket URL |
|-------|------|---------------|
| Local | STAGES.LOCAL | ws://localhost:3101/ws/v1/recognize |
| Development | STAGES.DEV | wss://recognition-service-dev.volley-services.net/ws/v1/recognize |
| Staging | STAGES.STAGING | wss://recognition-service-staging.volley-services.net/ws/v1/recognize |
| Production | STAGES.PRODUCTION | wss://recognition-service.volley-services.net/ws/v1/recognize |
> 💡 Using the stage parameter automatically constructs the correct URL for each environment.
Automatic Connection Retry:
The SDK automatically retries failed connections with sensible defaults - no configuration needed!
Default behavior (works out of the box):
- 4 connection attempts (try once, retry 3 times if failed)
- 200ms delay between retries
- Handles temporary service unavailability (503)
- Fast failure (~600ms total on complete failure)
- Timing: Attempt 1 → FAIL → wait 200ms → Attempt 2 → FAIL → wait 200ms → Attempt 3 → FAIL → wait 200ms → Attempt 4
```typescript
import { RealTimeTwoWayWebSocketRecognitionClient, STAGES } from '@volley/recognition-client-sdk';

// ✅ Automatic retry - no config needed!
const client = new RealTimeTwoWayWebSocketRecognitionClient({
  stage: STAGES.STAGING,
  // connectionRetry works automatically with defaults
});
```
Optional: Customize retry behavior (only if needed):
```typescript
const client = new RealTimeTwoWayWebSocketRecognitionClient({
  stage: STAGES.STAGING,
  connectionRetry: {
    maxAttempts: 2, // Fewer attempts (min: 1, max: 5)
    delayMs: 500    // Longer delay between attempts
  }
});
```
> ⚠️ Note: Retry only applies to initial connection establishment. If the connection drops during audio streaming, the SDK will not auto-retry (caller must handle this).
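For callers that need to survive mid-stream drops, one option is to watch for abnormal close codes and reconnect from the application side. The sketch below is only an illustration: it assumes the same client instance can be reconnected by calling connect() again and uses a fixed 500 ms backoff, neither of which is an SDK guarantee.

```typescript
import { isNormalDisconnection } from '@volley/recognition-client-sdk';

// Hypothetical caller-side reconnect: the SDK itself does not retry dropped streams.
builder.onDisconnected(async (code) => {
  if (isNormalDisconnection(code)) return; // clean shutdown, nothing to do

  await new Promise(resolve => setTimeout(resolve, 500)); // simple fixed backoff (illustrative)
  await client.connect(); // assumes connect() can be called again on the same instance
});
```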
Advanced: Custom URL for non-standard endpoints:
```typescript
builder
  .url('wss://custom-endpoint.example.com/ws/v1/recognize') // Custom WebSocket URL
  .provider(RecognitionProvider.DEEPGRAM)
  // ... rest of config
```
> 💡 Note: If both stage and url are provided, url takes precedence.
Callbacks:

```typescript
builder
  .onTranscript(result => {})   // Handle transcription results
  .onError(error => {})         // Handle errors
  .onConnected(() => {})        // Connection established
  .onDisconnected((code) => {}) // Connection closed
  .onMetadata(meta => {})       // Timing information
```
Game context and additional options:

```typescript
builder
  .gameContext({                    // Context for better recognition
    gameId: 'session-123',
    prompt: 'Expected responses: yes, no, maybe'
  })
  .userId('user-123')               // User identification
  .platform('web')                  // Platform identifier
  .logger((level, msg, data) => {}) // Custom logging
```
Client methods:

```typescript
await client.connect();        // Establish connection
client.sendAudio(chunk);       // Send PCM16 audio
await client.stopRecording();  // End and get final transcript
client.getAudioUtteranceId();  // Get session UUID
client.getUrl();               // Get actual WebSocket URL being used
client.getState();             // Get current state
client.isConnected();          // Check connection status
```
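Putting these together, a typical streaming pass might look like the sketch below; `recorder` is a stand-in for your own audio source and is not part of the SDK.

```typescript
// End-to-end pass using the accessor methods above for logging/correlation.
await client.connect();
console.log('Utterance', client.getAudioUtteranceId(), 'streaming to', client.getUrl());

while (client.isConnected() && recorder.hasMoreAudio()) { // recorder: hypothetical audio source
  client.sendAudio(recorder.nextChunk());
}

console.log('State before stop:', client.getState());
await client.stopRecording(); // resolves once the final transcript has been delivered
```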
Transcription result (passed to onTranscript):

```typescript
{
  type: 'Transcription';                // Message type discriminator
  audioUtteranceId: string;             // Session UUID
  finalTranscript: string;              // Confirmed text (won't change)
  finalTranscriptConfidence?: number;   // Confidence 0-1 for final transcript
  pendingTranscript?: string;           // In-progress text (may change)
  pendingTranscriptConfidence?: number; // Confidence 0-1 for pending transcript
  is_finished: boolean;                 // Transcription complete (last message)
  voiceStart?: number;                  // Voice activity start time (ms from stream start)
  voiceDuration?: number;               // Voice duration (ms)
  voiceEnd?: number;                    // Voice activity end time (ms from stream start)
  startTimestamp?: number;              // Transcription start timestamp (ms)
  endTimestamp?: number;                // Transcription end timestamp (ms)
  receivedAtMs?: number;                // Server receive timestamp (ms since epoch)
  accumulatedAudioTimeMs?: number;      // Total audio duration sent (ms)
}
```
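A handler can use is_finished to detect the last message of an utterance and the timing fields for simple latency metrics, for example:

```typescript
builder.onTranscript(result => {
  if (result.pendingTranscript) {
    console.log('Interim:', result.pendingTranscript); // may still be revised
  }
  if (result.finalTranscript) {
    console.log('Final:', result.finalTranscript);     // confirmed text
  }
  if (result.is_finished) {
    // Last message for this utterance.
    console.log('Done after', result.accumulatedAudioTimeMs, 'ms of audio sent');
  }
});
```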
Supported provider models:

Deepgram:

```typescript
import { RecognitionProvider, DeepgramModel } from '@volley/recognition-client-sdk';

builder
  .provider(RecognitionProvider.DEEPGRAM)
  .model(DeepgramModel.NOVA_2); // NOVA_2, NOVA_3, FLUX_GENERAL_EN
```
Google:

```typescript
import { RecognitionProvider, GoogleModel } from '@volley/recognition-client-sdk';

builder
  .provider(RecognitionProvider.GOOGLE)
  .model(GoogleModel.LATEST_SHORT); // LATEST_SHORT, LATEST_LONG, TELEPHONY, etc.
```
Available Google models:
- LATEST_SHORT - Optimized for short audio (< 1 minute)
- LATEST_LONG - Optimized for long audio (> 1 minute)
- TELEPHONY - Optimized for phone audio
- TELEPHONY_SHORT - Short telephony audio
- MEDICAL_DICTATION - Medical dictation (premium)
- MEDICAL_CONVERSATION - Medical conversations (premium)
The SDK expects PCM16 audio:
- Format: Linear PCM (16-bit signed integers)
- Sample Rate: 16kHz recommended
- Channels: Mono
Please reach out to the AI team if there are essential reasons you need other formats.
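If your capture pipeline produces Float32 samples (as the Web Audio API does), they must be converted to 16-bit PCM before being sent. A minimal conversion sketch follows; it assumes the audio is already mono and resampled to 16 kHz, and the exact argument type sendAudio accepts (Buffer, Int16Array, etc.) is not specified here.

```typescript
// Convert Float32 samples in [-1, 1] to 16-bit signed PCM.
function floatTo16BitPCM(samples: Float32Array): Int16Array {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to avoid overflow
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm;
}

// client.sendAudio(floatTo16BitPCM(chunk)); // hypothetical usage
```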
Error handling:

```typescript
import { isNormalDisconnection } from '@volley/recognition-client-sdk';

builder.onError(error => {
  console.error(`Error ${error.code}: ${error.message}`);
});

// Check disconnection type
builder.onDisconnected((code, reason) => {
  if (!isNormalDisconnection(code)) {
    console.error('Unexpected disconnect:', code);
  }
});
```
Troubleshooting:

WebSocket fails to connect
- Verify the recognition service is running
- Check the WebSocket URL format: ws:// or wss://
- Ensure network allows WebSocket connections
Authentication errors
- Verify audioUtteranceId is provided
- Check if service requires additional auth headers
No transcription results
- Confirm audio format is PCM16, 16kHz, mono
- Check if audio chunks are being sent (use onAudioSent callback)
- Verify audio data is not empty or corrupted
Poor transcription quality
- Try different models (e.g., NOVA_2 vs NOVA_2_GENERAL)
- Adjust language setting to match audio
- Ensure audio sample rate matches configuration
High latency
- Use smaller audio chunks (e.g., 100ms instead of 500ms); see the chunking sketch below
- Choose a model optimized for real-time (e.g., Deepgram Nova 2)
- Check network latency to service
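At 16 kHz mono with 2 bytes per sample, 100 ms of audio is 16,000 × 0.1 × 2 = 3,200 bytes, so a buffered recording can be streamed in small slices. A sketch (the exact type sendAudio expects is an assumption here):

```typescript
// 16,000 samples/s × 2 bytes/sample × 0.1 s = 3,200 bytes per 100 ms chunk
const CHUNK_BYTES = 3200;

function sendInSmallChunks(pcm16: Buffer): void {
  for (let offset = 0; offset < pcm16.length; offset += CHUNK_BYTES) {
    client.sendAudio(pcm16.subarray(offset, offset + CHUNK_BYTES)); // client from the examples above
  }
}
```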
Memory issues
- Call disconnect() when done to clean up resources
- Avoid keeping multiple client instances active
This package uses automated publishing via semantic-release with npm Trusted Publishers (OIDC).
After the first manual publish, configure npm Trusted Publishers:
1. Go to https://www.npmjs.com/package/@volley/recognition-client-sdk/access
2. Click "Add publisher" → Select "GitHub Actions"
3. Configure:
   - Organization: Volley-Inc
   - Repository: recognition-service
   - Workflow: sdk-release.yml
   - Environment: Leave empty (not required)
How it works:
- Automated releases: Push to dev branch triggers semantic-release
- Version bumping: Based on conventional commits (feat/fix/BREAKING CHANGE)
- No tokens needed: Uses OIDC authentication with npm
- Provenance: Automatic supply chain attestation
- Path filtering: Only releases when SDK or libs change
If needed for testing:
```bash
cd packages/client-sdk-ts
npm login --scope=@volley
pnpm build
npm publish --provenance --access public
```
This SDK is part of the Recognition Service monorepo. To contribute:
1. Make changes to SDK or libs
2. Test locally with pnpm test
3. Create PR to the dev branch with conventional commit messages (`feat:`, `fix:`, etc.)
4. After merge, automated workflow will publish new version to npm
Proprietary