# react-native-sherpa-onnx-offline-stt

React Native wrapper for sherpa-onnx offline speech-to-text with TEN-VAD and speaker diarization.

A React Native library for offline speech-to-text using sherpa-onnx. Runs entirely on-device with no internet connection required.

## Features
- Offline STT - Speech recognition runs locally on the device
- Two modes: Streaming (real-time) and Offline (VAD-triggered batch processing)
- TEN-VAD - Voice Activity Detection for accurate speech segmentation
- Speaker Diarization - Identify different speakers in conversation
- Speech Denoising - GTCRN-based noise reduction
- Background Recording - Continue recording when app is minimized
- Performance Metrics - RTFx, processing time, confidence scores
- Streaming State - Two-tier volatile/confirmed transcript updates
## Installation

```bash
npm install react-native-sherpa-onnx-offline-stt
# or
yarn add react-native-sherpa-onnx-offline-stt
```
### Android

Add to your android/app/build.gradle:

```gradle
android {
    packagingOptions {
        pickFirst '**/*.so'
    }
}

dependencies {
    implementation 'com.k2fsa.sherpa:sherpa-onnx:1.10.+'
}
```
### iOS

```bash
cd ios && pod install
```
## Models

You need to download the models separately and place them on the device:
- Streaming: Zipformer French (~128MB)
- Offline: Parakeet TDT v3 (~670MB)
- VAD: TEN-VAD (included in the library)
- Speaker diarization: 3D-Speaker (~26MB)
- Denoiser: GTCRN (~524KB)
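All of these paths end up in the `initialize` call, so it can help to derive them from a single base directory. A minimal sketch, assuming your app unpacks the downloads under one folder (the directory names below are placeholders, not names the library mandates):

```typescript
// Hypothetical path helper: the directory layout is an assumption —
// the library only cares that each path points at the unpacked model.
function buildModelPaths(baseDir: string) {
  return {
    modelPath: `${baseDir}/stt-model`,
    tokensPath: `${baseDir}/stt-model/tokens.txt`,
    vadModelPath: `${baseDir}/vad-model`,
    diarizationModelPath: `${baseDir}/speaker-model.onnx`,
    denoiserModelPath: `${baseDir}/gtcrn_simple.onnx`,
  };
}
```

The result can then be spread directly into the `initialize` config.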
## Quick Start

```typescript
import STTManager from 'react-native-sherpa-onnx-offline-stt';
import type { STTResult, VADEvent, SpeakerEvent } from 'react-native-sherpa-onnx-offline-stt';
// Create manager instance
const sttManager = new STTManager();
// Initialize with configuration
await sttManager.initialize({
  modelPath: '/path/to/stt-model',
  tokensPath: '/path/to/tokens.txt',
  modelType: 'offline', // or 'streaming'
  vadModelPath: '/path/to/vad-model',
  sampleRate: 16000,

  // Structured VAD configuration
  vad: {
    threshold: 0.5,
    minSpeechDurationMs: 300,
    minSilenceDurationMs: 500,
    maxSpeechDurationMs: 30000, // Force segment break after 30s
    speechPaddingMs: 100,
    mode: 'normal', // 'aggressive' | 'normal' | 'sensitive'
  },

  // Optional features
  diarizationModelPath: '/path/to/speaker-model.onnx',
  diarizationThreshold: 0.55,
  denoiserModelPath: '/path/to/gtcrn_simple.onnx',
});
// Subscribe to events using chainable API
sttManager
  .on('transcript', (result: STTResult) => {
    console.log(`[Speaker ${result.speakerId}]: ${result.text}`);
    console.log(`RTFx: ${result.rtfx}, Processing: ${result.processingTime}s`);
  })
  .on('streaming', (update) => {
    // Two-tier streaming state
    console.log('Confirmed:', update.confirmed); // Stable text
    console.log('Volatile:', update.volatile); // May change
  })
  .on('vad', (event: VADEvent) => {
    console.log(`VAD: ${event.state}`);
  })
  .on('speaker', (event: SpeakerEvent) => {
    console.log(`Speaker ${event.speakerId} (${event.status})`);
  })
  .on('error', (error) => {
    console.error(`Error: ${error.code} - ${error.message}`);
  });
// Start recording
await sttManager.startRecording();
// Stop recording
const results = await sttManager.stopRecording();
// Clean up
await sttManager.deinitialize();
```

## API Reference

### STTManager

```typescript
const manager = new STTManager();
```
#### Properties
| Property | Type | Description |
|----------|------|-------------|
| initialized | boolean | Whether the engine is initialized |
| recording | boolean | Whether currently recording |
#### Methods
| Method | Returns | Description |
|--------|---------|-------------|
| initialize(config) | Promise | Initialize STT engine |
| startRecording() | Promise | Start microphone recording |
| stopRecording() | Promise | Stop and get final results |
| recognizeFile(path) | Promise | Transcribe audio file |
| isRecordingAsync() | Promise | Check recording status |
| getModelType() | Promise | Get current mode |
| getSpeakerCount() | Promise | Get detected speakers |
| resetSpeakers() | Promise | Clear speaker profiles |
| setDenoiserEnabled(bool) | Promise | Toggle denoiser |
| isDenoiserEnabled() | Promise | Check denoiser status |
| startBackgroundService() | Promise | Enable background recording |
| stopBackgroundService() | Promise | Disable background recording |
| deinitialize() | Promise | Clean up resources |
| on(event, callback) | this | Subscribe to events (chainable) |
| off(event) | this | Unsubscribe from events (chainable) |
#### Static Methods
| Method | Returns | Description |
|--------|---------|-------------|
| STTManager.getAvailableProviders() | Promise | Get available ONNX providers |
| STTManager.platform | string | Current platform ('ios' or 'android') |
#### STTConfig

```typescript
interface STTConfig {
  // Required
  modelPath: string;
  tokensPath: string;
  vadModelPath: string;

  // STT mode
  modelType?: 'streaming' | 'offline'; // Default: 'streaming'

  // VAD configuration
  vad: {
    threshold: number;            // 0.5 - Speech detection sensitivity
    minSpeechDurationMs: number;  // 300 - Min speech to trigger
    minSilenceDurationMs: number; // 500 - Silence to end segment
    maxSpeechDurationMs: number;  // 30000 - Force break long speech
    speechPaddingMs: number;      // 100 - Padding around segments
    mode: 'aggressive' | 'normal' | 'sensitive';
  };

  // Audio
  sampleRate?: number; // Default: 16000

  // Speaker diarization (optional)
  diarizationModelPath?: string;
  diarizationThreshold?: number; // Default: 0.45
  diarizationMinSpeechMs?: number; // Default: 800

  // Denoiser (optional)
  denoiserModelPath?: string;

  // ONNX provider
  provider?: 'cpu' | 'nnapi' | 'gpu' | 'coreml'; // Default: 'cpu'
}
```
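As a counterpart to the offline setup shown in the quick start, a minimal streaming-mode configuration might look like this (all paths are placeholders; the VAD values are the defaults noted in the interface comments above):

```typescript
// Minimal streaming-mode configuration sketch; paths are placeholders.
const streamingConfig = {
  modelPath: '/path/to/streaming-model',
  tokensPath: '/path/to/streaming-model/tokens.txt',
  vadModelPath: '/path/to/vad-model',
  modelType: 'streaming' as const,
  vad: {
    threshold: 0.5,
    minSpeechDurationMs: 300,
    minSilenceDurationMs: 500,
    maxSpeechDurationMs: 30000,
    speechPaddingMs: 100,
    mode: 'normal' as const,
  },
  sampleRate: 16000,
};
```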
### Events

#### transcript
```typescript
interface STTResult {
  text: string;
  isFinal: boolean;
  startTime: number;
  endTime: number;

  // Performance metrics
  confidence: number;     // 0-1 recognition confidence
  processingTime: number; // Seconds to process
  audioDuration: number;  // Audio length in seconds
  rtfx: number;           // Real-time factor (>1 = faster than real-time)

  // Speaker info
  speakerId?: number;
  speakerStatus?: 'pending' | 'confirmed';
}
```
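Consistent with the fields above, the real-time factor relates audioDuration to processingTime: it is the ratio of audio length to time spent processing, so any value above 1 means the recognizer ran faster than playback. A worked sketch (the helper name is illustrative):

```typescript
// rtfx = audioDuration / processingTime; rtfx > 1 means the audio was
// processed faster than it would take to play back.
function computeRtfx(audioDuration: number, processingTime: number): number {
  return audioDuration / processingTime;
}

// e.g. 10 s of audio processed in 2 s gives an rtfx of 5
```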
#### streaming
Two-tier transcript state for smoother UX:
```typescript
interface StreamingTranscriptUpdate {
  volatile: string;  // Current hypothesis (may change)
  confirmed: string; // Stable text (won't change)
  fullText: string;  // confirmed + volatile
  isFinal: boolean;
  confidence: number;
  processingTime: number;
  rtfx: number;
}
```
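A UI can render the confirmed text as stable while styling the volatile tail as tentative (for example, grayed out). A minimal formatting helper; the function and bracket markers are illustrative, not part of the library:

```typescript
interface StreamingUpdate {
  volatile: string;  // current hypothesis, may still change
  confirmed: string; // stable text
  isFinal: boolean;
}

// Wraps the volatile tail in brackets so a UI could style it differently;
// once the update is final, only the confirmed text remains.
function formatTranscript(update: StreamingUpdate): string {
  if (update.isFinal || update.volatile.length === 0) return update.confirmed;
  return `${update.confirmed}[${update.volatile}]`;
}
```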
#### vad
```typescript
interface VADEvent {
  state: 'silence' | 'speech_start' | 'speech' | 'speech_end';
  speechProbability: number;
  speechDurationMs: number;
  silenceDurationMs: number;
}
```
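These events can be folded into simple session statistics. The sketch below sums segment lengths, assuming speechDurationMs carries the finished segment's length when state is 'speech_end' (that reading of the field is an assumption, not something the library documents here):

```typescript
interface VADEventLike {
  state: 'silence' | 'speech_start' | 'speech' | 'speech_end';
  speechDurationMs: number;
}

// Sums the length of each completed speech segment. Assumes that at
// 'speech_end' speechDurationMs holds the finished segment's duration.
function totalSpeechMs(events: VADEventLike[]): number {
  return events
    .filter((e) => e.state === 'speech_end')
    .reduce((sum, e) => sum + e.speechDurationMs, 0);
}
```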
#### speaker
```typescript
interface SpeakerEvent {
  speakerId: number;
  status: 'pending' | 'confirmed';
  justConfirmed: boolean;
  totalSpeakers: number;
}
```
#### error
```typescript
interface STTError {
  code: string;
  message: string;
}
```
### VAD Modes

| Mode | Description | Use Case |
|------|-------------|----------|
| aggressive | Less sensitive, fewer false positives | Noisy environments |
| normal | Balanced sensitivity | General use |
| sensitive | More sensitive, catches quieter speech | Quiet environments |
### Streaming vs Offline

| Feature | Streaming | Offline |
|---------|-----------|---------|
| Latency | Real-time partial results | Results after speech ends |
| Accuracy | Good | Better |
| Use case | Live captions | Meeting transcription |
| Models | Zipformer | Parakeet, Whisper |
## Background Recording

To record when the app is in background:

```typescript
// Before starting recording
await sttManager.startBackgroundService();
await sttManager.startRecording();

// When done
await sttManager.stopRecording();
await sttManager.stopBackgroundService();
```
This shows a notification while recording in background.
Check available ONNX providers on the device:
`typescriptDevice: ${info.manufacturer} ${info.device}
const info = await STTManager.getAvailableProviders();
console.log();Recommended: ${info.recommended}
console.log();``
console.log('Available:', info.providers.filter(p => p.available).map(p => p.name));
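If you want to feed this result back into the provider config option, a small fallback chain keeps the choice valid on devices where the recommended provider is unavailable. The info shape here is inferred from the snippet above, and the helper is illustrative:

```typescript
interface ProviderEntry {
  name: string;
  available: boolean;
}

// Prefers the recommended provider if it is actually available,
// then any available provider, and finally falls back to 'cpu'.
function pickProvider(providers: ProviderEntry[], recommended: string): string {
  const rec = providers.find((p) => p.name === recommended && p.available);
  if (rec) return rec.name;
  const anyAvailable = providers.find((p) => p.available);
  return anyAvailable ? anyAvailable.name : 'cpu';
}
```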
## Platform Support

| Platform | Status |
|----------|--------|
| Android | Full support |
| iOS | Full support |
## License

MIT
## Acknowledgements

- sherpa-onnx - Speech recognition engine
- TEN-VAD - Voice activity detection
- GTCRN - Speech enhancement