# react-native-sherpa-onnx-offline-stt

React Native wrapper for sherpa-onnx offline speech-to-text with TEN-VAD and speaker diarization.

A React Native library for offline speech-to-text using sherpa-onnx. Runs entirely on-device with no internet connection required.

## Features
- Offline STT - Speech recognition runs locally on the device
- Two modes: Streaming (real-time) and Offline (VAD-triggered batch processing)
- TEN-VAD - Voice Activity Detection for accurate speech segmentation
- Speaker Diarization - Identify different speakers in conversation
- Speech Denoising - GTCRN-based noise reduction
- Background Recording - Continue recording when app is minimized
- Performance Metrics - RTFx, processing time, confidence scores
- Streaming State - Two-tier volatile/confirmed transcript updates
## Installation

```bash
npm install react-native-sherpa-onnx-offline-stt
# or
yarn add react-native-sherpa-onnx-offline-stt
```
### Android

Add to your android/app/build.gradle:

```gradle
android {
    packagingOptions {
        pickFirst '**/*.so'
    }
}

dependencies {
    implementation 'com.k2fsa.sherpa:sherpa-onnx:1.10.+'
}
```
### iOS

```bash
cd ios && pod install
```
## Models

You need to download the models separately and place them on the device:
- Streaming: Zipformer French (~128MB)
- Offline: Parakeet TDT v3 (~670MB)
- VAD: TEN-VAD (included in the library)
- Speaker diarization: 3D-Speaker (~26MB)
- Denoiser: GTCRN (~524KB)
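All of these paths end up in the `initialize` call, so it can help to derive them from a single base directory. A minimal sketch, assuming your app unpacks the downloads under one folder (the directory names below are placeholders, not names the library mandates):

```typescript
// Hypothetical path helper: the directory layout is an assumption —
// the library only cares that each path points at the unpacked model.
function buildModelPaths(baseDir: string) {
  return {
    modelPath: `${baseDir}/stt-model`,
    tokensPath: `${baseDir}/stt-model/tokens.txt`,
    vadModelPath: `${baseDir}/vad-model`,
    diarizationModelPath: `${baseDir}/speaker-model.onnx`,
    denoiserModelPath: `${baseDir}/gtcrn_simple.onnx`,
  };
}
```

The result can then be spread directly into the `initialize` config.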
## Quick Start

```typescript
import STTManager from 'react-native-sherpa-onnx-offline-stt';
import type { STTResult, VADEvent, SpeakerEvent } from 'react-native-sherpa-onnx-offline-stt';
// Create manager instance
const sttManager = new STTManager();
// Initialize with configuration
await sttManager.initialize({
  modelPath: '/path/to/stt-model',
  tokensPath: '/path/to/tokens.txt',
  modelType: 'offline', // or 'streaming'
  vadModelPath: '/path/to/vad-model',
  sampleRate: 16000,

  // Structured VAD configuration
  vad: {
    threshold: 0.5,
    minSpeechDurationMs: 300,
    minSilenceDurationMs: 500,
    maxSpeechDurationMs: 30000, // Force segment break after 30s
    speechPaddingMs: 100,
    mode: 'normal', // 'aggressive' | 'normal' | 'sensitive'
  },

  // Optional features
  diarizationModelPath: '/path/to/speaker-model.onnx',
  diarizationThreshold: 0.55,
  denoiserModelPath: '/path/to/gtcrn_simple.onnx',
});
// Subscribe to events using chainable API
sttManager
  .on('transcript', (result: STTResult) => {
    console.log(`[Speaker ${result.speakerId}]: ${result.text}`);
    console.log(`RTFx: ${result.rtfx}, Processing: ${result.processingTime}s`);
  })
  .on('streaming', (update) => {
    // Two-tier streaming state
    console.log('Confirmed:', update.confirmed); // Stable text
    console.log('Volatile:', update.volatile); // May change
  })
  .on('vad', (event: VADEvent) => {
    console.log(`VAD: ${event.state}`);
  })
  .on('speaker', (event: SpeakerEvent) => {
    console.log(`Speaker ${event.speakerId} (${event.status})`);
  })
  .on('error', (error) => {
    console.error(`Error: ${error.code} - ${error.message}`);
  });
// Start recording
await sttManager.startRecording();
// Stop recording
const results = await sttManager.stopRecording();
// Clean up
await sttManager.deinitialize();
```

## API Reference

### STTManager

```typescript
const manager = new STTManager();
```
#### Properties
| Property | Type | Description |
|----------|------|-------------|
| initialized | boolean | Whether the engine is initialized |
| recording | boolean | Whether currently recording |
#### Methods
| Method | Returns | Description |
|--------|---------|-------------|
| initialize(config) | Promise | Initialize STT engine |
| startRecording() | Promise | Start microphone recording |
| stopRecording() | Promise | Stop and get final results |
| recognizeFile(path) | Promise | Transcribe audio file |
| isRecordingAsync() | Promise | Check recording status |
| getModelType() | Promise | Get current mode |
| getSpeakerCount() | Promise | Get detected speakers |
| resetSpeakers() | Promise | Clear speaker profiles |
| setDenoiserEnabled(bool) | Promise | Toggle denoiser |
| isDenoiserEnabled() | Promise | Check denoiser status |
| startBackgroundService() | Promise | Enable background recording |
| stopBackgroundService() | Promise | Disable background recording |
| deinitialize() | Promise | Clean up resources |
| on(event, callback) | this | Subscribe to events (chainable) |
| off(event) | this | Unsubscribe from events (chainable) |
#### Static Methods
| Method | Returns | Description |
|--------|---------|-------------|
| STTManager.getAvailableProviders() | Promise | Get available ONNX providers |
| STTManager.platform | string | Current platform ('ios' or 'android') |
#### STTConfig

```typescript
interface STTConfig {
  // Required
  modelPath: string;
  tokensPath: string;
  vadModelPath: string;

  // STT mode
  modelType?: 'streaming' | 'offline'; // Default: 'streaming'

  // VAD configuration
  vad: {
    threshold: number;            // 0.5 - Speech detection sensitivity
    minSpeechDurationMs: number;  // 300 - Min speech to trigger
    minSilenceDurationMs: number; // 500 - Silence to end segment
    maxSpeechDurationMs: number;  // 30000 - Force break long speech
    speechPaddingMs: number;      // 100 - Padding around segments
    mode: 'aggressive' | 'normal' | 'sensitive';
  };

  // Audio
  sampleRate?: number; // Default: 16000

  // Speaker diarization (optional)
  diarizationModelPath?: string;
  diarizationThreshold?: number; // Default: 0.45
  diarizationMinSpeechMs?: number; // Default: 800

  // Denoiser (optional)
  denoiserModelPath?: string;

  // ONNX provider
  provider?: 'cpu' | 'nnapi' | 'gpu' | 'coreml'; // Default: 'cpu'
}
```
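As a counterpart to the offline setup shown in the quick start, a minimal streaming-mode configuration might look like this (all paths are placeholders; the VAD values are the defaults noted in the interface comments above):

```typescript
// Minimal streaming-mode configuration sketch; paths are placeholders.
const streamingConfig = {
  modelPath: '/path/to/streaming-model',
  tokensPath: '/path/to/streaming-model/tokens.txt',
  vadModelPath: '/path/to/vad-model',
  modelType: 'streaming' as const,
  vad: {
    threshold: 0.5,
    minSpeechDurationMs: 300,
    minSilenceDurationMs: 500,
    maxSpeechDurationMs: 30000,
    speechPaddingMs: 100,
    mode: 'normal' as const,
  },
  sampleRate: 16000,
};
```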
### Events

#### transcript
```typescript
interface STTResult {
  text: string;
  isFinal: boolean;
  startTime: number;
  endTime: number;

  // Performance metrics
  confidence: number;     // 0-1 recognition confidence
  processingTime: number; // Seconds to process
  audioDuration: number;  // Audio length in seconds
  rtfx: number;           // Real-time factor (>1 = faster than real-time)

  // Speaker info
  speakerId?: number;
  speakerStatus?: 'pending' | 'confirmed';
}
```
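Consistent with the fields above, the real-time factor relates audioDuration to processingTime: it is the ratio of audio length to time spent processing, so any value above 1 means the recognizer ran faster than playback. A worked sketch (the helper name is illustrative):

```typescript
// rtfx = audioDuration / processingTime; rtfx > 1 means the audio was
// processed faster than it would take to play back.
function computeRtfx(audioDuration: number, processingTime: number): number {
  return audioDuration / processingTime;
}

// e.g. 10 s of audio processed in 2 s gives an rtfx of 5
```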
#### streaming
Two-tier transcript state for smoother UX:
```typescript
interface StreamingTranscriptUpdate {
  volatile: string;  // Current hypothesis (may change)
  confirmed: string; // Stable text (won't change)
  fullText: string;  // confirmed + volatile
  isFinal: boolean;
  confidence: number;
  processingTime: number;
  rtfx: number;
}
```
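A UI can render the confirmed text as stable while styling the volatile tail as tentative (for example, grayed out). A minimal formatting helper; the function and bracket markers are illustrative, not part of the library:

```typescript
interface StreamingUpdate {
  volatile: string;  // current hypothesis, may still change
  confirmed: string; // stable text
  isFinal: boolean;
}

// Wraps the volatile tail in brackets so a UI could style it differently;
// once the update is final, only the confirmed text remains.
function formatTranscript(update: StreamingUpdate): string {
  if (update.isFinal || update.volatile.length === 0) return update.confirmed;
  return `${update.confirmed}[${update.volatile}]`;
}
```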
#### vad
```typescript
interface VADEvent {
  state: 'silence' | 'speech_start' | 'speech' | 'speech_end';
  speechProbability: number;
  speechDurationMs: number;
  silenceDurationMs: number;
}
```
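These events can be folded into simple session statistics. The sketch below sums segment lengths, assuming speechDurationMs carries the finished segment's length when state is 'speech_end' (that reading of the field is an assumption, not something the library documents here):

```typescript
interface VADEventLike {
  state: 'silence' | 'speech_start' | 'speech' | 'speech_end';
  speechDurationMs: number;
}

// Sums the length of each completed speech segment. Assumes that at
// 'speech_end' speechDurationMs holds the finished segment's duration.
function totalSpeechMs(events: VADEventLike[]): number {
  return events
    .filter((e) => e.state === 'speech_end')
    .reduce((sum, e) => sum + e.speechDurationMs, 0);
}
```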
#### speaker
```typescript
interface SpeakerEvent {
  speakerId: number;
  status: 'pending' | 'confirmed';
  justConfirmed: boolean;
  totalSpeakers: number;
}
```
#### error
```typescript
interface STTError {
  code: string;
  message: string;
}
```
### VAD Modes

| Mode | Description | Use Case |
|------|-------------|----------|
| aggressive | Less sensitive, fewer false positives | Noisy environments |
| normal | Balanced sensitivity | General use |
| sensitive | More sensitive, catches quieter speech | Quiet environments |
### Streaming vs Offline

| Feature | Streaming | Offline |
|---------|-----------|---------|
| Latency | Real-time partial results | Results after speech ends |
| Accuracy | Good | Better |
| Use case | Live captions | Meeting transcription |
| Models | Zipformer | Parakeet, Whisper |
## Background Recording

To record when the app is in background:

```typescript
// Before starting recording
await sttManager.startBackgroundService();
await sttManager.startRecording();

// When done
await sttManager.stopRecording();
await sttManager.stopBackgroundService();
```
This shows a notification while recording in background.
Check available ONNX providers on the device:
`typescriptDevice: ${info.manufacturer} ${info.device}
const info = await STTManager.getAvailableProviders();
console.log();Recommended: ${info.recommended}
console.log();``
console.log('Available:', info.providers.filter(p => p.available).map(p => p.name));
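If you want to feed this result back into the provider config option, a small fallback chain keeps the choice valid on devices where the recommended provider is unavailable. The info shape here is inferred from the snippet above, and the helper is illustrative:

```typescript
interface ProviderEntry {
  name: string;
  available: boolean;
}

// Prefers the recommended provider if it is actually available,
// then any available provider, and finally falls back to 'cpu'.
function pickProvider(providers: ProviderEntry[], recommended: string): string {
  const rec = providers.find((p) => p.name === recommended && p.available);
  if (rec) return rec.name;
  const anyAvailable = providers.find((p) => p.available);
  return anyAvailable ? anyAvailable.name : 'cpu';
}
```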
## Platform Support

| Platform | Status |
|----------|--------|
| Android | Full support |
| iOS | Full support |
## License

MIT
## Acknowledgements

- sherpa-onnx - Speech recognition engine
- TEN-VAD - Voice activity detection
- GTCRN - Speech enhancement