# @runanywhere/onnx

ONNX Runtime backend for the RunAnywhere React Native SDK. Provides on-device Speech-to-Text (STT), Text-to-Speech (TTS), and Voice Activity Detection (VAD), powered by Sherpa-ONNX.
---
## Features

@runanywhere/onnx provides the ONNX Runtime backend for on-device voice AI capabilities:

- Speech-to-Text (STT) with Whisper models
- Text-to-Speech (TTS) with Piper voices
- Voice Activity Detection (VAD) with Silero VAD
---
## Requirements

- @runanywhere/core (peer dependency)
- React Native 0.71+
- iOS 15.1+ / Android API 24+
- Microphone permission (for live recording)
---
## Installation

```bash
npm install @runanywhere/core @runanywhere/onnx
# or
yarn add @runanywhere/core @runanywhere/onnx
```

### iOS Setup

```bash
cd ios && pod install && cd ..
```

Add microphone permission to Info.plist:

```xml
<key>NSMicrophoneUsageDescription</key>
<string>This app uses the microphone for speech recognition.</string>
```

### Android Setup

Add microphone permission to AndroidManifest.xml:

```xml
<uses-permission android:name="android.permission.RECORD_AUDIO" />
```
---
## Quick Start

```typescript
import { RunAnywhere, SDKEnvironment, ModelCategory } from '@runanywhere/core';
import { ONNX, ModelArtifactType } from '@runanywhere/onnx';

// 1. Initialize SDK
await RunAnywhere.initialize({
  environment: SDKEnvironment.Development,
});

// 2. Register ONNX backend
ONNX.register();

// 3. Add STT model (Whisper)
await ONNX.addModel({
  id: 'sherpa-onnx-whisper-tiny.en',
  name: 'Sherpa Whisper Tiny (English)',
  url: 'https://github.com/RunanywhereAI/sherpa-onnx/releases/download/runanywhere-models-v1/sherpa-onnx-whisper-tiny.en.tar.gz',
  modality: ModelCategory.SpeechRecognition,
  artifactType: ModelArtifactType.TarGzArchive,
  memoryRequirement: 75_000_000,
});

// 4. Add TTS model (Piper)
await ONNX.addModel({
  id: 'vits-piper-en_US-lessac-medium',
  name: 'Piper TTS (US English)',
  url: 'https://github.com/RunanywhereAI/sherpa-onnx/releases/download/runanywhere-models-v1/vits-piper-en_US-lessac-medium.tar.gz',
  modality: ModelCategory.SpeechSynthesis,
  artifactType: ModelArtifactType.TarGzArchive,
  memoryRequirement: 65_000_000,
});

// 5. Download models
await RunAnywhere.downloadModel('sherpa-onnx-whisper-tiny.en');
await RunAnywhere.downloadModel('vits-piper-en_US-lessac-medium');

// 6. Load STT model
const sttModel = await RunAnywhere.getModelInfo('sherpa-onnx-whisper-tiny.en');
await RunAnywhere.loadSTTModel(sttModel.localPath, 'whisper');

// 7. Transcribe audio
const result = await RunAnywhere.transcribeFile(audioFilePath, {
  language: 'en',
});
console.log('Transcription:', result.text);

// 8. Load TTS model
const ttsModel = await RunAnywhere.getModelInfo('vits-piper-en_US-lessac-medium');
await RunAnywhere.loadTTSModel(ttsModel.localPath, 'piper');

// 9. Synthesize speech
const audio = await RunAnywhere.synthesize('Hello world.', {
  rate: 1.0,
  pitch: 1.0,
});
console.log('Audio duration:', audio.duration, 'seconds');
```
---
## API Reference

```typescript
import { ONNX, ModelArtifactType } from '@runanywhere/onnx';
```
#### ONNX.register()
Register the ONNX backend with the SDK. Must be called before using STT/TTS features.
```typescript
ONNX.register(): void
```
Example:
```typescript
await RunAnywhere.initialize({ ... });
ONNX.register(); // Now STT/TTS features are available
```
---
#### ONNX.addModel(options)
Add an ONNX model (STT or TTS) to the model registry.
```typescript
await ONNX.addModel(options: ONNXModelOptions): Promise<ModelInfo>
```
Parameters:
```typescript
interface ONNXModelOptions {
  /**
   * Unique model ID.
   * If not provided, generated from the URL filename.
   */
  id?: string;

  /** Display name for the model */
  name: string;

  /** Download URL for the model */
  url: string;

  /**
   * Model category.
   * Required: ModelCategory.SpeechRecognition or ModelCategory.SpeechSynthesis
   */
  modality: ModelCategory;

  /**
   * How the model is packaged.
   * If not provided, inferred from URL extension.
   */
  artifactType?: ModelArtifactType;

  /** Memory requirement in bytes */
  memoryRequirement?: number;
}

enum ModelArtifactType {
  SingleFile = 'singleFile',       // Single .onnx file
  TarGzArchive = 'tarGzArchive',   // .tar.gz archive
  TarBz2Archive = 'tarBz2Archive', // .tar.bz2 archive
  ZipArchive = 'zipArchive',       // .zip archive
}
```
Returns: `Promise<ModelInfo>` — The registered model info
Example:
```typescript
// STT Model (Whisper)
await ONNX.addModel({
  id: 'sherpa-onnx-whisper-tiny.en',
  name: 'Sherpa Whisper Tiny (English)',
  url: 'https://github.com/RunanywhereAI/sherpa-onnx/releases/.../sherpa-onnx-whisper-tiny.en.tar.gz',
  modality: ModelCategory.SpeechRecognition,
  artifactType: ModelArtifactType.TarGzArchive,
  memoryRequirement: 75_000_000,
});

// Larger STT Model
await ONNX.addModel({
  id: 'sherpa-onnx-whisper-small.en',
  name: 'Sherpa Whisper Small (English)',
  url: 'https://github.com/k2-fsa/sherpa-onnx/releases/.../sherpa-onnx-whisper-small.en.tar.bz2',
  modality: ModelCategory.SpeechRecognition,
  artifactType: ModelArtifactType.TarBz2Archive,
  memoryRequirement: 250_000_000,
});

// TTS Model (Piper)
await ONNX.addModel({
  id: 'vits-piper-en_US-lessac-medium',
  name: 'Piper TTS (US English - Lessac)',
  url: 'https://github.com/RunanywhereAI/sherpa-onnx/releases/.../vits-piper-en_US-lessac-medium.tar.gz',
  modality: ModelCategory.SpeechSynthesis,
  artifactType: ModelArtifactType.TarGzArchive,
  memoryRequirement: 65_000_000,
});
```
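For apps that ship several models, the same calls compose into a small helper. A minimal sketch; the `MODELS` list and `ensureModels` name are illustrative, not part of the SDK:

```typescript
import { RunAnywhere, ModelCategory } from '@runanywhere/core';
import { ONNX, ModelArtifactType } from '@runanywhere/onnx';

// Hypothetical model manifest for this app
const MODELS = [
  {
    id: 'sherpa-onnx-whisper-tiny.en',
    name: 'Sherpa Whisper Tiny (English)',
    url: 'https://github.com/RunanywhereAI/sherpa-onnx/releases/download/runanywhere-models-v1/sherpa-onnx-whisper-tiny.en.tar.gz',
    modality: ModelCategory.SpeechRecognition,
    artifactType: ModelArtifactType.TarGzArchive,
  },
];

// Register every model, then download any that are not on disk yet
async function ensureModels(): Promise<void> {
  for (const spec of MODELS) {
    await ONNX.addModel(spec);
    if (!(await RunAnywhere.isModelDownloaded(spec.id))) {
      await RunAnywhere.downloadModel(spec.id);
    }
  }
}
```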
---
#### Module Properties
```typescript
ONNX.moduleId           // 'onnx'
ONNX.moduleName         // 'ONNX Runtime'
ONNX.inferenceFramework // LLMFramework.ONNX
ONNX.capabilities       // ['stt', 'tts']
ONNX.defaultPriority    // 100
```
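A minimal sketch of using these properties as a sanity check after registration; the log format is illustrative:

```typescript
ONNX.register();

// Confirm the backend advertises the capabilities this screen depends on
if (ONNX.capabilities.includes('stt') && ONNX.capabilities.includes('tts')) {
  console.log(`${ONNX.moduleName} (${ONNX.moduleId}) ready, priority ${ONNX.defaultPriority}`);
}
```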
---
## Speech-to-Text (STT)

Use the RunAnywhere API for STT operations:
#### Load STT Model
```typescript
await RunAnywhere.loadSTTModel(
  modelPath: string,
  modelType?: string  // 'whisper' (default)
): Promise<void>
```
#### Check Model Status
```typescript
const isLoaded = await RunAnywhere.isSTTModelLoaded(): Promise<boolean>
```
#### Unload Model
```typescript
await RunAnywhere.unloadSTTModel(): Promise<void>
```
#### Transcribe Audio File
```typescript
const result = await RunAnywhere.transcribeFile(
  audioPath: string,
  options?: STTOptions
): Promise<STTResult>
```
STT Options:
```typescript
interface STTOptions {
  language?: string;        // e.g., 'en', 'es', 'fr'
  punctuation?: boolean;    // Enable punctuation
  diarization?: boolean;    // Enable speaker diarization
  wordTimestamps?: boolean; // Enable word-level timestamps
  sampleRate?: number;      // Audio sample rate
}
```
STT Result:
```typescript
interface STTResult {
  text: string;           // Full transcription
  segments: STTSegment[]; // Segments with timing
  language?: string;      // Detected language
  confidence: number;     // Overall confidence (0-1)
  duration: number;       // Audio duration in seconds
  alternatives: STTAlternative[];
}

interface STTSegment {
  text: string;
  startTime: number;  // seconds
  endTime: number;    // seconds
  confidence: number;
}
```
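A minimal sketch of consuming the result, assuming a model is already loaded and `audioPath` points at a recorded file:

```typescript
const result = await RunAnywhere.transcribeFile(audioPath, {
  language: 'en',
  wordTimestamps: true,
});

// Print each segment with its timing window
for (const segment of result.segments) {
  console.log(
    `[${segment.startTime.toFixed(2)}s - ${segment.endTime.toFixed(2)}s] ${segment.text}`
  );
}
```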
#### Transcribe Raw Audio
```typescript
// From base64-encoded audio
const result = await RunAnywhere.transcribe(
  audioData: string,  // base64 float32 PCM
  options?: STTOptions
): Promise<STTResult>

// From float32 samples
const result = await RunAnywhere.transcribeBuffer(
  samples: number[],
  sampleRate: number,
  options?: STTOptions
): Promise<STTResult>
```
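A minimal sketch of the buffer variant, assuming 16 kHz mono samples already held in memory; `recordedChunk` is an illustrative Float32Array captured elsewhere (e.g., from a recording library):

```typescript
// Convert the typed array into the number[] the API expects
const samples: number[] = Array.from(recordedChunk);

const result = await RunAnywhere.transcribeBuffer(samples, 16000, { language: 'en' });
console.log(result.text);
```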
---
## Text-to-Speech (TTS)

Use the RunAnywhere API for TTS operations:
#### Load TTS Model
```typescript
await RunAnywhere.loadTTSModel(
  modelPath: string,
  modelType?: string  // 'piper' (default)
): Promise<void>
```
#### Check Model Status
```typescript
const isLoaded = await RunAnywhere.isTTSModelLoaded(): Promise<boolean>
```
#### Unload Model
```typescript
await RunAnywhere.unloadTTSModel(): Promise<void>
```
#### Synthesize Speech
```typescript
const result = await RunAnywhere.synthesize(
  text: string,
  options?: TTSConfiguration
): Promise<TTSResult>
```
TTS Configuration:
```typescript
interface TTSConfiguration {
  voice?: string;  // Voice identifier
  rate?: number;   // Speed (0.5-2.0, default: 1.0)
  pitch?: number;  // Pitch (0.5-2.0, default: 1.0)
  volume?: number; // Volume (0.0-1.0, default: 1.0)
}
```
TTS Result:
```typescript
interface TTSResult {
  audio: string;      // Base64-encoded float32 PCM
  sampleRate: number; // Audio sample rate (typically 22050)
  numSamples: number; // Total sample count
  duration: number;   // Duration in seconds
}
```
#### Streaming Synthesis
```typescript
await RunAnywhere.synthesizeStream(
  text: string,
  options?: TTSConfiguration,
  onChunk?: (chunk: TTSOutput) => void
): Promise<void>
```
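A minimal usage sketch; the exact shape of `TTSOutput` is not documented here, so the callback only counts chunks:

```typescript
let chunks = 0;

await RunAnywhere.synthesizeStream(
  'This sentence is synthesized incrementally.',
  { rate: 1.0 },
  (chunk) => {
    // Each chunk arrives as soon as it is ready; buffer or play it here
    chunks += 1;
  }
);

console.log(`received ${chunks} chunks`);
```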
#### System TTS (Platform Native)
```typescript
// Speak using AVSpeechSynthesizer (iOS) or Android TTS
await RunAnywhere.speak(text: string, options?: TTSConfiguration): Promise<void>

// Control playback
const isSpeaking = await RunAnywhere.isSpeaking(): Promise<boolean>
await RunAnywhere.stopSpeaking(): Promise<void>

// List available voices
const voices = await RunAnywhere.availableTTSVoices();
```
---
## Voice Activity Detection (VAD)

#### Initialize VAD
```typescript
await RunAnywhere.initializeVAD(config?: VADConfiguration): Promise<void>
```
VAD Configuration:
```typescript
interface VADConfiguration {
  energyThreshold?: number;  // Speech detection threshold
  sampleRate?: number;       // Audio sample rate
  frameLength?: number;      // Frame length in ms
  autoCalibration?: boolean; // Enable auto-calibration
}
```
#### Load VAD Model
```typescript
await RunAnywhere.loadVADModel(modelPath: string): Promise<void>
```
#### Process Audio
```typescript
const result = await RunAnywhere.processVAD(
  audioSamples: number[]
): Promise<VADResult>
```
VAD Result:
```typescript
interface VADResult {
  isSpeech: boolean;  // Whether speech is detected
  confidence: number; // Confidence score (0-1)
  startTime?: number; // Speech segment start
  endTime?: number;   // Speech segment end
}
```
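A minimal sketch of classifying a single frame; the 30 ms frame at 16 kHz (480 samples) is an assumption for illustration, not an SDK requirement:

```typescript
await RunAnywhere.initializeVAD({ sampleRate: 16000, frameLength: 30 });

// One 30 ms frame of 16 kHz mono audio (480 samples); silence here for illustration
const frame: number[] = new Array(480).fill(0);

const vad = await RunAnywhere.processVAD(frame);
if (vad.isSpeech) {
  console.log(`speech detected (confidence ${vad.confidence.toFixed(2)})`);
}
```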
#### Continuous VAD
```typescript
// Start/stop continuous processing
await RunAnywhere.startVAD(): Promise<void>
await RunAnywhere.stopVAD(): Promise<void>

// Set callbacks
RunAnywhere.setVADSpeechActivityCallback((event) => {
  if (event.type === 'speechStarted') {
    console.log('Speech started');
  } else if (event.type === 'speechEnded') {
    console.log('Speech ended');
  }
});
```
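A minimal sketch wiring the callbacks into an app-level flow; the buffering steps are left as comments since audio capture is app-specific:

```typescript
await RunAnywhere.initializeVAD({ sampleRate: 16000 });

RunAnywhere.setVADSpeechActivityCallback((event) => {
  if (event.type === 'speechStarted') {
    // e.g., begin buffering microphone samples
  } else if (event.type === 'speechEnded') {
    // e.g., stop buffering and hand the captured samples to
    // RunAnywhere.transcribeBuffer(samples, 16000)
  }
});

await RunAnywhere.startVAD();
// ... later, when leaving the screen:
await RunAnywhere.stopVAD();
```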
---
## Available Models

### STT Models (Whisper)

| Model | Size | Memory | Languages | Description |
|-------|------|--------|-----------|-------------|
| whisper-tiny.en | ~75MB | 100MB | English | Fastest, English-only |
| whisper-base.en | ~150MB | 200MB | English | Better accuracy |
| whisper-small.en | ~250MB | 350MB | English | High quality |
| whisper-tiny | ~75MB | 100MB | 99+ | Multilingual |
### TTS Voices (Piper)

| Voice | Size | Language | Description |
|-------|------|----------|-------------|
| en_US-lessac-medium | ~65MB | English (US) | Natural, clear |
| en_US-amy-medium | ~65MB | English (US) | Female voice |
| en_GB-alba-medium | ~65MB | English (UK) | British accent |
| de_DE-thorsten-medium | ~65MB | German | German voice |
| es_ES-mls-medium | ~65MB | Spanish | Spanish voice |
| fr_FR-siwis-medium | ~65MB | French | French voice |
### VAD Models

| Model | Size | Description |
|-------|------|-------------|
| silero-vad | ~2MB | High accuracy, real-time |
---
## Examples

### Transcribe an Audio File

```typescript
import { RunAnywhere, SDKEnvironment, ModelCategory } from '@runanywhere/core';
import { ONNX, ModelArtifactType } from '@runanywhere/onnx';

async function transcribeAudio(audioPath: string): Promise<string> {
  // Initialize
  await RunAnywhere.initialize({ environment: SDKEnvironment.Development });
  ONNX.register();

  // Add model
  await ONNX.addModel({
    id: 'whisper-tiny-en',
    name: 'Whisper Tiny English',
    url: 'https://github.com/RunanywhereAI/sherpa-onnx/releases/.../sherpa-onnx-whisper-tiny.en.tar.gz',
    modality: ModelCategory.SpeechRecognition,
    artifactType: ModelArtifactType.TarGzArchive,
  });

  // Download if needed
  if (!(await RunAnywhere.isModelDownloaded('whisper-tiny-en'))) {
    await RunAnywhere.downloadModel('whisper-tiny-en', (p) => {
      console.log(`Download: ${(p.progress * 100).toFixed(1)}%`);
    });
  }

  // Load and transcribe
  const model = await RunAnywhere.getModelInfo('whisper-tiny-en');
  await RunAnywhere.loadSTTModel(model.localPath, 'whisper');

  const result = await RunAnywhere.transcribeFile(audioPath, {
    language: 'en',
    wordTimestamps: true,
  });

  console.log('Transcription:', result.text);
  console.log('Confidence:', result.confidence);
  console.log('Duration:', result.duration, 'seconds');

  return result.text;
}
```
### Synthesize Speech

```typescript
async function synthesizeSpeech(text: string): Promise<string> {
  // Initialize
  await RunAnywhere.initialize({ environment: SDKEnvironment.Development });
  ONNX.register();

  // Add model
  await ONNX.addModel({
    id: 'piper-lessac',
    name: 'Piper Lessac Voice',
    url: 'https://github.com/RunanywhereAI/sherpa-onnx/releases/.../vits-piper-en_US-lessac-medium.tar.gz',
    modality: ModelCategory.SpeechSynthesis,
    artifactType: ModelArtifactType.TarGzArchive,
  });

  // Download if needed
  if (!(await RunAnywhere.isModelDownloaded('piper-lessac'))) {
    await RunAnywhere.downloadModel('piper-lessac');
  }

  // Load and synthesize
  const model = await RunAnywhere.getModelInfo('piper-lessac');
  await RunAnywhere.loadTTSModel(model.localPath, 'piper');

  const result = await RunAnywhere.synthesize(text, {
    rate: 1.0,
    pitch: 1.0,
    volume: 0.8,
  });

  console.log('Duration:', result.duration, 'seconds');
  console.log('Sample rate:', result.sampleRate);

  // result.audio is base64-encoded float32 PCM
  return result.audio;
}
```
### Voice Echo

```typescript
async function voiceEcho(audioPath: string): Promise<string> {
  // Transcribe input audio
  const transcription = await RunAnywhere.transcribeFile(audioPath);
  console.log('You said:', transcription.text);

  // Synthesize echo
  const audio = await RunAnywhere.synthesize(
    `You said: ${transcription.text}`
  );

  return audio.audio;
}
```
---
## Native Libraries

### iOS

This package uses RABackendONNX.xcframework, which includes:

- ONNX Runtime compiled for iOS
- Sherpa-ONNX (Whisper, Piper, Silero VAD)
- Optimized for Apple Silicon

Dependencies:

- onnxruntime.xcframework — ONNX Runtime core

### Android

Native libraries include:

- librunanywhere_onnx.so — ONNX backend
- libonnxruntime.so — ONNX Runtime
- libsherpa-onnx-*.so — Sherpa-ONNX libraries
---
## Package Structure

```
packages/onnx/
├── src/
│   ├── index.ts             # Package exports
│   ├── ONNX.ts              # Module API (register, addModel)
│   ├── ONNXProvider.ts      # Service provider
│   ├── native/
│   │   └── NativeRunAnywhereONNX.ts
│   └── specs/
│       └── RunAnywhereONNX.nitro.ts
├── cpp/
│   ├── HybridRunAnywhereONNX.cpp
│   ├── HybridRunAnywhereONNX.hpp
│   └── bridges/
├── ios/
│   ├── RunAnywhereONNX.podspec
│   └── Frameworks/
│       ├── RABackendONNX.xcframework
│       └── onnxruntime.xcframework
├── android/
│   ├── build.gradle
│   └── src/main/jniLibs/
│       └── arm64-v8a/
│           ├── librunanywhere_onnx.so
│           ├── libonnxruntime.so
│           └── libsherpa-onnx-*.so
└── nitrogen/
    └── generated/
```
---
## Troubleshooting

#### STT model fails to load

Symptoms: `modelLoadFailed` error when loading STT model

Solutions:

1. Verify the model directory contains all required files
2. Check that archive extraction completed successfully
3. Ensure the correct model type is specified ('whisper')

#### Poor transcription quality

Symptoms: Transcription has many errors

Solutions:

1. Use a larger model (small instead of tiny)
2. Ensure audio is clear with minimal background noise
3. Check audio sample rate matches model expectations
4. Try specifying the language explicitly

#### No or garbled TTS audio

Symptoms: No sound or garbled audio

Solutions:

1. Verify audio data is being decoded correctly
2. Check sample rate matches playback device
3. Ensure volume is not zero
4. Try a different TTS voice

#### Microphone permission denied

Symptoms: Audio recording fails

Solutions:

1. Add microphone permission to Info.plist (iOS)
2. Add RECORD_AUDIO permission to AndroidManifest.xml
3. Request runtime permission before recording (see the sketch below)
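A minimal sketch of the runtime request using React Native's built-in PermissionsAndroid API; on iOS the system prompt appears automatically once the Info.plist string is present:

```typescript
import { PermissionsAndroid, Platform } from 'react-native';

async function ensureMicPermission(): Promise<boolean> {
  if (Platform.OS !== 'android') {
    return true; // iOS prompts based on NSMicrophoneUsageDescription
  }
  const status = await PermissionsAndroid.request(
    PermissionsAndroid.PERMISSIONS.RECORD_AUDIO
  );
  return status === PermissionsAndroid.RESULTS.GRANTED;
}
```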
---
## Related

- Main SDK README — Full SDK documentation
- API Reference — Complete API docs
- @runanywhere/core — Core SDK
- @runanywhere/llamacpp — LLM backend
- Sherpa-ONNX — Underlying engine
- ONNX Runtime — ONNX inference engine
---
## License

MIT License