Speech-to-text recognition API for Tauri with multi-language support
```bash
npm install tauri-plugin-stt-api
```

Cross-platform speech recognition plugin for Tauri 2.x applications. Provides real-time speech-to-text functionality for desktop (Windows, macOS, Linux) and mobile (iOS, Android).

## Features
- 🎤 Real-time Speech Recognition - Convert speech to text with low latency
- 📱 Cross-platform Support - iOS, Android, macOS, Windows, Linux
- 🌐 Multi-language Support - 9 languages with automatic model download
- 📝 Interim Results - Get partial transcriptions while speaking
- 🔄 Continuous Mode - Auto-restart recognition after each utterance
- 🔐 Permission Handling - Request and check microphone/speech permissions
- 📥 Auto Model Download - Vosk models are downloaded automatically on first use
## Platform Support

| Platform | Status  | API Used                               | Model Download |
| -------- | ------- | -------------------------------------- | -------------- |
| iOS      | ✅ Full | SFSpeechRecognizer (Speech framework)  | Not required   |
| Android  | ✅ Full | SpeechRecognizer API                   | Not required   |
| macOS    | ✅ Full | Vosk (offline speech recognition)      | Automatic      |
| Windows  | ✅ Full | Vosk (offline speech recognition)      | Automatic      |
| Linux    | ✅ Full | Vosk (offline speech recognition)      | Automatic      |
## Supported Languages

| Language   | Code  | Model Size |
| ---------- | ----- | ---------- |
| English    | en-US | 40 MB      |
| Portuguese | pt-BR | 31 MB      |
| Spanish    | es-ES | 39 MB      |
| French     | fr-FR | 41 MB      |
| German     | de-DE | 45 MB      |
| Russian    | ru-RU | 45 MB      |
| Chinese    | zh-CN | 43 MB      |
| Japanese   | ja-JP | 48 MB      |
| Italian    | it-IT | 39 MB      |
Models are downloaded automatically from alphacephei.com/vosk/models when you first use a language.
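You can check whether a model is already on disk before starting a session. A minimal sketch using the plugin's `getSupportedLanguages()` binding (the `installed` flag is desktop-only, per the API reference below):

```typescript
import { getSupportedLanguages, startListening } from "tauri-plugin-stt-api";

// `installed` reports whether a language's Vosk model is on disk (desktop only).
const languages = await getSupportedLanguages();
const ptBR = languages.find(lang => lang.code === "pt-BR");

if (ptBR && !ptBR.installed) {
  // First use triggers the automatic download (~31 MB for pt-BR);
  // progress is reported on the stt://download-progress event.
  console.log("pt-BR model will be downloaded on first use.");
}

await startListening({ language: "pt-BR" });
```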
## Installation

Add the plugin to your `Cargo.toml`:

```toml
[dependencies]
tauri-plugin-stt = "0.1"
```

Install the JavaScript guest bindings:

```bash
npm install tauri-plugin-stt-api
# or
yarn add tauri-plugin-stt-api
# or
pnpm add tauri-plugin-stt-api
```
In your Tauri app setup:
```rust
fn main() {
    tauri::Builder::default()
        .plugin(tauri_plugin_stt::init())
        .run(tauri::generate_context!())
        .expect("error while running application");
}
```
Add permissions to your `capabilities/default.json`:

```json
{
  "permissions": ["stt:default"]
}
```
For granular permissions, you can specify individual commands:

```json
{
  "permissions": [
    "stt:allow-is-available",
    "stt:allow-get-supported-languages",
    "stt:allow-check-permission",
    "stt:allow-request-permission",
    "stt:allow-start-listening",
    "stt:allow-stop-listening",
    "stt:allow-register-listener",
    "stt:allow-remove-listener"
  ]
}
```
### Vosk Library (Desktop Only)

The Vosk runtime library must be installed on your system:
#### macOS
```bash
# Download and install libvosk
curl -LO https://github.com/alphacep/vosk-api/releases/download/v0.3.42/vosk-osx-0.3.42.zip
unzip vosk-osx-0.3.42.zip
sudo cp vosk-osx-0.3.42/libvosk.dylib /usr/local/lib/
```
#### Linux
```bash
wget https://github.com/alphacep/vosk-api/releases/download/v0.3.42/vosk-linux-x86_64-0.3.42.zip
unzip vosk-linux-x86_64-0.3.42.zip
sudo cp vosk-linux-x86_64-0.3.42/libvosk.so /usr/local/lib/
sudo ldconfig
```
#### Windows
Download the Windows build from the [vosk-api GitHub Releases](https://github.com/alphacep/vosk-api/releases) page and add the extracted directory (containing the Vosk DLL) to your `PATH`.
## Usage

```typescript
import {
  isAvailable,
  getSupportedLanguages,
  startListening,
  stopListening,
  onResult,
  onStateChange,
  onError,
} from "tauri-plugin-stt-api";
import { listen } from "@tauri-apps/api/event";

// Check if STT is available
const available = await isAvailable();

// Get supported languages (with installed status)
const languages = await getSupportedLanguages();

// Listen for results
const resultListener = await onResult(result => {
  console.log("Recognized:", result.transcript, result.isFinal);
});

// Listen for download progress (when a model is being downloaded)
const downloadListener = await listen<{
  status: string;
  model: string;
  progress: number;
}>("stt://download-progress", event => {
  console.log(`${event.payload.status}: ${event.payload.progress}%`);
});

// Start listening
await startListening({
  language: "en-US",
  interimResults: true,
  continuous: true,
  // maxDuration and onDevice are supported by the guest SDK
});

// Stop listening
await stopListening();
```
The `startListening()` config object:

```typescript
interface ListenConfig {
  language?: string; // Language code (e.g., "en-US", "pt-BR")
  interimResults?: boolean; // Return partial results while speaking
  continuous?: boolean; // Continue listening after utterance ends
  maxDuration?: number; // Max listening duration in milliseconds (0 = unlimited)
  onDevice?: boolean; // Prefer on-device recognition (iOS)
}
```
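For example, a bounded dictation session that returns partial results and stops itself after 15 seconds (a sketch combining the fields above):

```typescript
import { startListening, onResult } from "tauri-plugin-stt-api";

const unlisten = await onResult(result => {
  if (result.isFinal) {
    console.log("Final:", result.transcript);
  } else {
    console.log("Partial:", result.transcript);
  }
});

// Single utterance, partial results enabled, hard stop after 15 seconds.
await startListening({
  language: "en-US",
  interimResults: true,
  continuous: false,
  maxDuration: 15_000, // milliseconds; 0 means unlimited
});
```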
Subscribing to events:

```typescript
// Listen for results
const unlistenResult = await onResult(result => {
  console.log(result.transcript, result.isFinal);
});

// Listen for state changes
const unlistenState = await onStateChange(event => {
  console.log("State:", event.state); // "idle" | "listening" | "processing"
});

// Listen for errors
const unlistenError = await onError(error => {
  console.error(`[${error.code}] ${error.message}`);
});

// Clean up listeners
unlistenResult();
unlistenState();
unlistenError();
```
## Events

| Event                     | Payload                                | Description                               |
| ------------------------- | -------------------------------------- | ----------------------------------------- |
| `stt://result`            | `{ transcript, isFinal, confidence? }` | Recognition result                        |
| `stt://state-change`      | `{ state }`                            | State change (idle/listening/processing)  |
| `stt://error`             | `{ code, message, details? }`          | Error event                               |
| `stt://download-progress` | `{ status, model, progress }`          | Model download progress                   |
## API

### startListening(config?: ListenConfig)

Start speech recognition.

Config Options (see the example after this list):

- `language`: Language code (e.g., "en-US", "pt-BR")
- `interimResults`: Return partial results (default: false)
- `continuous`: Continue listening after utterance ends (default: false)
- `maxDuration`: Max listening duration in ms (0 = unlimited)
- `onDevice`: Use on-device recognition (iOS only, default: false)
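For instance, preferring on-device recognition (a sketch; `onDevice` is iOS-only per the list above, and requires iOS 13+ per the platform notes below):

```typescript
import { startListening } from "tauri-plugin-stt-api";

// Keep audio processing on the device (no server round-trip). iOS 13+ only.
await startListening({
  language: "en-US",
  onDevice: true,
});
```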
### stopListening()

Stop the current speech recognition session.
### isAvailable()

Check if STT is available on the device.

Returns:

- `available`: Whether STT is available
- `reason`: Optional reason if unavailable
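For example:

```typescript
import { isAvailable } from "tauri-plugin-stt-api";

const { available, reason } = await isAvailable();
if (!available) {
  console.warn("STT unavailable:", reason ?? "no reason given");
}
```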
### getSupportedLanguages()

Get the list of supported languages.

Returns: Array of languages with:

- `code`: Language code (e.g., "en-US")
- `name`: Display name
- `installed`: Whether the model is installed (desktop only)
### checkPermission()

Check current permission status.

Returns:

- `microphone`: "granted" | "denied" | "unknown"
- `speechRecognition`: "granted" | "denied" | "unknown"
### requestPermission()

Request microphone and speech recognition permissions.

Returns: Same as `checkPermission()`.
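A typical flow checks the current status first and only prompts when something is missing (a sketch):

```typescript
import { checkPermission, requestPermission, startListening } from "tauri-plugin-stt-api";

let status = await checkPermission();
if (status.microphone !== "granted" || status.speechRecognition !== "granted") {
  // May show the OS microphone / speech recognition dialogs.
  status = await requestPermission();
}

if (status.microphone === "granted") {
  await startListening({ language: "en-US" });
}
```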
### onResult(callback)

Listen for recognition results.

Result:

- `transcript`: Recognized text
- `isFinal`: Whether this is a final result
- `confidence`: Confidence score (0.0-1.0, if available)
### onStateChange(callback)

Listen for state changes.

States: "idle", "listening", "processing"
### onError(callback)

Listen for errors.

Error Codes:

- `NOT_AVAILABLE`: STT not available on device
- `PERMISSION_DENIED`: Microphone permission denied
- `SPEECH_PERMISSION_DENIED`: Speech recognition permission denied
- `NETWORK_ERROR`: Network error (server-based recognition)
- `AUDIO_ERROR`: Audio capture error
- `TIMEOUT`: Recognition timeout
- `NO_SPEECH`: No speech detected
- `LANGUAGE_NOT_SUPPORTED`: Requested language not supported
- `CANCELLED`: Recognition cancelled by user
- `ALREADY_LISTENING`: Already in listening state
- `NOT_LISTENING`: Not currently listening
- `BUSY`: Recognizer busy
- `UNKNOWN`: Unknown error
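A sketch of an `onError` handler that routes a few of these codes to user-facing actions:

```typescript
import { onError, requestPermission } from "tauri-plugin-stt-api";

const unlistenError = await onError(async error => {
  switch (error.code) {
    case "PERMISSION_DENIED":
    case "SPEECH_PERMISSION_DENIED":
      await requestPermission(); // re-prompt the user
      break;
    case "NO_SPEECH":
      console.log("No speech detected; try speaking closer to the microphone.");
      break;
    case "NETWORK_ERROR":
      console.log("Server-based recognition needs internet; check connectivity.");
      break;
    default:
      console.error(`[${error.code}] ${error.message}`);
  }
});
```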
## Development

```bash
npm run dev
```

To run with the STT feature enabled:

```bash
npm run dev -- --features stt
# or
npm run dev:stt
```
## Troubleshooting

### Vosk library not found (desktop)

Solution: Install the Vosk library as described in the Vosk Library section.

```bash
# macOS
ls /usr/local/lib/libvosk.dylib # Should exist
```
### Model download fails

Problem: Vosk models are downloaded automatically on first use for each language, but the download fails.

Solution:

1. Ensure internet connectivity
2. Check the app data directory: `~/.local/share/tauri-plugin-stt/models/` (Linux/macOS) or `%APPDATA%/tauri-plugin-stt/models/` (Windows)
3. Manual download: Download the model from alphacephei.com/vosk/models and extract it to the models directory
4. Model naming: Ensure the folder name matches the expected pattern (e.g., `vosk-model-small-en-us-0.15`)
### Speech recognition not working (mobile)

iOS Solution:
1. Ensure iOS 10+ (speech recognition requires iOS 10+)
2. Check Settings → Privacy → Speech Recognition → Enable for your app
3. For on-device recognition, iOS 13+ is required
Android Solution:
1. Install Google app (provides speech recognition service)
2. Check Settings → Apps → Default apps → Digital assistant app
3. Ensure internet connectivity for server-based recognition
### Permission denied

Solution: Call `requestPermission()` before `startListening()`:

```typescript
const perm = await requestPermission();
if (perm.microphone !== "granted") {
console.error("Microphone permission required");
return;
}
await startListening();
```

### No speech detected
Checklist:
- ✅ Microphone is working in other apps
- ✅ Correct microphone selected in system settings
- ✅ Microphone not muted (hardware or software)
- ✅ App has microphone permission
- ✅ No other app is using the microphone exclusively
### Interim results not appearing
Note: Interim results availability varies by platform:
- iOS/Android: Full support
- Desktop (Vosk): Partial support (depends on model)
```typescript
await startListening({
interimResults: true, // Enable interim results
continuous: true, // Keep listening
});
```

### Poor recognition accuracy
Tips:
- Use correct language code for your accent (e.g., "en-GB" vs "en-US")
- Speak clearly and avoid background noise
- On iOS, download enhanced voices in Settings → Accessibility → Spoken Content
- Desktop: Use larger Vosk models for better accuracy (at cost of size)
### "Already listening" error

Solution: Stop the current session before starting a new one:
```typescript
try {
await stopListening();
} catch (e) {
// Ignore if not listening
}
await startListening();
```

### Download progress events
Note: Download progress events are only for desktop (Vosk models). Mobile uses native speech recognition without downloads.
```typescript
import { listen } from "@tauri-apps/api/event";

const unlisten = await listen<{
  status: string;
  model: string;
  progress: number;
}>("stt://download-progress", event => {
  console.log(`${event.payload.status}: ${event.payload.progress}%`);
});
```

## Examples
See the examples/stt-example directory for a complete working demo with React + Material UI, featuring:
- Real-time transcription with interim results
- Language selection
- Permission handling
- Error handling with visual feedback
- Download progress monitoring
- Results history
## Platform-Specific Notes

### iOS

- Requires iOS 10+ for basic speech recognition
- iOS 13+ required for on-device recognition (`onDevice: true`)
- Must add `NSSpeechRecognitionUsageDescription` to Info.plist
- Must add `NSMicrophoneUsageDescription` to Info.plist

### Android
- Requires Android API 23+ (Android 6.0+)
- Google app must be installed for speech recognition
- Internet required for server-based recognition
- Must request `RECORD_AUDIO` permission in AndroidManifest.xml

### Desktop (Windows/macOS/Linux)

- Requires Vosk library installation (see Vosk Library section)
- Models downloaded automatically (40-50 MB per language)
- Fully offline after model download
- Models stored in app data directory
## License

MIT