A JavaScript/TypeScript library that provides a unified API for working with multiple cloud-based Text-to-Speech (TTS) services. Inspired by py3-TTS-Wrapper, it simplifies the use of services like Azure, Google Cloud, IBM Watson, and ElevenLabs.

```bash
npm install js-tts-wrapper
```
- Features
- Supported TTS Engines
- Installation
- Using npm scripts
- Quick Start
- Core Functionality
- Voice Management
- Text Synthesis
- Audio Playback
- File Output
- Event Handling
- SSML Support
- Speech Markdown Support
- Engine-Specific Examples
- Browser Support
- API Reference
- Contributing
- License
- Examples and Demos
## Features

- Unified API: Consistent interface across multiple TTS providers
- SSML Support: Use Speech Synthesis Markup Language to enhance speech synthesis
- Speech Markdown: Optional support for easier speech markup
- Voice Selection: Easily browse and select from available voices
- Streaming Synthesis: Stream audio as it's being synthesized
- Playback Control: Pause, resume, and stop audio playback
- Word Boundaries: Get callbacks for word timing (where supported)
- File Output: Save synthesized speech to audio files
- Browser Support: Works in both Node.js (server) and browser environments (see engine support table below)
## Supported TTS Engines

| Factory Name | Class Name | Environment | Provider | Dependencies |
|--------------|------------|-------------|----------|-------------|
| azure | AzureTTSClient | Both | Microsoft Azure Cognitive Services | @azure/cognitiveservices-speechservices, microsoft-cognitiveservices-speech-sdk |
| google | GoogleTTSClient | Both | Google Cloud Text-to-Speech | @google-cloud/text-to-speech |
| elevenlabs | ElevenLabsTTSClient | Both | ElevenLabs | node-fetch@2 (Node.js only) |
| watson | WatsonTTSClient | Both | IBM Watson | None (uses fetch API) |
| openai | OpenAITTSClient | Both | OpenAI | openai |
| upliftai | UpliftAITTSClient | Both | UpLiftAI | None (uses fetch API) |
| playht | PlayHTTTSClient | Both | PlayHT | node-fetch@2 (Node.js only) |
| polly | PollyTTSClient | Both | Amazon Web Services | @aws-sdk/client-polly |
| sherpaonnx | SherpaOnnxTTSClient | Node.js | k2-fsa/sherpa-onnx | sherpa-onnx-node, decompress, decompress-bzip2, decompress-tarbz2, decompress-targz, tar-stream |
| sherpaonnx-wasm | SherpaOnnxWasmTTSClient | Browser | k2-fsa/sherpa-onnx | None (WASM included) |
| espeak | EspeakNodeTTSClient | Node.js | eSpeak NG | text2wav |
| espeak-wasm | EspeakBrowserTTSClient | Both | eSpeak NG | mespeak (Node.js) or meSpeak.js (browser) |
| sapi | SAPITTSClient | Node.js | Windows Speech API (SAPI) | None (uses PowerShell) |
| witai | WitAITTSClient | Both | Wit.ai | None (uses fetch API) |
- Factory Name: Use with `createTTSClient('factory-name', credentials)`
- Class Name: Use with a direct import: `import { ClassName } from 'js-tts-wrapper'`
- Environment: Node.js = server-side only; Browser = browser-compatible; Both = works in both environments
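For example, the following two snippets construct equivalent clients (a minimal sketch; the credentials shown are placeholders):

```typescript
import { createTTSClient, AzureTTSClient } from 'js-tts-wrapper';

// Factory name: resolve the engine at runtime from a string
const viaFactory = createTTSClient('azure', {
  subscriptionKey: 'your-subscription-key',
  region: 'westeurope'
});

// Class name: import the concrete client directly
const viaClass = new AzureTTSClient({
  subscriptionKey: 'your-subscription-key',
  region: 'westeurope'
});
```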
- Importing js-tts-wrapper does NOT load sherpa-onnx-node.
- Cloud engines (Azure, Google, Polly, OpenAI, etc.) work without any SherpaONNX packages installed.
- Only when you instantiate SherpaOnnxTTSClient (Node-only) will the library look for sherpa-onnx-node and its platform package. If SherpaONNX is not installed, the SherpaOnnx engine warns and falls back gracefully; other engines remain unaffected.
- See the Installation section below for how to install SherpaONNX dependencies only if you plan to use that engine.
### Word Boundary and Timing Support
| Engine | Word Boundaries | Timing Source | Character-Level | Accuracy |
|--------|----------------|---------------|-----------------|----------|
| ElevenLabs | ✅ | Real API data | ✅ NEW! | High |
| Azure | ✅ | Real API data | ❌ | High |
| Google | ✅ | Estimated | ❌ | Low |
| Watson | ✅ | Estimated | ❌ | Low |
| UpLiftAI | ✅ | Estimated | ❌ | Low |
| OpenAI | ✅ | Estimated | ❌ | Low |
| WitAI | ✅ | Estimated | ❌ | Low |
| PlayHT | ✅ | Estimated | ❌ | Low |
| Polly | ✅ | Estimated | ❌ | Low |
| eSpeak | ✅ | Estimated | ❌ | Low |
| eSpeak-WASM | ✅ | Estimated | ❌ | Low |
| SherpaOnnx | ✅ | Estimated | ❌ | Low |
| SherpaOnnx-WASM | ✅ | Estimated | ❌ | Low |
| SAPI | ✅ | Estimated | ❌ | Low |
Character-Level Timing: Only ElevenLabs provides precise character-level timing data via the /with-timestamps endpoint, enabling the most accurate word highlighting and speech synchronization.
### Audio Format Conversion Support
| Engine | Native Format | WAV Support | MP3 Conversion | Conversion Method |
|--------|---------------|-------------|----------------|-------------------|
| All Engines | Varies | ✅ | ✅ | Pure JavaScript (lamejs) |
Format Conversion: All engines support WAV and MP3 output through automatic format conversion. The wrapper uses pure JavaScript conversion (lamejs) when FFmpeg is not available, ensuring cross-platform compatibility without external dependencies.
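For example, requesting MP3 works uniformly across engines (a minimal sketch; when native MP3 isn't supported the wrapper converts, and it falls back to WAV with a warning if conversion is unavailable):

```typescript
// Any engine: request MP3 and let the wrapper convert if needed
const mp3Bytes = await tts.synthToBytes('Hello, world!', { format: 'mp3' });

// The same applies when saving to disk
await tts.synthToFile('Hello, world!', 'output', 'mp3');
```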
## Installation

The library uses a modular approach where TTS engine-specific dependencies are optional. You can install the package and its dependencies as follows:
```bash
# Install the base package
npm install js-tts-wrapper
```

### Using npm scripts
After installing the base package, you can use the npm scripts provided by the package to install specific engine dependencies:
```bash
# Navigate to your project directory where js-tts-wrapper is installed
cd your-project

# Install Azure dependencies
npx js-tts-wrapper@latest run install:azure

# Install SherpaOnnx dependencies
npx js-tts-wrapper@latest run install:sherpaonnx

# Install eSpeak NG dependencies (Node.js)
npx js-tts-wrapper@latest run install:espeak

# Install eSpeak NG-WASM dependencies (Node.js)
npx js-tts-wrapper@latest run install:espeak-wasm

# Install System TTS dependencies (Node.js)
npx js-tts-wrapper@latest run install:system

# Install Node.js audio playback dependencies
npx js-tts-wrapper@latest run install:node-audio

# Install all development dependencies
npx js-tts-wrapper@latest run install:all-dev
```

## Quick Start
### Using a TTS Client Directly
#### ESM (ECMAScript Modules)
```javascript
import { AzureTTSClient } from 'js-tts-wrapper';

// Initialize the client with your credentials
const tts = new AzureTTSClient({
  subscriptionKey: 'your-subscription-key',
  region: 'westeurope'
});

// List available voices
const voices = await tts.getVoices();
console.log(voices);

// Set a voice
tts.setVoice('en-US-AriaNeural');

// Speak some text
await tts.speak('Hello, world!');

// Use SSML for more control
const ssml = '<speak>Hello <break time="500ms"/> world!</speak>';
await tts.speak(ssml);
```

#### CommonJS
```javascript
const { AzureTTSClient } = require('js-tts-wrapper');

// Initialize the client with your credentials
const tts = new AzureTTSClient({
  subscriptionKey: 'your-subscription-key',
  region: 'westeurope'
});

// Use async/await within an async function
async function runExample() {
  // List available voices
  const voices = await tts.getVoices();
  console.log(voices);

  // Set a voice
  tts.setVoice('en-US-AriaNeural');

  // Speak some text
  await tts.speak('Hello, world!');

  // Use SSML for more control
  const ssml = '<speak>Hello <break time="500ms"/> world!</speak>';
  await tts.speak(ssml);
}

runExample().catch(console.error);
```

### Using the Factory Pattern
The library provides a factory function to create TTS clients dynamically based on the engine name:
#### ESM (ECMAScript Modules)
```javascript
import { createTTSClient } from 'js-tts-wrapper';

// Create a TTS client using the factory function
const tts = createTTSClient('azure', {
  subscriptionKey: 'your-subscription-key',
  region: 'westeurope'
});

// Use the client as normal
await tts.speak('Hello from the factory pattern!');
```

#### CommonJS
```javascript
const { createTTSClient } = require('js-tts-wrapper');

// Create a TTS client using the factory function
const tts = createTTSClient('azure', {
  subscriptionKey: 'your-subscription-key',
  region: 'westeurope'
});

async function runExample() {
  // Use the client as normal
  await tts.speak('Hello from the factory pattern!');
}

runExample().catch(console.error);
```

The factory supports all engines: `'azure'`, `'google'`, `'polly'`, `'elevenlabs'`, `'openai'`, `'playht'`, `'watson'`, `'witai'`, `'sherpaonnx'`, `'sherpaonnx-wasm'`, `'espeak'`, `'espeak-wasm'`, `'sapi'`, etc.

## Core Functionality
All TTS engines in js-tts-wrapper implement a common set of methods and features through the AbstractTTSClient class. This ensures consistent behavior across different providers.
### Voice Management
```typescript
// Get all available voices
const voices = await tts.getVoices();

// Get voices for a specific language
const englishVoices = await tts.getVoicesByLanguage('en-US');

// Set the voice to use
tts.setVoice('en-US-AriaNeural');
```

The library includes a robust Language Normalization system that standardizes language codes across different TTS engines. This allows you to:
- Use BCP-47 codes (e.g., 'en-US') or ISO 639-3 codes (e.g., 'eng') interchangeably
- Get consistent language information regardless of the TTS engine
- Filter voices by language using any standard format
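For example, BCP-47 and ISO 639-3 codes can be used interchangeably when filtering voices (a minimal sketch of the behavior described above; the exact set of matched voices depends on the engine's catalogue):

```typescript
// Both code formats are accepted by the same method
const usVoices = await tts.getVoicesByLanguage('en-US'); // BCP-47
const engVoices = await tts.getVoicesByLanguage('eng');  // ISO 639-3
```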
### Credential Validation
All TTS engines support standardized credential validation to help you verify your setup before making requests:
```typescript
// Basic validation - returns a boolean
const isValid = await tts.checkCredentials();
if (!isValid) {
  console.error('Invalid credentials!');
}

// Detailed validation - returns a comprehensive status
const status = await tts.getCredentialStatus();
console.log(status);
/*
{
  valid: true,
  engine: 'openai',
  environment: 'node',
  requiresCredentials: true,
  credentialTypes: ['apiKey'],
  message: 'openai credentials are valid and 10 voices are available'
}
*/
```

Engine Requirements:
- Cloud engines (OpenAI, Azure, Google, etc.): Require API keys/credentials
- Local engines (eSpeak, SAPI, SherpaOnnx): No credentials needed
- Environment-specific: Some engines work only in Node.js or browser
See the Credential Validation Guide for detailed requirements and troubleshooting.
### Text Synthesis
```typescript
// Convert text to audio bytes (Uint8Array)
const audioBytes = await tts.synthToBytes('Hello, world!');

// Stream synthesis with word boundary information
const { audioStream, wordBoundaries } = await tts.synthToBytestream('Hello, world!');
```

### Audio Playback
```typescript
// Traditional text synthesis and playback
await tts.speak('Hello, world!');

// NEW: Play audio from different sources without re-synthesizing
// Play from file
await tts.speak({ filename: 'path/to/audio.mp3' });

// Play from audio bytes
const audioBytes = await tts.synthToBytes('Hello, world!');
await tts.speak({ audioBytes: audioBytes });

// Play from audio stream
const { audioStream } = await tts.synthToBytestream('Hello, world!');
await tts.speak({ audioStream: audioStream });

// All input types work with speakStreamed too
await tts.speakStreamed({ filename: 'path/to/audio.mp3' });

// Playback control
tts.pause();  // Pause playback
tts.resume(); // Resume playback
tts.stop();   // Stop playback

// Stream synthesis and play with word boundary callbacks
await tts.startPlaybackWithCallbacks('Hello world', (word, start, end) => {
  console.log(`Word: ${word}, Start: ${start}s, End: ${end}s`);
});
```

#### Benefits of Multi-Source Audio Playback
- Avoid Double Synthesis: Use `synthToFile()` to save audio, then play the same file with `speak({ filename })` without re-synthesizing
- Platform Independent: Works consistently across browser and Node.js environments
- Efficient Reuse: Play the same audio bytes or stream multiple times without regenerating
- Flexible Input: Choose the most convenient input source for your use case

> Note: Audio playback with the `speak()` and `speakStreamed()` methods is supported in browser environments and in Node.js with the optional `sound-play` package installed. To enable Node.js audio playback, install the required packages with `npm install sound-play pcm-convert` or use the npm script `npx js-tts-wrapper@latest run install:node-audio`.

### File Output
```typescript
// Save synthesized speech to a file
await tts.synthToFile('Hello, world!', 'output', 'mp3');
```

### Event Handling
```typescript
// Register event handlers
tts.on('start', () => console.log('Speech started'));
tts.on('end', () => console.log('Speech ended'));
tts.on('boundary', (word, start, end) => {
  console.log(`Word: ${word}, Start: ${start}s, End: ${end}s`);
});

// Alternative event connection
tts.connect('onStart', () => console.log('Speech started'));
tts.connect('onEnd', () => console.log('Speech ended'));
```

### Word Boundary Events
Word boundary events provide precise timing information for speech synchronization, word highlighting, and interactive applications.
#### Basic Word Boundary Usage
```typescript
// Enable word boundary events
tts.on('boundary', (word, startTime, endTime) => {
  console.log(`"${word}" spoken from ${startTime}s to ${endTime}s`);
});

await tts.speak('Hello world, this is a test.');

// Output:
// "Hello" spoken from 0.000s to 0.300s
// "world," spoken from 0.300s to 0.600s
// "this" spoken from 0.600s to 0.900s
// ...
```

#### Advanced Timing with Character-Level Precision (ElevenLabs)
```typescript
// ElevenLabs: Enable character-level timing for maximum accuracy
const tts = createTTSClient('elevenlabs');

// Method 1: Using synthToBytestream with timestamps
const result = await tts.synthToBytestream('Hello world', {
  useTimestamps: true
});

console.log(`Generated ${result.wordBoundaries.length} word boundaries:`);
result.wordBoundaries.forEach(wb => {
  const startSec = wb.offset / 10000;
  const durationSec = wb.duration / 10000;
  console.log(`"${wb.text}": ${startSec}s - ${startSec + durationSec}s`);
});

// Method 2: Using enhanced callback support
await tts.startPlaybackWithCallbacks('Hello world', (word, start, end) => {
  console.log(`Precise timing: "${word}" from ${start}s to ${end}s`);
});
```

#### Real-Time Word Highlighting Example
```typescript
// Example: Real-time word highlighting for accessibility
const textElement = document.getElementById('text');
const words = 'Hello world, this is a test.'.split(' ');
let wordIndex = 0;

tts.on('boundary', (word, startTime, endTime) => {
  // Highlight the current word (<mark> used here for illustration; any inline element works)
  if (wordIndex < words.length) {
    textElement.innerHTML = words.map((w, i) =>
      i === wordIndex ? `<mark>${w}</mark>` : w
    ).join(' ');
    wordIndex++;
  }
});

await tts.speak('Hello world, this is a test.', { useWordBoundary: true });
```

## SSML Support
The library provides comprehensive SSML (Speech Synthesis Markup Language) support with engine-specific capabilities:
### Engines with Native SSML Support
The following engines support SSML:
- Google Cloud TTS - Full SSML support with all elements
- Microsoft Azure - Full SSML support with voice-specific features
- Amazon Polly - Dynamic SSML support based on voice engine type (standard/long-form: full, neural/generative: limited)
- WitAI - Full SSML support
- SAPI (Windows) - Full SSML support
- eSpeak/eSpeak-WASM - SSML support with subset of elements
### Engines without SSML Support
The following engines automatically strip SSML tags and convert to plain text:
- ElevenLabs - SSML tags are removed, plain text is synthesized
- OpenAI - SSML tags are removed, plain text is synthesized
- PlayHT - SSML tags are removed, plain text is synthesized
- SherpaOnnx/SherpaOnnx-WASM - SSML tags are removed, plain text is synthesized
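For example, the same SSML input is safe to send to a non-SSML engine; the markup is simply stripped before synthesis (a sketch, assuming an ElevenLabs client named `elevenLabsTts`):

```typescript
const ssml = '<speak>Hello <break time="500ms"/> world!</speak>';

// On ElevenLabs (no SSML support) this synthesizes the plain text "Hello world!"
await elevenLabsTts.speak(ssml);
```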
### Using SSML
```typescript
// Use SSML directly (works with supported engines)
const ssml = '<speak>Hello <break time="500ms"/> world!</speak>';
await tts.speak(ssml);

// Or use the SSML builder
const ssmlText = tts.ssml
  .prosody({ rate: 'slow', pitch: 'low' }, 'This text will be spoken slowly with a low pitch.')
  .break(500)
  .emphasis('strong', 'This text is emphasized.')
  .toString();

await tts.speak(ssmlText);
```

### Engine-Specific SSML Notes
- Amazon Polly: SSML support varies by voice engine type:
  - Standard voices: Full SSML support including all tags
  - Long-form voices: Full SSML support including all tags
  - Neural voices: Limited SSML support (no emphasis, limited prosody)
  - Generative voices: Limited SSML support (partial tag support)
  - The library automatically detects voice engine types and handles SSML appropriately
- Microsoft Azure: Supports voice-specific SSML elements and custom voice tags
  - Supports MS-specific tags like `<mstts:express-as>` for emotional styles
  - The library automatically injects the required `xmlns:mstts` namespace when needed
- Google Cloud: Supports the most comprehensive set of SSML elements
- WitAI: Full SSML support according to the W3C specification
- SAPI: Windows-native SSML support with system voice capabilities
- eSpeak: Supports an SSML subset including prosody, breaks, and emphasis elements

### Raw SSML Passthrough
Speech Markdown and the built-in SSML helpers cover most use cases, but there are times when you need to send hand-crafted SSML—custom namespaces, experimental tags, or markup generated by another tool. In those cases you can use the `rawSSML` flag to bypass Speech Markdown conversion and SSML validation:

```typescript
// Example: Azure multi-speaker dialog, currently easier to author as raw SSML
// (illustrative markup; adapt the voices and content to your scenario)
const azureDialogSSML = `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
  xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">
  <voice name="en-US-AriaNeural">Hello! How can I help you today?</voice>
  <voice name="en-US-GuyNeural">I would like to book a table, please.</voice>
</speak>`;

await tts.speak(azureDialogSSML, { rawSSML: true });

// When rawSSML=true the wrapper will:
// 1. Skip Speech Markdown conversion
// 2. Skip SSML validation / normalization
// 3. Pass the SSML through unchanged (aside from ensuring a <speak> wrapper exists)
```

Important: When you opt into `rawSSML`, you are responsible for producing provider-compliant SSML. The wrapper only wraps the payload with `<speak>` if missing and adds obvious namespaces, but it does not attempt to sanitize or validate the markup.

Need to mix-and-match? Convert Speech Markdown to SSML yourself (using `speechmarkdown-js` directly) and then send it through `rawSSML: true` to avoid duplicate parsing:

```typescript
import { SpeechMarkdown } from "speechmarkdown-js";

const markdown = "(Hello!)[excited:\"1.5\"] with (characters)[character:superhero]";
const smd = new SpeechMarkdown();
const ssml = smd.toSSML(markdown, { platform: "microsoft-azure" });
await tts.speak(ssml, { rawSSML: true });
```

If you hit a Speech Markdown feature gap, consider contributing upstream—the library powers our conversion pipeline, so improvements there benefit every js-tts-wrapper user.
## Speech Markdown Support
The library supports Speech Markdown for easier speech formatting across all engines. Speech Markdown is powered by the speechmarkdown-js library, which provides comprehensive platform-specific support.
### How It Works
- SSML-supported engines: Speech Markdown is converted to SSML (with platform-specific optimizations), then processed natively
- Non-SSML engines: Speech Markdown is converted to SSML, then SSML tags are stripped to plain text
### Platform-Specific Formatters
The speechmarkdown-js library ships dedicated formatters for every major provider:
- Microsoft Azure: Automatic `mstts` namespace injection, inline sections, and 27 `mstts:express-as` styles (excited, chat, newscaster, customerservice, etc.) with optional `styledegree` intensity. Section modifiers such as `#[excited]` are supported as long as you leave a blank line before the section and close it with `#[defaults]` (or another section tag).
- Amazon Polly: Emotional styles and neural/standard voice effects that map cleanly onto Polly's SSML dialect.
- Google Cloud: Google Assistant style tags, multi-language voices, and automatic handling.
- ElevenLabs: A formatter that emits ElevenLabs' prompt markup (pause markers, IPA phonemes, etc.), so you can feed Speech Markdown directly into ElevenLabs if you bypass the wrapper or use rawSSML.
- WitAI / Microsoft SAPI / IBM Watson / W3C: Full SSML support for their respective dialects.
- And more: See the speechmarkdown-js README for the complete, always up-to-date list.

### Usage
```typescript
// Use Speech Markdown with any engine
const markdown =
  "Hello [500ms] world! ++This text is emphasized++ (slowly)[rate:\"slow\"] (high)[pitch:\"high\"] (loudly)[volume:\"loud\"]";
await tts.speak(markdown, { useSpeechMarkdown: true });
// If you omit useSpeechMarkdown, the wrapper auto-enables it when Speech Markdown syntax is detected.

// Platform-specific Speech Markdown features
// Azure: Section modifiers map to mstts:express-as
const azureMarkdown = `
#[excited]
This entire section is excited!
Multiple sentences work too.

#[defaults]
Back to the neutral voice.`;
await azureTts.speak(azureMarkdown, { useSpeechMarkdown: true });

// Speech Markdown works with all engines
const ttsGoogle = createTTSClient('google', { /* credentials */ });
const ttsElevenLabs = createTTSClient('elevenlabs', { /* credentials */ });

// Both will handle Speech Markdown appropriately
await ttsGoogle.speak(markdown, { useSpeechMarkdown: true });     // Converts to SSML
await ttsElevenLabs.speak(markdown, { useSpeechMarkdown: true }); // Uses the ElevenLabs formatter (tags are stripped before hitting the API)

// Need ElevenLabs prompt markup?
// Convert directly and pass through rawSSML or your own API client:
import { SpeechMarkdown as SMSpeechMarkdown } from "speechmarkdown-js";
const elevenLabsMarkup = new SMSpeechMarkdown().toSSML(markdown, { platform: "elevenlabs" });
await elevenLabsClient.speak(elevenLabsMarkup, { rawSSML: true });
```
Supported Speech Markdown syntax includes:

- `[500ms]` or `[break:"500ms"]` - Pauses/breaks
- `++text++` or `+text+` - Text emphasis
- `(text)[rate:"slow"]` - Speech rate control
- `(text)[pitch:"high"]` - Pitch control
- `(text)[volume:"loud"]` - Volume control
- Platform-specific: See the speechmarkdown-js documentation for platform-specific features like Azure's express-as styles
The full speechmarkdown-js converter now loads by default in both Node and browser environments. If you need to opt out (for very small lambda bundles or for deterministic tests), you can:
```bash
# Disable globally
SPEECHMARKDOWN_DISABLE=true npm test
```

Or disable/enable programmatically:
```ts
import { SpeechMarkdown } from "js-tts-wrapper";

SpeechMarkdown.configureSpeechMarkdown({ enabled: false }); // fallback-only
SpeechMarkdown.configureSpeechMarkdown({ enabled: true });  // ensure full parser
```

Alternatively, you can import the function directly:
```ts
import { configureSpeechMarkdown } from "js-tts-wrapper";

configureSpeechMarkdown({ enabled: true }); // ensure full parser
```

When disabled, js-tts-wrapper falls back to the lightweight built-in converter (suitable for basic `[break]` patterns). Re-enable it to regain advanced tags (Azure express-as, Polly styles, google:style, etc.).

### Speech Markdown Engine Support
| Engine | Speech Markdown Support | Processing Method |
|--------|------------------------|-------------------|
| Google Cloud TTS | ✅ Full | → SSML → Native processing |
| Microsoft Azure | ✅ Full | → SSML → Native processing |
| Amazon Polly | ✅ Full | → SSML → Dynamic processing (engine-dependent) |
| WitAI | ✅ Full | → SSML → Native processing |
| SAPI | ✅ Full | → SSML → Native processing |
| eSpeak | ✅ Full | → SSML → Native processing |
| ElevenLabs | ✅ Converted | → SSML → Plain text |
| OpenAI | ✅ Converted | → SSML → Plain text |
| PlayHT | ✅ Converted | → SSML → Plain text |
| SherpaOnnx | ✅ Converted | → SSML → Plain text |
### Speech Markdown vs. Raw SSML
The library provides two complementary approaches for controlling speech synthesis:
| Approach | Use Case | Example |
|----------|----------|---------|
| Speech Markdown | Easy, readable syntax for common features | `(Hello!)[excited:"1.5"]` |
| Raw SSML | Direct control, advanced features, provider-specific tags | `<mstts:express-as style="excited">Hello!</mstts:express-as>` |

Speech Markdown Flow:

```
Speech Markdown → speechmarkdown-js → Platform-specific SSML → Provider
```

Raw SSML Flow:

```
Raw SSML → Minimal processing → Provider
```

When to use Speech Markdown:
- You want readable, maintainable code
- You're using common features (breaks, emphasis, rate, pitch, volume)
- You want platform-specific optimizations automatically applied
- You want the same code to work across multiple TTS engines
When to use Raw SSML with `rawSSML: true`:

- You need advanced provider-specific features (e.g., Azure's mstts:dialog for multi-speaker)
- You're working with SSML generated by other tools
- You need fine-grained control over SSML structure
- You want to bypass validation for experimental features

Combining both approaches:
```typescript
// Use speechmarkdown-js directly for advanced features
import { SpeechMarkdown } from 'speechmarkdown-js';

const markdown = "(This is exciting!)[excited:\"1.5\"] with (multi-speaker)[mstts:dialog]";
const ssml = new SpeechMarkdown().toSSML(markdown, { platform: "microsoft-azure" });

// Pass the result with rawSSML to bypass wrapper validation
await tts.speak(ssml, { rawSSML: true });
```

## Engine-Specific Examples
Each TTS engine has its own specific setup. Here are examples for each supported engine in both ESM and CommonJS formats:
### Azure
#### ESM
```javascript
import { AzureTTSClient } from 'js-tts-wrapper';

const tts = new AzureTTSClient({
  subscriptionKey: 'your-subscription-key',
  region: 'westeurope'
});

await tts.speak('Hello from Azure!');
```

#### CommonJS
```javascript
const { AzureTTSClient } = require('js-tts-wrapper');

const tts = new AzureTTSClient({
  subscriptionKey: 'your-subscription-key',
  region: 'westeurope'
});

// Inside an async function
await tts.speak('Hello from Azure!');
```

### Google Cloud
Note: Google Cloud TTS supports both authentication methods — Service Account (Node SDK) and API key (REST, browser‑safe).
#### ESM
```javascript
import { GoogleTTSClient } from 'js-tts-wrapper';

const tts = new GoogleTTSClient({
  keyFilename: '/path/to/service-account-key.json'
});

await tts.speak('Hello from Google Cloud!');
```

#### CommonJS
```javascript
const { GoogleTTSClient } = require('js-tts-wrapper');

const tts = new GoogleTTSClient({
  keyFilename: '/path/to/service-account-key.json'
});

// Inside an async function
await tts.speak('Hello from Google Cloud!');
```

#### API key mode (Node or Browser)
Google Cloud Text-to-Speech also supports an API key over the REST API. This is browser-safe and requires no service account file. Restrict the key in Google Cloud Console (enable only Text-to-Speech API and restrict by HTTP referrer for browser use).
ESM (Node or Browser):
```javascript
import { GoogleTTSClient } from 'js-tts-wrapper';

const tts = new GoogleTTSClient({
  apiKey: process.env.GOOGLECLOUDTTS_API_KEY || 'your-api-key',
  // optional defaults
  voiceId: 'en-US-Wavenet-D',
  lang: 'en-US'
});

await tts.speak('Hello from Google TTS with API key!');
```

CommonJS (Node):
```javascript
const { GoogleTTSClient } = require('js-tts-wrapper');

const tts = new GoogleTTSClient({
  apiKey: process.env.GOOGLECLOUDTTS_API_KEY || 'your-api-key'
});

(async () => {
  await tts.speak('Hello from Google TTS with API key!');
})();
```

Notes:
- REST v1 does not return word timepoints; the wrapper provides estimated timings for boundary events.
- For true timings, use service account credentials (Node) where the beta client can be used.
- Environment variable supported by examples/tests: `GOOGLECLOUDTTS_API_KEY`.
### Amazon Polly
#### ESM
```javascript
import { PollyTTSClient } from 'js-tts-wrapper';

const tts = new PollyTTSClient({
  region: 'us-east-1',
  accessKeyId: 'your-access-key-id',
  secretAccessKey: 'your-secret-access-key'
});

await tts.speak('Hello from AWS Polly!');
```

#### CommonJS
```javascript
const { PollyTTSClient } = require('js-tts-wrapper');

const tts = new PollyTTSClient({
  region: 'us-east-1',
  accessKeyId: 'your-access-key-id',
  secretAccessKey: 'your-secret-access-key'
});

// Inside an async function
await tts.speak('Hello from AWS Polly!');
```

### ElevenLabs
#### ESM
```javascript
import { ElevenLabsTTSClient } from 'js-tts-wrapper';

const tts = new ElevenLabsTTSClient({
  apiKey: 'your-api-key'
});

await tts.speak('Hello from ElevenLabs!');
```

#### CommonJS
```javascript
const { ElevenLabsTTSClient } = require('js-tts-wrapper');

const tts = new ElevenLabsTTSClient({
  apiKey: 'your-api-key'
});

// Inside an async function
await tts.speak('Hello from ElevenLabs!');
```

### OpenAI
#### ESM
```javascript
import { OpenAITTSClient } from 'js-tts-wrapper';

const tts = new OpenAITTSClient({
  apiKey: 'your-api-key'
});

await tts.speak('Hello from OpenAI!');
```

#### CommonJS
```javascript
const { OpenAITTSClient } = require('js-tts-wrapper');

const tts = new OpenAITTSClient({
  apiKey: 'your-api-key'
});

// Inside an async function
await tts.speak('Hello from OpenAI!');
```

### PlayHT
#### ESM
```javascript
import { PlayHTTTSClient } from 'js-tts-wrapper';

const tts = new PlayHTTTSClient({
  apiKey: 'your-api-key',
  userId: 'your-user-id'
});

await tts.speak('Hello from PlayHT!');
```

#### CommonJS
```javascript
const { PlayHTTTSClient } = require('js-tts-wrapper');

const tts = new PlayHTTTSClient({
  apiKey: 'your-api-key',
  userId: 'your-user-id'
});

// Inside an async function
await tts.speak('Hello from PlayHT!');
```

### IBM Watson
#### ESM
```javascript
import { WatsonTTSClient } from 'js-tts-wrapper';

const tts = new WatsonTTSClient({
  apiKey: 'your-api-key',
  region: 'us-south',
  instanceId: 'your-instance-id'
});

await tts.speak('Hello from IBM Watson!');
```

#### CommonJS
```javascript
const { WatsonTTSClient } = require('js-tts-wrapper');

const tts = new WatsonTTSClient({
  apiKey: 'your-api-key',
  region: 'us-south',
  instanceId: 'your-instance-id'
});

// Inside an async function
await tts.speak('Hello from IBM Watson!');
```

### Wit.ai
#### ESM
```javascript
import { WitAITTSClient } from 'js-tts-wrapper';

const tts = new WitAITTSClient({
  token: 'your-wit-ai-token'
});

await tts.speak('Hello from Wit.ai!');
```

#### CommonJS
```javascript
const { WitAITTSClient } = require('js-tts-wrapper');

const tts = new WitAITTSClient({
  token: 'your-wit-ai-token'
});

// Inside an async function
await tts.speak('Hello from Wit.ai!');
```

### SherpaOnnx
#### ESM
```javascript
import { SherpaOnnxTTSClient } from 'js-tts-wrapper';

const tts = new SherpaOnnxTTSClient();

// The client will automatically download models when needed
await tts.speak('Hello from SherpaOnnx!');
```

#### CommonJS
```javascript
const { SherpaOnnxTTSClient } = require('js-tts-wrapper');

const tts = new SherpaOnnxTTSClient();

// The client will automatically download models when needed
// Inside an async function
await tts.speak('Hello from SherpaOnnx!');
```

> Note: SherpaOnnx is a server-side only engine and requires specific environment setup. See the SherpaOnnx documentation for details on setup and configuration. For browser environments, use SherpaOnnx-WASM instead.
### eSpeak NG (Node.js)
#### ESM
```javascript
import { EspeakNodeTTSClient } from 'js-tts-wrapper';

const tts = new EspeakNodeTTSClient();
await tts.speak('Hello from eSpeak NG!');
```

#### CommonJS
```javascript
const { EspeakNodeTTSClient } = require('js-tts-wrapper');

const tts = new EspeakNodeTTSClient();

// Inside an async function
await tts.speak('Hello from eSpeak NG!');
```

> Note: This engine uses the `text2wav` package and is designed for Node.js environments only. For browser environments, use the eSpeak NG Browser engine instead.

### eSpeak NG (Browser)
#### ESM
```javascript
import { EspeakBrowserTTSClient } from 'js-tts-wrapper';

const tts = new EspeakBrowserTTSClient();
await tts.speak('Hello from eSpeak NG Browser!');
```

#### CommonJS
```javascript
const { EspeakBrowserTTSClient } = require('js-tts-wrapper');

const tts = new EspeakBrowserTTSClient();

// Inside an async function
await tts.speak('Hello from eSpeak NG Browser!');
```

> Note: This engine works in both Node.js (using the `mespeak` package) and browser environments (using meSpeak.js). For browser use, include meSpeak.js in your HTML before using this engine.

#### Backward Compatibility
For backward compatibility, the old class names are still available:
- `EspeakTTSClient` (alias for `EspeakNodeTTSClient`)
- `EspeakWasmTTSClient` (alias for `EspeakBrowserTTSClient`)

However, we recommend using the new, clearer names in new code.
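Both names construct the same engine, so existing code keeps working (a minimal sketch):

```typescript
import { EspeakTTSClient, EspeakNodeTTSClient } from 'js-tts-wrapper';

// Legacy alias and current name behave identically
const legacy = new EspeakTTSClient();
const current = new EspeakNodeTTSClient();
await legacy.speak('Same engine, old name.');
await current.speak('Same engine, new name.');
```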
### SAPI (Windows)
#### ESM
```javascript
import { SAPITTSClient } from 'js-tts-wrapper';

const tts = new SAPITTSClient();
await tts.speak('Hello from Windows SAPI!');
```

#### CommonJS
```javascript
const { SAPITTSClient } = require('js-tts-wrapper');

const tts = new SAPITTSClient();

// Inside an async function
await tts.speak('Hello from Windows SAPI!');
```

> Note: This engine is Windows-only.

## API Reference
### Factory Function
| Function | Description | Return Type |
|----------|-------------|-------------|
| `createTTSClient(engine, credentials)` | Create a TTS client for the specified engine | `AbstractTTSClient` |

### Common Methods (All TTS Clients)
| Method | Description | Return Type |
|--------|-------------|-------------|
| `getVoices()` | Get all available voices | `Promise<Voice[]>` |
| `getVoicesByLanguage(language)` | Get voices for a specific language | `Promise<Voice[]>` |
| `setVoice(voiceId, lang?)` | Set the voice to use | `void` |
| `synthToBytes(text, options?)` | Convert text to audio bytes | `Promise<Uint8Array>` |
| `synthToBytestream(text, options?)` | Stream synthesis with word boundaries | `Promise<{audioStream, wordBoundaries}>` |
| `speak(text, options?)` | Synthesize and play audio | `Promise<void>` |
| `speakStreamed(text, options?)` | Stream synthesis and play | `Promise<void>` |
| `synthToFile(text, filename, format?, options?)` | Save synthesized speech to a file | `Promise<void>` |
| `startPlaybackWithCallbacks(text, callback, options?)` | Play with word boundary callbacks | `Promise<void>` |
| `pause()` | Pause audio playback | `void` |
| `resume()` | Resume audio playback | `void` |
| `stop()` | Stop audio playback | `void` |
| `on(event, callback)` | Register event handler | `void` |
| `connect(event, callback)` | Connect to event | `void` |
| `checkCredentials()` | Check if credentials are valid | `Promise<boolean>` |
| `checkCredentialsDetailed()` | Check credentials with a detailed response | `Promise<CredentialsCheckResult>` |
| `getProperty(propertyName)` | Get a property value | `PropertyType` |
| `setProperty(propertyName, value)` | Set a property value | `void` |

The `checkCredentialsDetailed()` method returns a `CredentialsCheckResult` object with the following properties:
```typescript
{
  success: boolean;     // Whether the credentials are valid
  error?: string;       // Error message if credentials are invalid
  voiceCount?: number;  // Number of voices available if credentials are valid
}
```
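A minimal sketch of consuming that result (assuming a client created as in the Quick Start):

```typescript
const result = await tts.checkCredentialsDetailed();
if (result.success) {
  console.log(`Credentials OK; ${result.voiceCount ?? 0} voices available`);
} else {
  console.error(`Credential check failed: ${result.error ?? 'unknown error'}`);
}
```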
### SSML Builder

The `ssml` property provides a builder for creating SSML:

| Method | Description |
|--------|-------------|
| `prosody(attrs, text)` | Add prosody element |
| `break(time)` | Add break element |
| `emphasis(level, text)` | Add emphasis element |
| `sayAs(interpretAs, text)` | Add say-as element |
| `phoneme(alphabet, ph, text)` | Add phoneme element |
| `sub(alias, text)` | Add substitution element |
| `toString()` | Convert to SSML string |

## Browser Support
The library works in both Node.js and browser environments. In browsers, use the ESM or UMD bundle:
```html
<!-- Illustrative sketch; resolve 'js-tts-wrapper/browser' via a bundler or import map,
     or load the UMD bundle with a classic script tag instead -->
<script type="module">
  import { createTTSClient } from 'js-tts-wrapper/browser';

  const tts = createTTSClient('espeak-wasm', {});
  await tts.speak('Hello from the browser!');
</script>
```

### SherpaOnnx WASM in the Browser
- Auto-load WASM: pass either `wasmBaseUrl` (directory with sherpaonnx.js + .wasm) or `wasmPath` (full glue JS URL). The runtime loads the glue and points `Module.locateFile` to fetch the .wasm.
- Models index: set `mergedModelsUrl` to your hosted merged_models.json (defaults to `./data/merged_models.json` when available).
- Capabilities: each client exposes `client.capabilities` to help UIs filter engines.

```html
<!-- Illustrative sketch; the option names are documented above, the asset paths are yours to host -->
<script type="module">
  import { SherpaOnnxWasmTTSClient } from 'js-tts-wrapper/browser';

  const tts = new SherpaOnnxWasmTTSClient({
    wasmBaseUrl: '/assets/sherpaonnx',            // directory containing the glue JS + .wasm
    mergedModelsUrl: '/assets/merged_models.json' // models index
  });
  await tts.speak('Hello from SherpaOnnx WASM!');
</script>
```

#### Hosted WASM assets (optional)
For convenience, we publish prebuilt SherpaONNX TTS WebAssembly files to a separate assets repository. You can use these as a quick-start base URL, or self-host them for production.
- Default CDN base (via jsDelivr):
- https://cdn.jsdelivr.net/gh/willwade/js-tts-wrapper-assets@main/sherpaonnx/tts/vocoder-models
- Files included (loader-only build: no .data file):
- sherpa-onnx-tts.js (glue; sometimes named sherpa-onnx.js depending on upstream tag)
- sherpa-onnx-wasm-main-tts.wasm (or sherpa-onnx-wasm-main.wasm)
- sherpa-onnx-wasm-main-tts.js (or sherpa-onnx-wasm-main.js)
- Models index (merged_models.json):
- Canonical latest: https://cdn.jsdelivr.net/gh/willwade/js-tts-wrapper-assets@main/sherpaonnx/models/merged_models.json
- Snapshot for this WASM tag: https://cdn.jsdelivr.net/gh/willwade/js-tts-wrapper-assets@main/sherpaonnx/tts//merged_models.json
- Example (using hosted artifacts):
```html
<!-- Illustrative sketch using the hosted artifacts above -->
<script type="module">
  import { SherpaOnnxWasmTTSClient } from 'js-tts-wrapper/browser';

  const base = 'https://cdn.jsdelivr.net/gh/willwade/js-tts-wrapper-assets@main/sherpaonnx';
  const tts = new SherpaOnnxWasmTTSClient({
    wasmBaseUrl: `${base}/tts/vocoder-models`,
    mergedModelsUrl: `${base}/models/merged_models.json`
  });
  await tts.speak('Hello from hosted SherpaOnnx WASM!');
</script>
```
#### Hosting on Hugging Face (avoids jsDelivr 50 MB cap)
You can self-host the loader-only WASM on Hugging Face (recommended for large artifacts):
- Create a Dataset or Model repo, e.g. datasets/willwade/js-tts-wrapper-wasm
- Upload these files into a folder like sherpaonnx/tts/vocoder-models:
- sherpa-onnx-tts.js
- sherpa-onnx-wasm-main-tts.wasm
- (optionally) sherpa-onnx-wasm-main-tts.js
- Optional: also upload merged_models.json to sherpaonnx/models/merged_models.json
- Use the Hugging Face raw URLs with the “resolve” path:
- wasmPath: https://huggingface.co/datasets/your-user/your-repo/resolve/main/sherpaonnx/tts/vocoder-models/sherpa-onnx-tts.js
- mergedModelsUrl: https://huggingface.co/datasets/your-user/your-repo/resolve/main/sherpaonnx/models/merged_models.json
Example:
```html
<!-- Illustrative sketch using Hugging Face "resolve" URLs -->
<script type="module">
  import { SherpaOnnxWasmTTSClient } from 'js-tts-wrapper/browser';

  const tts = new SherpaOnnxWasmTTSClient({
    wasmPath: 'https://huggingface.co/datasets/your-user/your-repo/resolve/main/sherpaonnx/tts/vocoder-models/sherpa-onnx-tts.js',
    mergedModelsUrl: 'https://huggingface.co/datasets/your-user/your-repo/resolve/main/sherpaonnx/models/merged_models.json'
  });
  await tts.speak('Hello from SherpaOnnx WASM on Hugging Face!');
</script>
```

Notes:
- Hugging Face supports large files via Git LFS and serves them over a global CDN with proper CORS.
- The glue JS will fetch the .wasm next to it automatically; ensure correct MIME types are served (HF does this by default).
- For best performance, keep models separate and load them at runtime via their original URLs (or mirror selected ones to HF if needed).
- For production, we recommend self-hosting to ensure stable availability and correct MIME types (`application/wasm` for .wasm, `text/javascript` for .js). If your server uses different filenames, just point `wasmPath` at your glue JS file; the runtime will find the .wasm next to it.
- Our engine also accepts `wasmBaseUrl` if you host with filenames matching your environment; when using the upstream build outputs shown above, `wasmPath` is the safest choice.

## Examples and Demos
- Vue.js Browser Demo (recommended for browsers)
- Path: examples/vue-browser-demo/
- Run locally:
- cd examples/vue-browser-demo
- npm install
- npm run dev
- Notes: The Vite config aliases js-tts-wrapper/browser to the workspace source for local development. For production, you can import from the published package directly.
- Browser Unified Test Page (quick, no build)
- Path: examples/browser-unified-test.html
- Open this file directly in a modern browser. It exercises multiple engines and shows real-time word highlighting.
- Node.js CLI Demo
- Path: examples/node-demo/
- Run: `node demo.mjs [--engine <engine>] [--text "Hello"]`
- Shows boundary callbacks and file/bytes synthesis from Node. Engines requiring credentials read them from environment variables.
SherpaONNX notes
- Browser (WASM): The demos accept wasmPath/mergedModelsUrl. You can use the hosted assets shown in the Browser Support section (jsDelivr or Hugging Face resolve URLs).
- Node (native): If SHERPAONNX_MODEL_PATH is not set, the wrapper now defaults to a Kokoro English model (kokoro-en-en-19) and will auto-download on first use.
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## Optional Dependencies
The library uses a peer dependencies approach to minimize the installation footprint. You can install only the dependencies you need for the engines you plan to use.
```bash
# Install the base package
npm install js-tts-wrapper

# Install dependencies for specific engines
npm install @azure/cognitiveservices-speechservices microsoft-cognitiveservices-speech-sdk # For Azure TTS
npm install @google-cloud/text-to-speech # For Google TTS
npm install @aws-sdk/client-polly # For AWS Polly
npm install openai # For OpenAI TTS
npm install sherpa-onnx-node decompress decompress-bzip2 decompress-tarbz2 decompress-targz tar-stream # For SherpaOnnx TTS
npm install text2wav # For eSpeak NG (Node.js)
npm install mespeak # For eSpeak NG-WASM (Node.js)

# Install dependencies for Node.js audio playback
npm install sound-play speaker pcm-convert # For audio playback in Node.js
```

You can also use the npm scripts provided by the package to install specific engine dependencies:
```bash
# Navigate to your project directory where js-tts-wrapper is installed
cd your-project

# Install specific engine dependencies
npx js-tts-wrapper@latest run install:azure
npx js-tts-wrapper@latest run install:google
npx js-tts-wrapper@latest run install:polly
npx js-tts-wrapper@latest run install:openai
npx js-tts-wrapper@latest run install:sherpaonnx
npx js-tts-wrapper@latest run install:espeak
npx js-tts-wrapper@latest run install:espeak-wasm
npx js-tts-wrapper@latest run install:system

# Install Node.js audio playback dependencies
npx js-tts-wrapper@latest run install:node-audio

# Install all development dependencies
npx js-tts-wrapper@latest run install:all-dev
```

## Node.js Audio Playback
The library supports audio playback in Node.js environments with the optional `sound-play` package. This allows you to use the `speak()` and `speakStreamed()` methods in Node.js applications, just like in browser environments.

To enable Node.js audio playback:
1. Install the required dependencies:

```bash
npm install sound-play pcm-convert
```

Or use the npm script:

```bash
npx js-tts-wrapper@latest run install:node-audio
```

2. Use the TTS client as usual:
```typescript
import { TTSFactory } from 'js-tts-wrapper';

const tts = TTSFactory.createTTSClient('mock');

// Play audio in Node.js
await tts.speak('Hello, world!');
```

If the `sound-play` package is not installed, the library will fall back to providing informative messages and suggest installing the package.

## Testing and Troubleshooting
### Unified Test Runner
The library includes a comprehensive unified test runner that supports multiple testing modes and engines:
```bash
# Basic usage - test all engines
node examples/unified-test-runner.js

# Test a specific engine
node examples/unified-test-runner.js [engine-name]

# Test with different modes
node examples/unified-test-runner.js [engine-name] --mode=[MODE]
```

### Test Modes
| Mode | Description | Usage |
|------|-------------|-------|
| `basic` | Basic synthesis tests (default) | `node examples/unified-test-runner.js azure` |
| `audio` | Audio-only tests with playback | `PLAY_AUDIO=true node examples/unified-test-runner.js azure --mode=audio` |
| `playback` | Playback control tests (pause/resume/stop) | `node examples/unified-test-runner.js azure --mode=playback` |
| `features` | Comprehensive feature tests | `node examples/unified-test-runner.js azure --mode=features` |
| `example` | Full examples with SSML, streaming, word boundaries | `node examples/unified-test-runner.js azure --mode=example` |
| `debug` | Debug mode for troubleshooting | `node examples/unified-test-runner.js sherpaonnx --mode=debug` |
| `stream` | Streaming tests with real-time playback | `PLAY_AUDIO=true node examples/unified-test-runner.js playht --mode=stream` |

### Testing Audio Playback
To test audio playback with any TTS engine, use the `PLAY_AUDIO` environment variable:

```bash
# Test a specific engine with audio playback
PLAY_AUDIO=true node examples/unified-test-runner.js [engine-name] --mode=audio

# Examples:
PLAY_AUDIO=true node examples/unified-test-runner.js witai --mode=audio
PLAY_AUDIO=true node examples/unified-test-runner.js azure --mode=audio
PLAY_AUDIO=true node examples/unified-test-runner.js polly --mode=audio
PLAY_AUDIO=true node examples/unified-test-runner.js system --mode=audio
```

### SherpaOnnx Testing
SherpaOnnx requires special environment setup. Use the helper script:
```bash
# Test SherpaOnnx with audio playback
PLAY_AUDIO=true node scripts/run-with-sherpaonnx.cjs examples/unified-test-runner.js sherpaonnx --mode=audio

# Debug SherpaOnnx issues
node scripts/run-with-sherpaonnx.cjs examples/unified-test-runner.js sherpaonnx --mode=debug

# Use npm scripts (recommended)
npm run example:sherpaonnx:mac
PLAY_AUDIO=true npm run example:sherpaonnx:mac
```

### Engine-Specific npm Scripts
The package provides convenient npm scripts for testing specific engines:
```bash
# Test specific engines using npm scripts
npm run example:azure
npm run example:google
npm run example:polly
npm run example:openai
npm run example:elevenlabs
npm run example:playht
npm run example:system
npm run example:sherpaonnx:mac # For SherpaOnnx with environment setup

# With audio playback
PLAY_AUDIO=true npm run example:azure
PLAY_AUDIO=true npm run example:system
PLAY_AUDIO=true npm run example:sherpaonnx:mac
```

### Getting Help
For detailed help and available options:
```bash
# Show help and available engines
node examples/unified-test-runner.js --help

# Show available test modes
node examples/unified-test-runner.js --mode=help
```

### Audio Format Conversion
The library includes automatic format conversion for engines that don't natively support the requested format:
```javascript
// Request MP3; you get MP3 if supported, or WAV with a warning if not
const audioBytes = await client.synthToBytes("Hello world", { format: "mp3" });
await client.synthToFile("Hello world", "output", "mp3");
await client.speak("Hello world", { format: "mp3" });
```

Supported Formats: WAV, MP3, OGG
Engine Behavior:
- Native Support: Azure, Polly, PlayHT support multiple formats natively
- Automatic Conversion: SAPI, SherpaOnnx convert from WAV when possible
- Graceful Fallback: Returns native format with helpful warnings when conversion isn't available
Environment Support:
- Node.js: Full format conversion support (install `ffmpeg` for advanced conversions)
- Browser: Engines return their native format (no conversion)

### Common Issues
1. No Audio in Node.js: Install audio dependencies with `npm install sound-play speaker pcm-convert`
2. SherpaOnnx Not Working: Use the helper script and ensure environment variables are set correctly
3. WitAI Audio Issues: The library automatically handles WitAI's raw PCM format conversion
4. Sample Rate Issues: Different engines use different sample rates (WitAI: 24kHz, Polly: 16kHz) - this is handled automatically
5. Format Conversion: Install `ffmpeg` for advanced audio format conversion in Node.js

For detailed troubleshooting, see the docs/ directory, especially:
- SherpaOnnx Documentation
- SherpaOnnx Troubleshooting
## License

This project is licensed under the MIT License - see the LICENSE file for details.