Echogarden

Echogarden is an easy-to-use speech toolset that includes a variety of speech processing tools.

* Easy to install, run, and update
* Written in TypeScript, for the Node.js runtime
* Can be used either as a command-line utility, or imported as a standard npm package
* Runs on Windows (x64, ARM64), macOS (x64, ARM64) and Linux (x64, ARM64)
* Doesn't require Python, Docker, or other system-level dependencies
* Doesn't rely on essential platform-specific binaries. Engines are either written in pure TypeScript, ported via WebAssembly, or imported using the ONNX runtime
* Fully open-source (GPL v3)

Features

* Text-to-speech using high-quality Kokoro and VITS offline models for many languages and dialects, and 16 other offline and online engines, including cloud services by Google, Microsoft, Amazon, OpenAI and ElevenLabs
* Speech-to-text using a custom TypeScript/ONNX port of the OpenAI Whisper speech recognition architecture, whisper.cpp, and several other engines, including cloud services by Google, Microsoft, Amazon and OpenAI
* Speech-to-transcript alignment using several variants of dynamic time warping (DTW, DTW-RA), including support for multi-pass (hierarchical) processing, or via guided decoding using Whisper recognition models. Supports 100+ languages
* Speech-to-text translation translates speech in any of the 98 languages supported by Whisper into English, with near word-level timing for the translated transcript
* Speech-to-translated-transcript alignment synchronizes spoken audio in one language to a provided English-translated transcript, using the Whisper engine
* Speech-to-transcript-and-translation alignment synchronizes spoken audio in one language to a translation in a variety of other languages, given both a transcript and its translation
* Text-to-text translation translates text between various languages. Supports the cloud-based Google Translate engine
* Language detection identifies the language of a given audio or text. Includes Whisper or Silero engines for spoken audio, and TinyLD or FastText for text
* Voice activity detection attempts to identify segments of audio where voice is active or inactive. Includes WebRTC VAD, Silero VAD, RNNoise-based VAD and a built-in Adaptive Gate algorithm
* Speech denoising attenuates background noise from spoken audio. Includes the RNNoise and NSNet2 engines
* Source separation isolates voice from any music or background ambience. Includes the MDX-NET deep learning architecture
* Word-level timestamps for all recognition, synthesis, alignment and translation outputs
* Advanced subtitle generation, accounting for sentence and phrase boundaries
* For the Kokoro, VITS and eSpeak-NG synthesis engines, includes enhancements to improve TTS pronunciation accuracy: adds text normalization (e.g. idiomatic date and currency pronunciation), English heteronym disambiguation (based on a simple rule-based model), various pronunciation corrections, and accepts user-provided pronunciation lexicons
* Internal package system that auto-downloads and installs voices, models and other resources, as needed
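
The alignment feature listed above is based on dynamic time warping (DTW). As a rough illustration of the basic idea only, the following is a minimal textbook DTW over two plain number sequences. It is not Echogarden's implementation: the function name `dtw` and the `WarpStep` type are made up for this sketch, and a real aligner would compare acoustic feature vectors rather than single numbers.

```typescript
// Textbook dynamic time warping (DTW) over two numeric sequences.
// Illustrative sketch only, not Echogarden's implementation.

type WarpStep = { a: number; b: number } // indices into seqA and seqB

function dtw(seqA: number[], seqB: number[]): { cost: number; path: WarpStep[] } {
  const n = seqA.length
  const m = seqB.length

  if (n === 0 || m === 0) {
    throw new Error('Both sequences must be non-empty')
  }

  // acc[i][j] = minimal accumulated cost of aligning seqA[0..i] with seqB[0..j]
  const acc: number[][] = []

  for (let i = 0; i < n; i++) {
    acc.push([])

    for (let j = 0; j < m; j++) {
      const local = Math.abs(seqA[i] - seqB[j])

      if (i === 0 && j === 0) {
        acc[i][j] = local
        continue
      }

      const up = i > 0 ? acc[i - 1][j] : Infinity
      const left = j > 0 ? acc[i][j - 1] : Infinity
      const diag = i > 0 && j > 0 ? acc[i - 1][j - 1] : Infinity

      acc[i][j] = local + Math.min(up, left, diag)
    }
  }

  // Backtrack from the last cell to recover the warping path
  const path: WarpStep[] = []
  let i = n - 1
  let j = m - 1

  while (i > 0 || j > 0) {
    path.push({ a: i, b: j })

    const up = i > 0 ? acc[i - 1][j] : Infinity
    const left = j > 0 ? acc[i][j - 1] : Infinity
    const diag = i > 0 && j > 0 ? acc[i - 1][j - 1] : Infinity
    const best = Math.min(up, left, diag)

    if (best === diag) {
      i--
      j--
    } else if (best === up) {
      i--
    } else {
      j--
    }
  }

  path.push({ a: 0, b: 0 })
  path.reverse()

  return { cost: acc[n - 1][m - 1], path }
}

// The second sequence repeats the middle value, so it is "stretched" in time
const dtwExample = dtw([1, 2, 3], [1, 2, 2, 3])
```

Here `dtwExample.cost` is `0`, and the path maps index 1 of the first sequence to both indices 1 and 2 of the second, which is the kind of stretching that lets a transcript be matched against speech of varying pace.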

Installation

Ensure you have Node.js v18 or later installed (v22 or later is recommended).

Then:

```bash
npm install -g echogarden@latest
```

Update

Simple, but may not always update to the very latest major version:

```bash
npm update -g echogarden
```

You can also use npm-check-updates to check for a newer version:

```bash
npm install -g npm-check-updates
ncu -g echogarden
```

Then, if an updated version is available, use the command line that `ncu` provides to make the update.

Using the command-line interface

A small sample of command lines:

```bash
echogarden speak "Hello World!"
echogarden speak-file story.txt --engine=kokoro
echogarden transcribe speech.mp3
echogarden translate-speech speech.webm subtitles.srt
echogarden align speech.opus transcript.txt
echogarden isolate speech.wav
```

See the command-line interface guide for more details on the operations supported, and the configuration options reference for a comprehensive list of all options supported.

Using the Node.js API

If you are a developer, you can also directly import the package as a dependency in your code. The API operations and options closely mirror the CLI.
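
To make the correspondence between API-style options and CLI flags concrete, here is a small sketch. It does not call Echogarden at all: `toCliFlags` is a hypothetical helper written for this example, and the dotted form for nested keys is an assumption that should be checked against the options reference. Only the `--engine=kokoro` flag and the `speak-file story.txt` command come from the samples above.

```typescript
// Hypothetical helper (not part of Echogarden): flattens an options object
// into `--key=value` flags, the flag style used in the CLI samples.
// Assumption: nested option groups are written as dotted keys.

type OptionValue = string | number | boolean | OptionTree

interface OptionTree {
  [key: string]: OptionValue
}

function toCliFlags(options: OptionTree, prefix = ''): string[] {
  const flags: string[] = []

  for (const key of Object.keys(options)) {
    const value = options[key]
    const name = prefix ? `${prefix}.${key}` : key

    if (typeof value === 'object') {
      // Nested group: recurse, carrying the dotted prefix along
      flags.push(...toCliFlags(value, name))
    } else {
      flags.push(`--${name}=${value}`)
    }
  }

  return flags
}

// The same settings, written once as an object and rendered as CLI arguments
const speakOptions: OptionTree = { engine: 'kokoro' }
const speakArgs = ['speak-file', 'story.txt', ...toCliFlags(speakOptions)]
```

With these inputs, `speakArgs` is `['speak-file', 'story.txt', '--engine=kokoro']`, and a nested object such as `{ outer: { inner: 1 } }` becomes `['--outer.inner=1']`. For the actual function names and option types, see the Node.js API reference.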

Documentation and guides

* Quick guide to the command-line interface
* Options reference
* Full list of all available engines
* Node.js API reference
* Enabling the CUDA ONNX execution provider
* Technical overview and Q&A
* How to help
* Setting up a development environment
* Developer's task list
* Release notes (for releases up to `1.0.0`)

Credits

This project consolidates and builds upon the efforts of many different individuals and companies, and also contributes a number of original works.

Developed by Rotem Dan (IPA: /ˈʁɒːtem ˈdän/).

License

GNU General Public License v3

Licenses for components, models and other dependencies are detailed on this page.
