Echogarden is an integrated speech toolset that provides a variety of synthesis, recognition, alignment, and other processing tools, designed to be directly accessible to end-users:
- Easy to install, run, and update
- Runs on Windows (x64), macOS (x64, ARM64) and Linux (x64)
- Written in TypeScript, for the Node.js runtime
- Doesn't require Python, Docker, or similar system-level dependencies
- Fast, high-quality offline multilingual text-to-speech voices based on the VITS neural architecture
- Accurate offline speech recognition using OpenAI Whisper models
- Supports a large variety of offline and cloud engines, including services by Google, Microsoft, Amazon, ElevenLabs and others
- Word-level timestamps for all synthesis and recognition outputs
- Speech-to-transcript alignment using dynamic time warping (DTW), and dynamic time warping with recognition assist (DTW-RA) methods, including support for multi-pass (hierarchical) processing. Supports 100+ languages.
- Advanced subtitle generation, accounting for sentence and phrase boundaries
- Can translate speech in any of 98 languages directly to English, and produce near word-level synchronized subtitles for the translated transcript
- Attempts to improve TTS pronunciation accuracy on some engines: adds text normalization (e.g. idiomatic date and currency pronunciation), heteronym disambiguation (using a rule-based model) and user-customizable pronunciation lexicons
- Internal package system that auto-downloads and installs voices, models and other resources, as needed
- Other features include: language detection (for both audio and text), voice activity detection, and speech denoising
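To give a rough idea of what the dynamic time warping (DTW) mentioned in the alignment feature does, here is a minimal textbook sketch over two short numeric sequences. It is not Echogarden's implementation: the `dtw` function is a hypothetical illustration, and a real aligner would compare audio feature frames rather than plain numbers.

```typescript
// Minimal dynamic time warping (DTW) over two numeric sequences.
// Illustrative only: a hypothetical helper, not Echogarden code.
function dtw(a: number[], b: number[]): { cost: number; path: [number, number][] } {
  const n = a.length;
  const m = b.length;
  if (n === 0 || m === 0) {
    throw new Error("Both sequences must be non-empty");
  }

  // acc[i][j] = minimal accumulated cost of aligning a[0..i] with b[0..j]
  const acc: number[][] = Array.from({ length: n }, () =>
    new Array<number>(m).fill(Infinity)
  );

  // Cost of the best predecessor cell (up, left or diagonal)
  const at = (i: number, j: number): number =>
    i >= 0 && j >= 0 ? acc[i][j] : Infinity;

  for (let i = 0; i < n; i++) {
    for (let j = 0; j < m; j++) {
      const local = Math.abs(a[i] - b[j]);
      acc[i][j] =
        i === 0 && j === 0
          ? local
          : local + Math.min(at(i - 1, j), at(i, j - 1), at(i - 1, j - 1));
    }
  }

  // Backtrack from the last cell to recover the warping path
  const path: [number, number][] = [];
  let i = n - 1;
  let j = m - 1;
  while (true) {
    path.push([i, j]);
    if (i === 0 && j === 0) break;
    const up = at(i - 1, j);
    const left = at(i, j - 1);
    const diag = at(i - 1, j - 1);
    if (diag <= up && diag <= left) {
      i--;
      j--;
    } else if (up <= left) {
      i--;
    } else {
      j--;
    }
  }
  path.reverse();

  return { cost: acc[n - 1][m - 1], path };
}
```

Each `[i, j]` pair in the returned path says that element `i` of the first sequence is matched with element `j` of the second. In an alignment setting, a path like this is what lets timestamps from one sequence be mapped onto the other.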
Ensure you have Node.js v16.0.0 or later installed.
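The version requirement can also be checked programmatically, for example from a setup script. The sketch below is a hypothetical helper (the name `isAtLeastVersion` is an assumption, not part of Echogarden) that compares version numbers numerically, so that `9.x` is not mistaken for being newer than `16.x`.

```typescript
// Hypothetical helper, not part of Echogarden: returns true when `version`
// (e.g. "18.17.1" or "v16.0.0") is at least `minimum`.
function isAtLeastVersion(version: string, minimum: string = "16.0.0"): boolean {
  // Strip an optional leading "v" and turn "major.minor.patch" into numbers
  const parse = (text: string): number[] =>
    text
      .replace(/^v/, "")
      .split(".")
      .map((part) => parseInt(part, 10) || 0);

  const actual = parse(version);
  const required = parse(minimum);

  // Compare major, then minor, then patch
  for (let i = 0; i < 3; i++) {
    const a = actual[i] ?? 0;
    const b = required[i] ?? 0;
    if (a !== b) {
      return a > b;
    }
  }
  return true; // versions are equal
}
```

Under Node.js, the running version is available as `process.versions.node`, so a call such as `isAtLeastVersion(process.versions.node)` would cover the requirement stated above.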
npm install echogarden -g
Additional required tools:
- ffmpeg
- sox
Both tools are auto-downloaded as internal packages on Windows and Linux.
On macOS, they are not currently auto-downloaded due to security issues with unsigned binaries. It is recommended to install them via a package manager like Homebrew (brew install ffmpeg sox) and ensure they are available on the system path.
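One way to confirm that both tools are reachable on the system path is to look for them in each directory listed in the `PATH` variable. The sketch below is only an illustration under stated assumptions: `candidateToolPaths` and `missingTools` are hypothetical names, the defaults assume macOS or Linux path conventions, and the file-existence check is passed in as a function so the logic stays independent of any particular runtime API.

```typescript
// Hypothetical helper, not part of Echogarden: lists the file paths at which
// a tool could be found, given the contents of a PATH-style variable.
function candidateToolPaths(
  tool: string,
  pathVariable: string,
  delimiter: string = ":", // ":" on macOS and Linux, ";" on Windows
  separator: string = "/"
): string[] {
  return pathVariable
    .split(delimiter)
    .filter((dir) => dir.length > 0)
    .map((dir) => (dir.endsWith(separator) ? dir + tool : dir + separator + tool));
}

// Returns the names of the tools for which no candidate path exists.
// `exists` is injected by the caller (for example, fs.existsSync in Node.js).
function missingTools(
  tools: string[],
  pathVariable: string,
  exists: (filePath: string) => boolean
): string[] {
  return tools.filter(
    (tool) => !candidateToolPaths(tool, pathVariable).some((p) => exists(p))
  );
}
```

In a Node.js script, this could be called as `missingTools(["ffmpeg", "sox"], process.env.PATH ?? "", fs.existsSync)`. An empty result would suggest both tools are on the path. Note that the check only tests for file existence, not whether the file is executable.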
Updating to latest version
npm update echogarden -g
Interfacing with the system
Currently, tools are accessible mainly through a command-line interface, which enables powerful customization and is especially useful for long-running bulk operations.
Development of more graphical and interactive tooling is currently ongoing.
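For bulk operations, command lines can be generated from a list of input files and then run one after another. The sketch below only builds the argument lists and does not execute anything. It assumes a command of the form `echogarden transcribe <input> <output files...>`; check the command-line guide for the exact syntax and options. The helper name `buildTranscribeCommands` and the default output extensions are hypothetical.

```typescript
// Hypothetical helper, not part of Echogarden: builds one argument list per
// input file, following the ASSUMED form
//   echogarden transcribe <input> <output files...>
function buildTranscribeCommands(
  audioFiles: string[],
  outputExtensions: string[] = ["txt", "srt"]
): string[][] {
  return audioFiles.map((file) => {
    // Drop the final extension, e.g. "talks/ep1.mp3" -> "talks/ep1"
    const base = file.replace(/\.[^./\\]+$/, "");
    const outputs = outputExtensions.map((ext) => `${base}.${ext}`);
    return ["echogarden", "transcribe", file, ...outputs];
  });
}
```

Each resulting list could then be handed to a process launcher such as Node's `child_process.spawn`, with the first element as the command and the rest as its arguments.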
Guides and resource pages
- Using the command-line interface
- Options reference
- Full list of supported engines
- Developer API reference
- Starting and interfacing with a WebSocket server
- Technical overview and Q&A
- Developer's task list
- How to help
This project consolidates and builds upon the efforts of many different individuals and companies, and also contributes a number of original works.
Developed by Rotem Dan (IPA: /ˈʁɒːtem ˈdän/).
GNU General Public License v3
Licenses for components, models and other dependencies are detailed on this page.