A JavaScript/TypeScript library that provides a unified API for working with multiple cloud-based Text-to-Speech (TTS) services. Inspired by py3-TTS-Wrapper, it simplifies the use of services like Azure, Google Cloud, IBM Watson, and ElevenLabs.
- Features
- Supported TTS Engines
- Installation
- Quick Start
- Core Functionality
- SSML Support
- Speech Markdown Support
- Engine-Specific Examples
- Browser Support
- API Reference
- Contributing
- License
- Unified API: Consistent interface across multiple TTS providers
- SSML Support: Use Speech Synthesis Markup Language to enhance speech synthesis
- Speech Markdown: Optional support for easier speech markup
- Voice Selection: Easily browse and select from available voices
- Streaming Synthesis: Stream audio as it's being synthesized
- Playback Control: Pause, resume, and stop audio playback
- Word Boundaries: Get callbacks for word timing (where supported)
- File Output: Save synthesized speech to audio files
- Browser Support: Works in both Node.js (server) and browser environments (see engine support table below)
| Engine | Provider | Dependencies |
|---|---|---|
| Azure | Microsoft Azure Cognitive Services | `@azure/cognitiveservices-speechservices`, `microsoft-cognitiveservices-speech-sdk` |
| Google Cloud | Google Cloud Text-to-Speech | `@google-cloud/text-to-speech` |
| ElevenLabs | ElevenLabs | `node-fetch@2` (Node.js only) |
| IBM Watson | IBM Watson | None (uses fetch API) |
| OpenAI | OpenAI | `openai` |
| PlayHT | PlayHT | `node-fetch@2` (Node.js only) |
| AWS Polly | Amazon Web Services | `@aws-sdk/client-polly` |
| SherpaOnnx | k2-fsa/sherpa-onnx | `sherpa-onnx-node`, `decompress`, `decompress-bzip2`, `decompress-tarbz2`, `decompress-targz`, `tar-stream` |
| eSpeak NG | eSpeak NG | None (WASM included) |
| WitAI | Wit.ai | None (uses fetch API) |
The library uses a modular approach where TTS engine-specific dependencies are optional. You can install dependencies in two ways:
Install the package with specific engine dependencies using the bracket notation (similar to pip extras):
# Install with specific engine dependencies
npm install js-tts-wrapper[azure] # Install with Azure dependencies
npm install js-tts-wrapper[google] # Install with Google Cloud dependencies
npm install js-tts-wrapper[polly] # Install with AWS Polly dependencies
npm install js-tts-wrapper[elevenlabs] # Install with ElevenLabs dependencies
npm install js-tts-wrapper[openai] # Install with OpenAI dependencies
npm install js-tts-wrapper[playht] # Install with PlayHT dependencies
npm install js-tts-wrapper[watson] # Install with IBM Watson dependencies
npm install js-tts-wrapper[witai] # Install with Wit.ai dependencies
npm install js-tts-wrapper[sherpaonnx] # Install with SherpaOnnx dependencies
# Install with multiple engine dependencies
npm install js-tts-wrapper[azure,google,polly]
# Install with all cloud-based engines
npm install js-tts-wrapper[cloud]
# Install with all engines
npm install js-tts-wrapper[all]
Alternatively, you can install the package and its dependencies manually:
# Install the base package
npm install js-tts-wrapper
# Install dependencies for specific engines
npm install @azure/cognitiveservices-speechservices microsoft-cognitiveservices-speech-sdk # For Azure
npm install @google-cloud/text-to-speech # For Google Cloud
npm install @aws-sdk/client-polly # For AWS Polly
npm install node-fetch@2 # For ElevenLabs and PlayHT
npm install openai # For OpenAI
import { AzureTTSClient } from 'js-tts-wrapper';
// Initialize the client with your credentials
const tts = new AzureTTSClient({
subscriptionKey: 'your-subscription-key',
region: 'westeurope'
});
// List available voices
const voices = await tts.getVoices();
console.log(voices);
// Set a voice
tts.setVoice('en-US-AriaNeural');
// Speak some text
await tts.speak('Hello, world!');
// Use SSML for more control
const ssml = '<speak>Hello <break time="500ms"/> world!</speak>';
await tts.speak(ssml);
const { AzureTTSClient } = require('js-tts-wrapper');
// Initialize the client with your credentials
const tts = new AzureTTSClient({
subscriptionKey: 'your-subscription-key',
region: 'westeurope'
});
// Use async/await within an async function
async function runExample() {
// List available voices
const voices = await tts.getVoices();
console.log(voices);
// Set a voice
tts.setVoice('en-US-AriaNeural');
// Speak some text
await tts.speak('Hello, world!');
// Use SSML for more control
const ssml = '<speak>Hello <break time="500ms"/> world!</speak>';
await tts.speak(ssml);
}
runExample().catch(console.error);
The library provides a factory function to create TTS clients dynamically based on the engine name:
import { createTTSClient } from 'js-tts-wrapper';
// Create a TTS client using the factory function
const tts = createTTSClient('azure', {
subscriptionKey: 'your-subscription-key',
region: 'westeurope'
});
// Use the client as normal
await tts.speak('Hello from the factory pattern!');
const { createTTSClient } = require('js-tts-wrapper');
// Create a TTS client using the factory function
const tts = createTTSClient('azure', {
subscriptionKey: 'your-subscription-key',
region: 'westeurope'
});
async function runExample() {
// Use the client as normal
await tts.speak('Hello from the factory pattern!');
}
runExample().catch(console.error);
The factory supports all engines: `'azure'`, `'google'`, `'polly'`, `'elevenlabs'`, `'openai'`, `'playht'`, `'watson'`, `'witai'`, etc.
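The factory pattern above can be sketched as a lookup from engine name to client constructor. This is an illustrative simplification, not the library's actual implementation; `MockAzureClient` and the registry contents are hypothetical:

```typescript
// Minimal sketch of a createTTSClient-style factory (hypothetical types).
interface TTSClient {
  speak(text: string): Promise<void>;
}

class MockAzureClient implements TTSClient {
  credentials: Record<string, string>;
  constructor(credentials: Record<string, string>) {
    this.credentials = credentials;
  }
  async speak(text: string): Promise<void> {
    // A real client would call the provider's API here.
    console.log(`[azure] would synthesize: ${text}`);
  }
}

type ClientCtor = new (credentials: Record<string, string>) => TTSClient;

// Engine names map to client classes; unknown names fail fast.
const registry: Record<string, ClientCtor> = {
  azure: MockAzureClient,
  // 'google', 'polly', ... would map to their client classes here
};

function createClient(engine: string, credentials: Record<string, string>): TTSClient {
  const Ctor = registry[engine];
  if (!Ctor) throw new Error(`Unknown TTS engine: ${engine}`);
  return new Ctor(credentials);
}
```

The benefit of this shape is that callers depend only on the shared client interface, so switching providers is a one-string change.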
All TTS engines in js-tts-wrapper implement a common set of methods and features through the `AbstractTTSClient` class. This ensures consistent behavior across different providers.
// Get all available voices
const voices = await tts.getVoices();
// Get voices for a specific language
const englishVoices = await tts.getVoicesByLanguage('en-US');
// Set the voice to use
tts.setVoice('en-US-AriaNeural');
The library includes a robust Language Normalization system that standardizes language codes across different TTS engines. This allows you to:
- Use BCP-47 codes (e.g., 'en-US') or ISO 639-3 codes (e.g., 'eng') interchangeably
- Get consistent language information regardless of the TTS engine
- Filter voices by language using any standard format
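The idea behind this normalization can be sketched as a small lookup plus a fallback rule. The mapping table below is illustrative only; the library's actual tables cover many more languages:

```typescript
// Illustrative ISO 639-3 -> BCP-47 primary-subtag mapping (hypothetical subset).
const iso6393ToBcp47: Record<string, string> = {
  eng: 'en',
  fra: 'fr',
  deu: 'de',
};

// Accepts a BCP-47 tag ('en-US') or an ISO 639-3 code ('eng') and returns
// the BCP-47 primary language subtag, so both forms compare equal.
function normalizeLanguage(code: string): string {
  const lower = code.toLowerCase();
  return iso6393ToBcp47[lower] ?? lower.split('-')[0];
}
```

With this rule, filtering voices by `'eng'` and by `'en-US'` selects the same language family.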
// Convert text to audio bytes (Uint8Array)
const audioBytes = await tts.synthToBytes('Hello, world!');
// Stream synthesis with word boundary information
const { audioStream, wordBoundaries } = await tts.synthToBytestream('Hello, world!');
// Synthesize and play audio
await tts.speak('Hello, world!');
// Playback control
tts.pause(); // Pause playback
tts.resume(); // Resume playback
tts.stop(); // Stop playback
// Stream synthesis and play with word boundary callbacks
await tts.startPlaybackWithCallbacks('Hello world', (word, start, end) => {
console.log(`Word: ${word}, Start: ${start}s, End: ${end}s`);
});
Note: Audio playback with the `speak()` and `speakStreamed()` methods is supported in browser environments and, with the optional `sound-play` package installed, in Node.js. To enable Node.js audio playback, install the package with `npm install js-tts-wrapper[node-audio]`.
// Save synthesized speech to a file
await tts.synthToFile('Hello, world!', 'output', 'mp3');
// Register event handlers
tts.on('start', () => console.log('Speech started'));
tts.on('end', () => console.log('Speech ended'));
tts.on('boundary', (word, start, end) => {
console.log(`Word: ${word}, Start: ${start}s, End: ${end}s`);
});
// Alternative event connection
tts.connect('onStart', () => console.log('Speech started'));
tts.connect('onEnd', () => console.log('Speech ended'));
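The `on()`/`connect()` pairing above can be sketched with a minimal emitter, where `connect('onStart', ...)` is treated as an alias for `on('start', ...)`. This is a sketch of the pattern, not the library's internals:

```typescript
type Handler = (...args: unknown[]) => void;

class TinyEmitter {
  handlers: Map<string, Handler[]> = new Map();

  // on('start', cb) registers a handler for a named event.
  on(event: string, cb: Handler): void {
    const list = this.handlers.get(event) ?? [];
    list.push(cb);
    this.handlers.set(event, list);
  }

  // connect('onStart', cb) strips the 'on' prefix and lowercases the rest,
  // so both registration styles feed the same event name.
  connect(event: string, cb: Handler): void {
    const bare = event.replace(/^on/, '');
    this.on(bare.charAt(0).toLowerCase() + bare.slice(1), cb);
  }

  emit(event: string, ...args: unknown[]): void {
    for (const cb of this.handlers.get(event) ?? []) cb(...args);
  }
}
```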
All engines support SSML (Speech Synthesis Markup Language) for advanced control over speech synthesis:
// Use SSML directly
const ssml = `
<speak>
<prosody rate="slow" pitch="low">
This text will be spoken slowly with a low pitch.
</prosody>
<break time="500ms"/>
<emphasis level="strong">This text is emphasized.</emphasis>
</speak>
`;
await tts.speak(ssml);
// Or use the SSML builder
const ssmlText = tts.ssml
.prosody({ rate: 'slow', pitch: 'low' }, 'This text will be spoken slowly with a low pitch.')
.break(500)
.emphasis('strong', 'This text is emphasized.')
.toString();
await tts.speak(ssmlText);
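The chainable builder style can be sketched as below: each method appends an SSML element and returns `this`, and `toString()` wraps everything in `<speak>`. This is a simplified illustration; the real builder's API may differ in details:

```typescript
// Minimal chainable SSML builder sketch.
class SSMLBuilder {
  parts: string[] = [];

  prosody(attrs: { rate?: string; pitch?: string }, text: string): this {
    const attrStr = Object.entries(attrs)
      .map(([k, v]) => `${k}="${v}"`)
      .join(' ');
    this.parts.push(`<prosody ${attrStr}>${text}</prosody>`);
    return this;
  }

  break(timeMs: number): this {
    this.parts.push(`<break time="${timeMs}ms"/>`);
    return this;
  }

  emphasis(level: string, text: string): this {
    this.parts.push(`<emphasis level="${level}">${text}</emphasis>`);
    return this;
  }

  toString(): string {
    return `<speak>${this.parts.join('')}</speak>`;
  }
}
```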
The library supports Speech Markdown for easier speech formatting:
// Use Speech Markdown
const markdown = "Hello (pause:500ms) world! This is (emphasis:strong) important.";
await tts.speak(markdown, { useSpeechMarkdown: true });
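Conceptually, Speech Markdown shortcuts expand into SSML before synthesis. The converter below is a hypothetical sketch covering only the two shortcuts shown above; a real implementation would use a full Speech Markdown parser:

```typescript
// Illustrative Speech Markdown -> SSML conversion (two shortcuts only).
function speechMarkdownToSSML(markdown: string): string {
  const body = markdown
    // (pause:500ms) -> <break time="500ms"/>
    .replace(/\(pause:(\d+m?s)\)/g, '<break time="$1"/>')
    // (emphasis:strong) word -> <emphasis level="strong">word</emphasis>
    .replace(/\(emphasis:(\w+)\)\s+(\S+)/g, '<emphasis level="$1">$2</emphasis>');
  return `<speak>${body}</speak>`;
}
```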
Each TTS engine has its own specific setup. Here are examples for each supported engine in both ESM and CommonJS formats:
import { AzureTTSClient } from 'js-tts-wrapper';
const tts = new AzureTTSClient({
subscriptionKey: 'your-subscription-key',
region: 'westeurope'
});
await tts.speak('Hello from Azure!');
const { AzureTTSClient } = require('js-tts-wrapper');
const tts = new AzureTTSClient({
subscriptionKey: 'your-subscription-key',
region: 'westeurope'
});
// Inside an async function
await tts.speak('Hello from Azure!');
import { GoogleTTSClient } from 'js-tts-wrapper';
const tts = new GoogleTTSClient({
keyFilename: '/path/to/service-account-key.json'
});
await tts.speak('Hello from Google Cloud!');
const { GoogleTTSClient } = require('js-tts-wrapper');
const tts = new GoogleTTSClient({
keyFilename: '/path/to/service-account-key.json'
});
// Inside an async function
await tts.speak('Hello from Google Cloud!');
import { PollyTTSClient } from 'js-tts-wrapper';
const tts = new PollyTTSClient({
region: 'us-east-1',
accessKeyId: 'your-access-key-id',
secretAccessKey: 'your-secret-access-key'
});
await tts.speak('Hello from AWS Polly!');
const { PollyTTSClient } = require('js-tts-wrapper');
const tts = new PollyTTSClient({
region: 'us-east-1',
accessKeyId: 'your-access-key-id',
secretAccessKey: 'your-secret-access-key'
});
// Inside an async function
await tts.speak('Hello from AWS Polly!');
import { ElevenLabsTTSClient } from 'js-tts-wrapper';
const tts = new ElevenLabsTTSClient({
apiKey: 'your-api-key'
});
await tts.speak('Hello from ElevenLabs!');
const { ElevenLabsTTSClient } = require('js-tts-wrapper');
const tts = new ElevenLabsTTSClient({
apiKey: 'your-api-key'
});
// Inside an async function
await tts.speak('Hello from ElevenLabs!');
import { OpenAITTSClient } from 'js-tts-wrapper';
const tts = new OpenAITTSClient({
apiKey: 'your-api-key'
});
await tts.speak('Hello from OpenAI!');
const { OpenAITTSClient } = require('js-tts-wrapper');
const tts = new OpenAITTSClient({
apiKey: 'your-api-key'
});
// Inside an async function
await tts.speak('Hello from OpenAI!');
import { PlayHTTTSClient } from 'js-tts-wrapper';
const tts = new PlayHTTTSClient({
apiKey: 'your-api-key',
userId: 'your-user-id'
});
await tts.speak('Hello from PlayHT!');
const { PlayHTTTSClient } = require('js-tts-wrapper');
const tts = new PlayHTTTSClient({
apiKey: 'your-api-key',
userId: 'your-user-id'
});
// Inside an async function
await tts.speak('Hello from PlayHT!');
import { WatsonTTSClient } from 'js-tts-wrapper';
const tts = new WatsonTTSClient({
apiKey: 'your-api-key',
region: 'us-south',
instanceId: 'your-instance-id'
});
await tts.speak('Hello from IBM Watson!');
const { WatsonTTSClient } = require('js-tts-wrapper');
const tts = new WatsonTTSClient({
apiKey: 'your-api-key',
region: 'us-south',
instanceId: 'your-instance-id'
});
// Inside an async function
await tts.speak('Hello from IBM Watson!');
import { WitAITTSClient } from 'js-tts-wrapper';
const tts = new WitAITTSClient({
token: 'your-wit-ai-token'
});
await tts.speak('Hello from Wit.ai!');
const { WitAITTSClient } = require('js-tts-wrapper');
const tts = new WitAITTSClient({
token: 'your-wit-ai-token'
});
// Inside an async function
await tts.speak('Hello from Wit.ai!');
import { SherpaOnnxTTSClient } from 'js-tts-wrapper';
const tts = new SherpaOnnxTTSClient();
// The client will automatically download models when needed
await tts.speak('Hello from SherpaOnnx!');
const { SherpaOnnxTTSClient } = require('js-tts-wrapper');
const tts = new SherpaOnnxTTSClient();
// The client will automatically download models when needed
// Inside an async function
await tts.speak('Hello from SherpaOnnx!');
Note: SherpaOnnx is a server-side only engine and requires specific environment setup. See the SherpaOnnx documentation for details on setup and configuration. For browser environments, use SherpaOnnx-WASM instead.
import { EspeakTTSClient } from 'js-tts-wrapper';
const tts = new EspeakTTSClient();
await tts.speak('Hello from eSpeak NG!');
const { EspeakTTSClient } = require('js-tts-wrapper');
const tts = new EspeakTTSClient();
// Inside an async function
await tts.speak('Hello from eSpeak NG!');
| Function | Description | Return Type |
|---|---|---|
| `createTTSClient(engine, credentials)` | Create a TTS client for the specified engine | `AbstractTTSClient` |
| Method | Description | Return Type |
|---|---|---|
| `getVoices()` | Get all available voices | `Promise<UnifiedVoice[]>` |
| `getVoicesByLanguage(language)` | Get voices for a specific language | `Promise<UnifiedVoice[]>` |
| `setVoice(voiceId, lang?)` | Set the voice to use | `void` |
| `synthToBytes(text, options?)` | Convert text to audio bytes | `Promise<Uint8Array>` |
| `synthToBytestream(text, options?)` | Stream synthesis with word boundaries | `Promise<{audioStream, wordBoundaries}>` |
| `speak(text, options?)` | Synthesize and play audio | `Promise<void>` |
| `speakStreamed(text, options?)` | Stream synthesis and play | `Promise<void>` |
| `synthToFile(text, filename, format?, options?)` | Save synthesized speech to a file | `Promise<void>` |
| `startPlaybackWithCallbacks(text, callback, options?)` | Play with word boundary callbacks | `Promise<void>` |
| `pause()` | Pause audio playback | `void` |
| `resume()` | Resume audio playback | `void` |
| `stop()` | Stop audio playback | `void` |
| `on(event, callback)` | Register event handler | `void` |
| `connect(event, callback)` | Connect to event | `void` |
| `checkCredentials()` | Check if credentials are valid | `Promise<boolean>` |
| `checkCredentialsDetailed()` | Check credentials with detailed response | `Promise<CredentialsCheckResult>` |
| `getProperty(propertyName)` | Get a property value | `PropertyType` |
| `setProperty(propertyName, value)` | Set a property value | `void` |
The `checkCredentialsDetailed()` method returns a `CredentialsCheckResult` object with the following properties:
{
success: boolean; // Whether the credentials are valid
error?: string; // Error message if credentials are invalid
voiceCount?: number; // Number of voices available if credentials are valid
}
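A typical way to consume this result is to branch on `success` and report the optional fields. The `describeCheck` helper below is hypothetical, shown only to illustrate handling of the interface above:

```typescript
interface CredentialsCheckResult {
  success: boolean;      // Whether the credentials are valid
  error?: string;        // Error message if credentials are invalid
  voiceCount?: number;   // Number of voices available if credentials are valid
}

// Turn a check result into a human-readable status line (hypothetical helper).
function describeCheck(result: CredentialsCheckResult): string {
  if (result.success) {
    return `Credentials OK (${result.voiceCount ?? 0} voices available)`;
  }
  return `Credentials invalid: ${result.error ?? 'unknown error'}`;
}
```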
The `ssml` property provides a builder for creating SSML:
| Method | Description |
|---|---|
| `prosody(attrs, text)` | Add prosody element |
| `break(time)` | Add break element |
| `emphasis(level, text)` | Add emphasis element |
| `sayAs(interpretAs, text)` | Add say-as element |
| `phoneme(alphabet, ph, text)` | Add phoneme element |
| `sub(alias, text)` | Add substitution element |
| `toString()` | Convert to SSML string |
The library works in both Node.js and browser environments. In browsers, use the ESM or UMD bundle:
<!-- Using ES modules (recommended) -->
<script type="module">
import { SherpaOnnxWasmTTSClient } from 'js-tts-wrapper/browser';
// Create a new SherpaOnnx WebAssembly TTS client
const ttsClient = new SherpaOnnxWasmTTSClient();
// Initialize the WebAssembly module
await ttsClient.initializeWasm('./sherpaonnx-wasm/sherpaonnx.js');
// Get available voices
const voices = await ttsClient.getVoices();
console.log(`Found ${voices.length} voices`);
// Set the voice
await ttsClient.setVoice(voices[0].id);
// Speak some text
await ttsClient.speak('Hello, world!');
</script>
Contributions are welcome! Please feel free to submit a Pull Request.
The library uses a peer dependencies approach to minimize the installation footprint. You can install only the dependencies you need for the engines you plan to use.
# Install the base package
npm install js-tts-wrapper
# Install dependencies for specific engines
npm install js-tts-wrapper[azure] # For Azure TTS
npm install js-tts-wrapper[google] # For Google TTS
npm install js-tts-wrapper[polly] # For AWS Polly
npm install js-tts-wrapper[openai] # For OpenAI TTS
npm install js-tts-wrapper[sherpaonnx] # For SherpaOnnx TTS
# Install dependencies for Node.js audio playback
npm install js-tts-wrapper[node-audio] # For audio playback in Node.js
# Install dependencies for cloud engines
npm install js-tts-wrapper[cloud] # For Azure, Google, Polly, and OpenAI
# Install all dependencies
npm install js-tts-wrapper[all] # For all engines and features
The library supports audio playback in Node.js environments with the optional `sound-play` package. This allows you to use the `speak()` and `speakStreamed()` methods in Node.js applications, just as in browser environments.
To enable Node.js audio playback:
1. Install the required dependency:

   npm install js-tts-wrapper[node-audio]

2. Use the TTS client as usual:

   import { TTSFactory } from 'js-tts-wrapper';

   const tts = TTSFactory.createTTSClient('mock');

   // Play audio in Node.js
   await tts.speak('Hello, world!');
If the `sound-play` package is not installed, the library falls back to printing an informative message suggesting that you install it.
This project is licensed under the MIT License - see the LICENSE file for details.