A JavaScript/TypeScript library that provides a unified API for working with multiple cloud-based Text-to-Speech (TTS) services. Inspired by py3-TTS-Wrapper, it simplifies the use of services like Azure, Google Cloud, IBM Watson, and ElevenLabs.
- Features
- Supported TTS Engines
- Installation
- Quick Start
- Core Functionality
- SSML Support
- Speech Markdown Support
- Engine-Specific Examples
- Browser Support
- API Reference
- Contributing
- License
- Unified API: Consistent interface across multiple TTS providers
- SSML Support: Use Speech Synthesis Markup Language to enhance speech synthesis
- Speech Markdown: Optional support for easier speech markup
- Voice Selection: Easily browse and select from available voices
- Streaming Synthesis: Stream audio as it's being synthesized
- Playback Control: Pause, resume, and stop audio playback
- Word Boundaries: Get callbacks for word timing (where supported)
- File Output: Save synthesized speech to audio files
- Browser Support: Works in both Node.js (server) and browser environments (see engine support table below)
| Engine | Provider | Dependencies |
|---|---|---|
| Azure | Microsoft Azure Cognitive Services | `@azure/cognitiveservices-speechservices`, `microsoft-cognitiveservices-speech-sdk` |
| Google Cloud | Google Cloud Text-to-Speech | `@google-cloud/text-to-speech` |
| ElevenLabs | ElevenLabs | `node-fetch@2` (Node.js only) |
| IBM Watson | IBM Watson | None (uses fetch API) |
| OpenAI | OpenAI | `openai` |
| PlayHT | PlayHT | `node-fetch@2` (Node.js only) |
| AWS Polly | Amazon Web Services | `@aws-sdk/client-polly` |
| SherpaOnnx | k2-fsa/sherpa-onnx | `sherpa-onnx-node`, `decompress`, `decompress-bzip2`, `decompress-tarbz2`, `decompress-targz`, `tar-stream` |
| eSpeak NG | eSpeak NG | None (WASM included) |
| WitAI | Wit.ai | None (uses fetch API) |
The library uses a modular approach where TTS engine-specific dependencies are optional. You can install dependencies in two ways:
Install the package with specific engine dependencies using the bracket notation (similar to pip extras):
# Install with specific engine dependencies
npm install js-tts-wrapper[azure] # Install with Azure dependencies
npm install js-tts-wrapper[google] # Install with Google Cloud dependencies
npm install js-tts-wrapper[polly] # Install with AWS Polly dependencies
npm install js-tts-wrapper[elevenlabs] # Install with ElevenLabs dependencies
npm install js-tts-wrapper[openai] # Install with OpenAI dependencies
npm install js-tts-wrapper[playht] # Install with PlayHT dependencies
npm install js-tts-wrapper[watson] # Install with IBM Watson dependencies
npm install js-tts-wrapper[witai] # Install with Wit.ai dependencies
npm install js-tts-wrapper[sherpaonnx] # Install with SherpaOnnx dependencies
# Install with multiple engine dependencies
npm install js-tts-wrapper[azure,google,polly]
# Install with all cloud-based engines
npm install js-tts-wrapper[cloud]
# Install with all engines
npm install js-tts-wrapper[all]
Alternatively, you can install the package and its dependencies manually:
# Install the base package
npm install js-tts-wrapper
# Install dependencies for specific engines
npm install @azure/cognitiveservices-speechservices microsoft-cognitiveservices-speech-sdk # For Azure
npm install @google-cloud/text-to-speech # For Google Cloud
npm install @aws-sdk/client-polly # For AWS Polly
npm install node-fetch@2 # For ElevenLabs and PlayHT
npm install openai # For OpenAI
import { AzureTTSClient } from 'js-tts-wrapper';
// Initialize the client with your credentials
const tts = new AzureTTSClient({
subscriptionKey: 'your-subscription-key',
region: 'westeurope'
});
// List available voices
const voices = await tts.getVoices();
console.log(voices);
// Set a voice
tts.setVoice('en-US-AriaNeural');
// Speak some text
await tts.speak('Hello, world!');
// Use SSML for more control
const ssml = '<speak>Hello <break time="500ms"/> world!</speak>';
await tts.speak(ssml);
const { AzureTTSClient } = require('js-tts-wrapper');
// Initialize the client with your credentials
const tts = new AzureTTSClient({
subscriptionKey: 'your-subscription-key',
region: 'westeurope'
});
// Use async/await within an async function
async function runExample() {
// List available voices
const voices = await tts.getVoices();
console.log(voices);
// Set a voice
tts.setVoice('en-US-AriaNeural');
// Speak some text
await tts.speak('Hello, world!');
// Use SSML for more control
const ssml = '<speak>Hello <break time="500ms"/> world!</speak>';
await tts.speak(ssml);
}
runExample().catch(console.error);
The library provides a factory function to create TTS clients dynamically based on the engine name:
import { createTTSClient } from 'js-tts-wrapper';
// Create a TTS client using the factory function
const tts = createTTSClient('azure', {
subscriptionKey: 'your-subscription-key',
region: 'westeurope'
});
// Use the client as normal
await tts.speak('Hello from the factory pattern!');
const { createTTSClient } = require('js-tts-wrapper');
// Create a TTS client using the factory function
const tts = createTTSClient('azure', {
subscriptionKey: 'your-subscription-key',
region: 'westeurope'
});
async function runExample() {
// Use the client as normal
await tts.speak('Hello from the factory pattern!');
}
runExample().catch(console.error);
The factory supports all engines: `'azure'`, `'google'`, `'polly'`, `'elevenlabs'`, `'openai'`, `'playht'`, `'watson'`, `'witai'`, etc.
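The factory pattern above can be sketched as a lookup from engine name to client constructor. This is an illustrative simplification, not the library's actual implementation; `MockAzureClient` and the registry contents are hypothetical:

```typescript
// Minimal sketch of a createTTSClient-style factory (hypothetical types).
interface TTSClient {
  speak(text: string): Promise<void>;
}

class MockAzureClient implements TTSClient {
  credentials: Record<string, string>;
  constructor(credentials: Record<string, string>) {
    this.credentials = credentials;
  }
  async speak(text: string): Promise<void> {
    // A real client would call the provider's API here.
    console.log(`[azure] would synthesize: ${text}`);
  }
}

type ClientCtor = new (credentials: Record<string, string>) => TTSClient;

// Engine names map to client classes; unknown names fail fast.
const registry: Record<string, ClientCtor> = {
  azure: MockAzureClient,
  // 'google', 'polly', ... would map to their client classes here
};

function createClient(engine: string, credentials: Record<string, string>): TTSClient {
  const Ctor = registry[engine];
  if (!Ctor) throw new Error(`Unknown TTS engine: ${engine}`);
  return new Ctor(credentials);
}
```

The benefit of this shape is that callers depend only on the shared client interface, so switching providers is a one-string change.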
All TTS engines in js-tts-wrapper implement a common set of methods and features through the `AbstractTTSClient` class. This ensures consistent behavior across different providers.
// Get all available voices
const voices = await tts.getVoices();
// Get voices for a specific language
const englishVoices = await tts.getVoicesByLanguage('en-US');
// Set the voice to use
tts.setVoice('en-US-AriaNeural');
The library includes a robust Language Normalization system that standardizes language codes across different TTS engines. This allows you to:
- Use BCP-47 codes (e.g., 'en-US') or ISO 639-3 codes (e.g., 'eng') interchangeably
- Get consistent language information regardless of the TTS engine
- Filter voices by language using any standard format
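The idea behind this normalization can be sketched as a small lookup plus a fallback rule. The mapping table below is illustrative only; the library's actual tables cover many more languages:

```typescript
// Illustrative ISO 639-3 -> BCP-47 primary-subtag mapping (hypothetical subset).
const iso6393ToBcp47: Record<string, string> = {
  eng: 'en',
  fra: 'fr',
  deu: 'de',
};

// Accepts a BCP-47 tag ('en-US') or an ISO 639-3 code ('eng') and returns
// the BCP-47 primary language subtag, so both forms compare equal.
function normalizeLanguage(code: string): string {
  const lower = code.toLowerCase();
  return iso6393ToBcp47[lower] ?? lower.split('-')[0];
}
```

With this rule, filtering voices by `'eng'` and by `'en-US'` selects the same language family.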
// Convert text to audio bytes (Uint8Array)
const audioBytes = await tts.synthToBytes('Hello, world!');
// Stream synthesis with word boundary information
const { audioStream, wordBoundaries } = await tts.synthToBytestream('Hello, world!');
// Synthesize and play audio
await tts.speak('Hello, world!');
// Playback control
tts.pause(); // Pause playback
tts.resume(); // Resume playback
tts.stop(); // Stop playback
// Stream synthesis and play with word boundary callbacks
await tts.startPlaybackWithCallbacks('Hello world', (word, start, end) => {
console.log(`Word: ${word}, Start: ${start}s, End: ${end}s`);
});
Note: Audio playback with the `speak()` and `speakStreamed()` methods is supported in browser environments and, with the optional `sound-play` package installed, in Node.js. To enable Node.js audio playback, install the package with `npm install js-tts-wrapper[node-audio]`.
// Save synthesized speech to a file
await tts.synthToFile('Hello, world!', 'output', 'mp3');
// Register event handlers
tts.on('start', () => console.log('Speech started'));
tts.on('end', () => console.log('Speech ended'));
tts.on('boundary', (word, start, end) => {
console.log(`Word: ${word}, Start: ${start}s, End: ${end}s`);
});
// Alternative event connection
tts.connect('onStart', () => console.log('Speech started'));
tts.connect('onEnd', () => console.log('Speech ended'));
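The `on()`/`connect()` pairing above can be sketched with a minimal emitter, where `connect('onStart', ...)` is treated as an alias for `on('start', ...)`. This is a sketch of the pattern, not the library's internals:

```typescript
type Handler = (...args: unknown[]) => void;

class TinyEmitter {
  handlers: Map<string, Handler[]> = new Map();

  // on('start', cb) registers a handler for a named event.
  on(event: string, cb: Handler): void {
    const list = this.handlers.get(event) ?? [];
    list.push(cb);
    this.handlers.set(event, list);
  }

  // connect('onStart', cb) strips the 'on' prefix and lowercases the rest,
  // so both registration styles feed the same event name.
  connect(event: string, cb: Handler): void {
    const bare = event.replace(/^on/, '');
    this.on(bare.charAt(0).toLowerCase() + bare.slice(1), cb);
  }

  emit(event: string, ...args: unknown[]): void {
    for (const cb of this.handlers.get(event) ?? []) cb(...args);
  }
}
```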
All engines support SSML (Speech Synthesis Markup Language) for advanced control over speech synthesis:
// Use SSML directly
const ssml = `
<speak>
<prosody rate="slow" pitch="low">
This text will be spoken slowly with a low pitch.
</prosody>
<break time="500ms"/>
<emphasis level="strong">This text is emphasized.</emphasis>
</speak>
`;
await tts.speak(ssml);
// Or use the SSML builder
const ssmlText = tts.ssml
.prosody({ rate: 'slow', pitch: 'low' }, 'This text will be spoken slowly with a low pitch.')
.break(500)
.emphasis('strong', 'This text is emphasized.')
.toString();
await tts.speak(ssmlText);
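The chainable builder style can be sketched as below: each method appends an SSML element and returns `this`, and `toString()` wraps everything in `<speak>`. This is a simplified illustration; the real builder's API may differ in details:

```typescript
// Minimal chainable SSML builder sketch.
class SSMLBuilder {
  parts: string[] = [];

  prosody(attrs: { rate?: string; pitch?: string }, text: string): this {
    const attrStr = Object.entries(attrs)
      .map(([k, v]) => `${k}="${v}"`)
      .join(' ');
    this.parts.push(`<prosody ${attrStr}>${text}</prosody>`);
    return this;
  }

  break(timeMs: number): this {
    this.parts.push(`<break time="${timeMs}ms"/>`);
    return this;
  }

  emphasis(level: string, text: string): this {
    this.parts.push(`<emphasis level="${level}">${text}</emphasis>`);
    return this;
  }

  toString(): string {
    return `<speak>${this.parts.join('')}</speak>`;
  }
}
```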
The library supports Speech Markdown for easier speech formatting:
// Use Speech Markdown
const markdown = "Hello (pause:500ms) world! This is (emphasis:strong) important.";
await tts.speak(markdown, { useSpeechMarkdown: true });
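Conceptually, Speech Markdown shortcuts expand into SSML before synthesis. The converter below is a hypothetical sketch covering only the two shortcuts shown above; a real implementation would use a full Speech Markdown parser:

```typescript
// Illustrative Speech Markdown -> SSML conversion (two shortcuts only).
function speechMarkdownToSSML(markdown: string): string {
  const body = markdown
    // (pause:500ms) -> <break time="500ms"/>
    .replace(/\(pause:(\d+m?s)\)/g, '<break time="$1"/>')
    // (emphasis:strong) word -> <emphasis level="strong">word</emphasis>
    .replace(/\(emphasis:(\w+)\)\s+(\S+)/g, '<emphasis level="$1">$2</emphasis>');
  return `<speak>${body}</speak>`;
}
```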
Each TTS engine has its own specific setup. Here are examples for each supported engine in both ESM and CommonJS formats:
import { AzureTTSClient } from 'js-tts-wrapper';
const tts = new AzureTTSClient({
subscriptionKey: 'your-subscription-key',
region: 'westeurope'
});
await tts.speak('Hello from Azure!');
const { AzureTTSClient } = require('js-tts-wrapper');
const tts = new AzureTTSClient({
subscriptionKey: 'your-subscription-key',
region: 'westeurope'
});
// Inside an async function
await tts.speak('Hello from Azure!');
import { GoogleTTSClient } from 'js-tts-wrapper';
const tts = new GoogleTTSClient({
keyFilename: '/path/to/service-account-key.json'
});
await tts.speak('Hello from Google Cloud!');
const { GoogleTTSClient } = require('js-tts-wrapper');
const tts = new GoogleTTSClient({
keyFilename: '/path/to/service-account-key.json'
});
// Inside an async function
await tts.speak('Hello from Google Cloud!');
import { PollyTTSClient } from 'js-tts-wrapper';
const tts = new PollyTTSClient({
region: 'us-east-1',
accessKeyId: 'your-access-key-id',
secretAccessKey: 'your-secret-access-key'
});
await tts.speak('Hello from AWS Polly!');
const { PollyTTSClient } = require('js-tts-wrapper');
const tts = new PollyTTSClient({
region: 'us-east-1',
accessKeyId: 'your-access-key-id',
secretAccessKey: 'your-secret-access-key'
});
// Inside an async function
await tts.speak('Hello from AWS Polly!');
import { ElevenLabsTTSClient } from 'js-tts-wrapper';
const tts = new ElevenLabsTTSClient({
apiKey: 'your-api-key'
});
await tts.speak('Hello from ElevenLabs!');
const { ElevenLabsTTSClient } = require('js-tts-wrapper');
const tts = new ElevenLabsTTSClient({
apiKey: 'your-api-key'
});
// Inside an async function
await tts.speak('Hello from ElevenLabs!');
import { OpenAITTSClient } from 'js-tts-wrapper';
const tts = new OpenAITTSClient({
apiKey: 'your-api-key'
});
await tts.speak('Hello from OpenAI!');
const { OpenAITTSClient } = require('js-tts-wrapper');
const tts = new OpenAITTSClient({
apiKey: 'your-api-key'
});
// Inside an async function
await tts.speak('Hello from OpenAI!');
import { PlayHTTTSClient } from 'js-tts-wrapper';
const tts = new PlayHTTTSClient({
apiKey: 'your-api-key',
userId: 'your-user-id'
});
await tts.speak('Hello from PlayHT!');
const { PlayHTTTSClient } = require('js-tts-wrapper');
const tts = new PlayHTTTSClient({
apiKey: 'your-api-key',
userId: 'your-user-id'
});
// Inside an async function
await tts.speak('Hello from PlayHT!');
import { WatsonTTSClient } from 'js-tts-wrapper';
const tts = new WatsonTTSClient({
apiKey: 'your-api-key',
region: 'us-south',
instanceId: 'your-instance-id'
});
await tts.speak('Hello from IBM Watson!');
const { WatsonTTSClient } = require('js-tts-wrapper');
const tts = new WatsonTTSClient({
apiKey: 'your-api-key',
region: 'us-south',
instanceId: 'your-instance-id'
});
// Inside an async function
await tts.speak('Hello from IBM Watson!');
import { WitAITTSClient } from 'js-tts-wrapper';
const tts = new WitAITTSClient({
token: 'your-wit-ai-token'
});
await tts.speak('Hello from Wit.ai!');
const { WitAITTSClient } = require('js-tts-wrapper');
const tts = new WitAITTSClient({
token: 'your-wit-ai-token'
});
// Inside an async function
await tts.speak('Hello from Wit.ai!');
import { SherpaOnnxTTSClient } from 'js-tts-wrapper';
const tts = new SherpaOnnxTTSClient();
// The client will automatically download models when needed
await tts.speak('Hello from SherpaOnnx!');
const { SherpaOnnxTTSClient } = require('js-tts-wrapper');
const tts = new SherpaOnnxTTSClient();
// The client will automatically download models when needed
// Inside an async function
await tts.speak('Hello from SherpaOnnx!');
Note: SherpaOnnx is a server-side only engine and requires specific environment setup. See the SherpaOnnx documentation for details on setup and configuration. For browser environments, use SherpaOnnx-WASM instead.
import { EspeakTTSClient } from 'js-tts-wrapper';
const tts = new EspeakTTSClient();
await tts.speak('Hello from eSpeak NG!');
const { EspeakTTSClient } = require('js-tts-wrapper');
const tts = new EspeakTTSClient();
// Inside an async function
await tts.speak('Hello from eSpeak NG!');
| Function | Description | Return Type |
|---|---|---|
| `createTTSClient(engine, credentials)` | Create a TTS client for the specified engine | `AbstractTTSClient` |
| Method | Description | Return Type |
|---|---|---|
| `getVoices()` | Get all available voices | `Promise<UnifiedVoice[]>` |
| `getVoicesByLanguage(language)` | Get voices for a specific language | `Promise<UnifiedVoice[]>` |
| `setVoice(voiceId, lang?)` | Set the voice to use | `void` |
| `synthToBytes(text, options?)` | Convert text to audio bytes | `Promise<Uint8Array>` |
| `synthToBytestream(text, options?)` | Stream synthesis with word boundaries | `Promise<{audioStream, wordBoundaries}>` |
| `speak(text, options?)` | Synthesize and play audio | `Promise<void>` |
| `speakStreamed(text, options?)` | Stream synthesis and play | `Promise<void>` |
| `synthToFile(text, filename, format?, options?)` | Save synthesized speech to a file | `Promise<void>` |
| `startPlaybackWithCallbacks(text, callback, options?)` | Play with word boundary callbacks | `Promise<void>` |
| `pause()` | Pause audio playback | `void` |
| `resume()` | Resume audio playback | `void` |
| `stop()` | Stop audio playback | `void` |
| `on(event, callback)` | Register event handler | `void` |
| `connect(event, callback)` | Connect to event | `void` |
| `checkCredentials()` | Check if credentials are valid | `Promise<boolean>` |
| `checkCredentialsDetailed()` | Check credentials with detailed response | `Promise<CredentialsCheckResult>` |
| `getProperty(propertyName)` | Get a property value | `PropertyType` |
| `setProperty(propertyName, value)` | Set a property value | `void` |
The `checkCredentialsDetailed()` method returns a `CredentialsCheckResult` object with the following properties:
{
success: boolean; // Whether the credentials are valid
error?: string; // Error message if credentials are invalid
voiceCount?: number; // Number of voices available if credentials are valid
}
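A typical way to consume this result is to branch on `success` and report the optional fields. The `describeCheck` helper below is hypothetical, shown only to illustrate handling of the interface above:

```typescript
interface CredentialsCheckResult {
  success: boolean;      // Whether the credentials are valid
  error?: string;        // Error message if credentials are invalid
  voiceCount?: number;   // Number of voices available if credentials are valid
}

// Turn a check result into a human-readable status line (hypothetical helper).
function describeCheck(result: CredentialsCheckResult): string {
  if (result.success) {
    return `Credentials OK (${result.voiceCount ?? 0} voices available)`;
  }
  return `Credentials invalid: ${result.error ?? 'unknown error'}`;
}
```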
The `ssml` property provides a builder for creating SSML:
| Method | Description |
|---|---|
| `prosody(attrs, text)` | Add prosody element |
| `break(time)` | Add break element |
| `emphasis(level, text)` | Add emphasis element |
| `sayAs(interpretAs, text)` | Add say-as element |
| `phoneme(alphabet, ph, text)` | Add phoneme element |
| `sub(alias, text)` | Add substitution element |
| `toString()` | Convert to SSML string |
The library works in both Node.js and browser environments. In browsers, use the ESM or UMD bundle:
<!-- Using ES modules (recommended) -->
<script type="module">
import { SherpaOnnxWasmTTSClient } from 'js-tts-wrapper/browser';
// Create a new SherpaOnnx WebAssembly TTS client
const ttsClient = new SherpaOnnxWasmTTSClient();
// Initialize the WebAssembly module
await ttsClient.initializeWasm('./sherpaonnx-wasm/sherpaonnx.js');
// Get available voices
const voices = await ttsClient.getVoices();
console.log(`Found ${voices.length} voices`);
// Set the voice
await ttsClient.setVoice(voices[0].id);
// Speak some text
await ttsClient.speak('Hello, world!');
</script>
Contributions are welcome! Please feel free to submit a Pull Request.
The library uses a peer dependencies approach to minimize the installation footprint. You can install only the dependencies you need for the engines you plan to use.
# Install the base package
npm install js-tts-wrapper
# Install dependencies for specific engines
npm install js-tts-wrapper[azure] # For Azure TTS
npm install js-tts-wrapper[google] # For Google TTS
npm install js-tts-wrapper[polly] # For AWS Polly
npm install js-tts-wrapper[openai] # For OpenAI TTS
npm install js-tts-wrapper[sherpaonnx] # For SherpaOnnx TTS
# Install dependencies for Node.js audio playback
npm install js-tts-wrapper[node-audio] # For audio playback in Node.js
# Install dependencies for cloud engines
npm install js-tts-wrapper[cloud] # For Azure, Google, Polly, and OpenAI
# Install all dependencies
npm install js-tts-wrapper[all] # For all engines and features
The library supports audio playback in Node.js environments with the optional `sound-play` package. This allows you to use the `speak()` and `speakStreamed()` methods in Node.js applications, just as in browser environments.
To enable Node.js audio playback:
1. Install the required dependency:

   npm install js-tts-wrapper[node-audio]

2. Use the TTS client as usual:

   import { TTSFactory } from 'js-tts-wrapper';

   const tts = TTSFactory.createTTSClient('mock');

   // Play audio in Node.js
   await tts.speak('Hello, world!');
If the `sound-play` package is not installed, the library falls back to printing an informative message suggesting that you install it.
This project is licensed under the MIT License - see the LICENSE file for details.