Simplify your AI text-to-speech integration!
A powerful and straightforward Node.js module for generating speech audio from text using the OpenAI API (support for other TTS providers in the works). ai-text-to-speech offers a simple and robust interface to convert text into high-quality speech audio files in various formats and voices.
Developed by Jerry Kapron for everyone to use freely 👍🏼
- Easy Integration: Seamlessly integrate text-to-speech functionality into your Node.js applications.
- Multiple Voices: Choose from a variety of high-quality voices to suit your application's needs.
- Flexible Output Formats: Supports various audio formats like MP3, WAV, FLAC, and more.
- Customizable File Naming: Control the output file naming with suffix options to prevent overwrites.
- Robust Error Handling: Comprehensive validation and descriptive error messages for easy debugging.
Install ai-text-to-speech via NPM:
npm install ai-text-to-speech
// Use this import statement if your project supports ES Modules
import aiSpeech from 'ai-text-to-speech';
OR
// Use this require statement if your project uses CommonJS modules
const aiSpeech = require('ai-text-to-speech');
(async () => { // or nested inside another async function
  try {
    const audioFilePath = await aiSpeech({
      input: 'Buy me a coffee if it works for you.',
      // If the OPENAI_API_KEY environment variable is already set,
      // you don't have to specify the api_key option
    });
    console.log(`Audio file saved at: ${audioFilePath}`);
  } catch (error) {
    console.error('Error generating speech audio:', error.message);
  }
})();
// This approach can be useful if you prefer working with promises directly
// or if you're in an environment where async/await is not supported.
aiSpeech({
  input: 'Buy me a coffee if it works for you.',
  // You can explicitly provide your OpenAI API key here
  api_key: 'YOUR_OPENAI_API_KEY', // if process.env.OPENAI_API_KEY is not set
})
  .then((audioFilePath) => {
    console.log(`Audio file saved at: ${audioFilePath}`);
  })
  .catch((error) => {
    console.error('Error generating speech audio:', error.message);
  });
(async () => { // or nested inside another async function
  try {
    const audioFilePath = await aiSpeech({
      input: 'Buy me a coffee if it works for you.',
      dest_dir: './audio',
      file_name: 'welcome-message',
      voice: 'echo',
      model: 'tts-1-hd',
      response_format: 'wav',
      suffix_type: 'nano',
      api_key: 'YOUR_OPENAI_API_KEY', // if process.env.OPENAI_API_KEY is not set
    });
    console.log(`Audio file saved at: ${audioFilePath}`);
  } catch (error) {
    console.error('Error generating speech audio:', error.message);
  }
})();
// This approach can be useful if you prefer working with promises directly
// or if you're in an environment where async/await is not supported.
aiSpeech({
  input: 'Buy me a coffee if it works for you.',
  dest_dir: './audio',
  file_name: 'welcome-message',
  voice: 'echo',
  model: 'tts-1-hd',
  response_format: 'wav',
  suffix_type: 'nano',
  api_key: 'YOUR_OPENAI_API_KEY', // if process.env.OPENAI_API_KEY is not set
})
  .then((audioFilePath) => {
    console.log(`Audio file saved at: ${audioFilePath}`);
  })
  .catch((error) => {
    console.error('Error generating speech audio:', error.message);
  });
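Because aiSpeech returns a promise that rejects on failure, transient problems such as network errors can be handled by wrapping the call in a small retry helper. The sketch below is not part of the module: withRetry, attempts, and delayMs are hypothetical names, and the function to call is passed in as a parameter so the helper works with aiSpeech or any other promise-returning function.

```javascript
// Hypothetical helper (not part of ai-text-to-speech): retry a promise-returning
// call a few times before giving up.
// `generate` is any function that returns a promise, e.g. aiSpeech itself.
async function withRetry(generate, options, { attempts = 3, delayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await generate(options);
    } catch (error) {
      lastError = error;
      if (attempt < attempts) {
        // Wait a little longer after each failed attempt (linear backoff).
        await new Promise((resolve) => setTimeout(resolve, delayMs * attempt));
      }
    }
  }
  // Every attempt failed: surface the most recent error to the caller.
  throw lastError;
}
```

A call would look like withRetry(aiSpeech, { input: 'Hello there.' }). Note that this simple version retries every kind of error, including ones that will never succeed (an unsupported voice, for example), so you may want to filter on the error before retrying.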
- input: The text to generate audio for. Maximum length is 4096 characters.
- dest_dir: The destination directory to save the audio file. Default: './' (current directory).
- file_name: The base name of the output file. Default: 'speech-audio'.
- voice: The voice to use for speech synthesis. Default: 'nova'.
- model: The TTS model to use. Default: 'tts-1'.
- response_format: The audio format for the output file. Default: 'mp3'.
- suffix_type: The type of unique suffix used in the file name. Default: 'uuid'.
- api_key: Your OpenAI API key. Default: The value of the OPENAI_API_KEY environment variable.
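Text longer than the 4096-character input limit has to be split before it is sent. The following is a minimal sketch of one way to do that, not something the module provides: splitText and speakLongText are hypothetical helpers, the 'part-N' file names are made up for illustration, and the length check uses JavaScript string length, which is assumed to match how the limit is counted.

```javascript
const MAX_INPUT_LENGTH = 4096; // input limit stated in this README

// Hypothetical helper: split text into chunks no longer than maxLength,
// breaking at the last space before the limit so words stay intact.
function splitText(text, maxLength = MAX_INPUT_LENGTH) {
  const chunks = [];
  let rest = text.trim();
  while (rest.length > maxLength) {
    let cut = rest.lastIndexOf(' ', maxLength);
    if (cut <= 0) cut = maxLength; // no space found: fall back to a hard cut
    chunks.push(rest.slice(0, cut).trim());
    rest = rest.slice(cut).trim();
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}

// Hypothetical helper: generate one audio file per chunk, in order.
// `generate` is expected to behave like aiSpeech (options in, file path out).
async function speakLongText(generate, text, options = {}) {
  const paths = [];
  for (const [index, chunk] of splitText(text).entries()) {
    paths.push(await generate({ ...options, input: chunk, file_name: `part-${index + 1}` }));
  }
  return paths;
}
```

Calling speakLongText(aiSpeech, longText, { voice: 'echo' }) would then resolve to an array of file paths, one per chunk. The chunks are processed one after another so the resulting files stay in reading order.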
- voice: alloy, echo, fable, onyx, nova (default), shimmer
- model: tts-1 (default), tts-1-hd
- response_format: mp3 (default), opus, aac, flac, wav, pcm
- suffix_type:
  - uuid (default): A unique UUID string.
  - milli: Timestamp in milliseconds.
  - micro: Timestamp in microseconds.
  - nano: Timestamp in nanoseconds.
  - none: No suffix. Warning: May overwrite existing files if filenames collide.
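To make the suffix types more concrete, here is a rough sketch of how such suffixes could be produced with Node.js built-ins. It is an illustration only and not the module's actual implementation: makeSuffix is a hypothetical function, and the real module may obtain its timestamps differently.

```javascript
const { randomUUID } = require('node:crypto');
const { performance } = require('node:perf_hooks');

// Hypothetical illustration of the documented suffix types.
function makeSuffix(suffixType) {
  // Wall-clock time in milliseconds, including a fractional part.
  const nowMs = performance.timeOrigin + performance.now();
  switch (suffixType) {
    case 'uuid':
      return randomUUID();
    case 'milli':
      return String(Math.floor(nowMs));
    case 'micro':
      return String(Math.floor(nowMs * 1e3));
    case 'nano':
      // Assumption: a monotonic nanosecond counter is good enough for uniqueness.
      // process.hrtime.bigint() is not measured from the Unix epoch.
      return process.hrtime.bigint().toString();
    case 'none':
      return '';
    default:
      throw new Error(`Unsupported suffix_type: ${suffixType}`);
  }
}
```

The finer the timestamp, the less likely two files created in quick succession end up with the same name, which is why 'none' carries the overwrite warning.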
- Input Length: The input text must not exceed 4096 characters. Exceeding this limit will result in an error.
- File Overwrite Risk: Using suffix_type: 'none' without specifying a unique file_name may lead to overwriting existing files.
- Directory Permissions: Ensure the dest_dir exists and the application has write permissions. The module will throw an error if it cannot write to the directory.
- API Key Requirement: An OpenAI API key is required. Set it via the api_key option or the OPENAI_API_KEY environment variable.
- Network Errors: Network issues or incorrect API endpoints will result in errors. Ensure you have a stable internet connection.
- Unsupported Values: Providing unsupported values for voice, model, response_format, or suffix_type will result in an error.
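If you would rather catch these problems before making a request, the checks can be mirrored in your own code. The sketch below is a hypothetical pre-flight helper (checkOptions is not exported by the module); it uses only the limits and supported values listed in this README, and the messages it returns are made up for illustration, not the module's real error messages.

```javascript
// Supported values as listed in this README.
const SUPPORTED = {
  voice: ['alloy', 'echo', 'fable', 'onyx', 'nova', 'shimmer'],
  model: ['tts-1', 'tts-1-hd'],
  response_format: ['mp3', 'opus', 'aac', 'flac', 'wav', 'pcm'],
  suffix_type: ['uuid', 'milli', 'micro', 'nano', 'none'],
};

// Hypothetical helper: return a list of problems found in the options.
// An empty array means nothing obviously wrong was detected.
function checkOptions(options, env = process.env) {
  const problems = [];
  if (typeof options.input !== 'string') {
    problems.push('input must be a string');
  } else if (options.input.length > 4096) {
    problems.push('input exceeds 4096 characters');
  }
  // Only options that were actually provided are checked; omitted ones use defaults.
  for (const [name, allowed] of Object.entries(SUPPORTED)) {
    const value = options[name];
    if (value !== undefined && !allowed.includes(value)) {
      problems.push(`${name} must be one of: ${allowed.join(', ')}`);
    }
  }
  if (!options.api_key && !env.OPENAI_API_KEY) {
    problems.push('no API key: set api_key or OPENAI_API_KEY');
  }
  return problems;
}
```

For example, checkOptions({ input: 'Hi', voice: 'bob' }) reports the unsupported voice (and a missing key, if none is configured) without touching the network.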
This project is licensed under the MIT License.
For more information on voice options, see the OpenAI Text-to-Speech Voice Options.
- Opus: Ideal for internet streaming and communication due to low latency.
- AAC: Preferred for digital audio compression; widely used on YouTube, Android, and iOS.
- FLAC: Suitable for lossless audio compression; favored by audio enthusiasts for archiving.
- WAV: Uncompressed audio, suitable for applications requiring minimal decoding overhead.
- PCM: Raw audio samples at 24 kHz (16-bit signed, little-endian), without headers.
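Since PCM output has no headers, most players will not open it directly. The sketch below shows one way to wrap such data in a standard 44-byte WAV header using the sample format given above. pcmToWav is a hypothetical helper, not part of the module, and the single-channel (mono) default is an assumption you should verify against your own output.

```javascript
// Hypothetical helper: prepend a 44-byte WAV (RIFF) header to raw
// 16-bit signed little-endian PCM data held in a Buffer.
function pcmToWav(pcm, { sampleRate = 24000, channels = 1 } = {}) {
  const bitsPerSample = 16;
  const blockAlign = channels * (bitsPerSample / 8); // bytes per sample frame
  const header = Buffer.alloc(44);
  header.write('RIFF', 0, 'ascii');
  header.writeUInt32LE(36 + pcm.length, 4);          // file size minus 8 bytes
  header.write('WAVE', 8, 'ascii');
  header.write('fmt ', 12, 'ascii');
  header.writeUInt32LE(16, 16);                      // size of the fmt chunk
  header.writeUInt16LE(1, 20);                       // audio format 1 = PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(sampleRate * blockAlign, 28); // bytes per second
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitsPerSample, 34);
  header.write('data', 36, 'ascii');
  header.writeUInt32LE(pcm.length, 40);              // size of the audio data
  return Buffer.concat([header, pcm]);
}
```

You could read the generated .pcm file with fs.readFileSync, pass the resulting Buffer to pcmToWav, and write the result to a .wav file. If you only need WAV, requesting response_format: 'wav' directly is simpler.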
Note: Ensure compliance with OpenAI's usage policies when integrating this module into your applications.