Browser-based Speech Recognition & Speech Synthesis with support for multiple languages and voices.
```sh
npm install @swankylegg/voice-io
```
```js
import { VoiceIO } from "@swankylegg/voice-io";

// Initialize
const voice = new VoiceIO();

// Listen for speech
voice.setState(voice.states.LISTENING);

// Handle speech recognition
voice.config.onRecognitionResult = (results, bestTranscript) => {
  console.log("User said:", bestTranscript);

  // Respond with speech
  voice.setState(voice.states.RESPONDING, "I heard you!");
};
```
Try it out: Live Demo
This library uses the Web Speech API and requires browser support for:

- `SpeechRecognition` (or `webkitSpeechRecognition`)
- `speechSynthesis`
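Before constructing `VoiceIO`, you may want to check that the current browser exposes these APIs. A minimal sketch (the `speechSupport` helper is illustrative, not part of this library's API):

```js
// Check a global object (normally `window`) for the Web Speech API entry
// points the library depends on. Chrome still ships recognition under the
// `webkit` prefix, so both names are checked.
function speechSupport(globalObj) {
  return {
    recognition:
      "SpeechRecognition" in globalObj ||
      "webkitSpeechRecognition" in globalObj,
    synthesis: "speechSynthesis" in globalObj,
  };
}

// In a browser: speechSupport(window)
```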
Voice I/O automatically detects available system voices for speech synthesis. Different operating systems and browsers provide different voices:
- macOS/iOS: High-quality system voices in multiple languages
- Windows: Microsoft system voices + installed language packs
- Android: System voices vary by device/manufacturer
- Chrome: Additional cloud-based voices when online
To get available voices:

```js
voice.getVoices(); // Get all available voices
voice.getVoicesForCurrentLanguage(); // Get voices matching current language
```

To set a specific voice:

```js
voice.setVoice("Samantha"); // Use a specific voice by name
```
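Because voice names vary across platforms, it is safer to look a voice up before committing to it. A sketch of one approach, assuming the `SpeechSynthesisVoice`-like objects returned by `getVoices()` carry a `name` property (the `pickVoice` helper is hypothetical, not part of the library):

```js
// Pick a voice by name from a list of { name, lang } voice objects,
// falling back to the first available voice when the name is missing.
function pickVoice(voices, preferredName) {
  return voices.find((v) => v.name === preferredName) ?? voices[0] ?? null;
}

const available = [
  { name: "Samantha", lang: "en-US" },
  { name: "Thomas", lang: "fr-FR" },
];
pickVoice(available, "Samantha"); // → the Samantha entry
pickVoice(available, "NoSuchVoice"); // → falls back to the first entry
```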
Voice I/O supports multiple languages for both recognition and synthesis. Available languages include:
- English (en-US)
- Spanish (es-ES)
- French (fr-FR)
- German (de-DE)
- Italian (it-IT)
- Japanese (ja-JP)
- Korean (ko-KR)
- Chinese (zh-CN)
Note: Actual language availability depends on browser support for both recognition and synthesis in the selected language. The library will automatically filter voices to match the selected language.
Voice I/O has four possible states:

- `IDLE`: Default state, not listening or speaking (🎤🚫 | 🔈🚫)
- `LISTENING`: Actively listening for speech input (🎤🟢 | 🔈🚫)
- `THINKING`: Transitional state between listening and responding (use for loading states) (🎤🚫 | 🔈🚫)
- `RESPONDING`: Currently speaking using speech synthesis (🎤🚫 | 🔊🟢)
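The state table above can be expressed as data, which is handy when driving UI indicators. A minimal sketch (the `STATE_IO` map and `isMicActive` helper are illustrative and not part of the library's API; the state names mirror `voice.states`):

```js
// Which input/output device each documented state keeps active:
// only LISTENING uses the microphone, only RESPONDING uses the speaker.
const STATE_IO = {
  IDLE: { mic: false, speaker: false },
  LISTENING: { mic: true, speaker: false },
  THINKING: { mic: false, speaker: false },
  RESPONDING: { mic: false, speaker: true },
};

// Example guard for a recording indicator in the UI.
function isMicActive(state) {
  return STATE_IO[state]?.mic === true;
}

isMicActive("LISTENING"); // → true
isMicActive("RESPONDING"); // → false
```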
This project is licensed under the MIT License - see the LICENSE file for details.