Transcribe speech to text in Node.js using OpenAI's Whisper models, converted to the cross-platform ONNX format.
- Add the dependency to your project:

  ```shell
  npm i node-speech-recognition
  ```
- Use it:

  ```js
  import NSR from "node-speech-recognition";
  const { default: Whisper } = NSR;

  const whisper = new Whisper();
  await whisper.init('base.en');
  const transcribed = await whisper.transcribe('your/audio/path.wav');
  console.log(transcribed);
  ```
Example output:

```js
[
  {
    text: " And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.",
    chunks: [
      { timestamp: [0, 8.18], text: " And so my fellow Americans ask not what your country can do for you" },
      { timestamp: [8.18, 11.06], text: " ask what you can do for your country." }
    ]
  }
]
```
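Since each chunk carries a `[start, end]` timestamp in seconds, the output shape above maps naturally onto subtitle formats. Below is a minimal sketch of turning a `chunks` array into SRT entries; `formatTime` and `toSrt` are hypothetical helpers written for this example, not part of node-speech-recognition.

```javascript
// Convert seconds to an SRT timestamp: HH:MM:SS,mmm
function formatTime(seconds) {
  const ms = Math.round(seconds * 1000);
  const h = String(Math.floor(ms / 3600000)).padStart(2, "0");
  const m = String(Math.floor((ms % 3600000) / 60000)).padStart(2, "0");
  const s = String(Math.floor((ms % 60000) / 1000)).padStart(2, "0");
  const frac = String(ms % 1000).padStart(3, "0");
  return `${h}:${m}:${s},${frac}`;
}

// Build SRT entries from chunks shaped like the transcribe() output above.
function toSrt(chunks) {
  return chunks
    .map((chunk, i) => {
      const [start, end] = chunk.timestamp;
      return `${i + 1}\n${formatTime(start)} --> ${formatTime(end)}\n${chunk.text.trim()}`;
    })
    .join("\n\n");
}

// Sample data taken from the example output shown above.
const chunks = [
  { timestamp: [0, 8.18], text: " And so my fellow Americans ask not what your country can do for you" },
  { timestamp: [8.18, 11.06], text: " ask what you can do for your country." },
];
console.log(toSrt(chunks));
```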
The `Whisper` class has the following methods:

- `init(modelName: string)`: initializes the model; you must call it before transcribing any audio.
  - `modelName`: name of the Whisper model to load. Available models:

    | Model     | Disk   |
    |-----------|--------|
    | tiny      | 235 MB |
    | tiny.en   | 235 MB |
    | base      | 400 MB |
    | base.en   | 400 MB |
    | small     | 1.1 GB |
    | small.en  | 1.1 GB |
    | medium    | 1.2 GB |
    | medium.en | 1.2 GB |
- `transcribe(filePath: string, language?: string)`: transcribes speech from a WAV file.
  - `filePath`: path to the WAV file.
  - `language`: target language for recognition, given as its full English name, e.g. `'spanish'`.
- `disposeModel()`: disposes of the initialized model.
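Because the model must be initialized before use and disposed afterwards, it can help to wrap the init/transcribe/dispose lifecycle so `disposeModel()` runs even if transcription throws. A minimal sketch, assuming only the three methods documented above; `withWhisper` is a hypothetical helper, not part of this library.

```javascript
// Run `fn` against an initialized Whisper-like instance, always disposing
// the model afterwards. Works with any object exposing init(), transcribe()
// and disposeModel() as documented above.
async function withWhisper(whisper, modelName, fn) {
  await whisper.init(modelName);
  try {
    return await fn(whisper);
  } finally {
    whisper.disposeModel();
  }
}

// Usage (assuming the package is installed and a model name from the table):
// const whisper = new Whisper();
// const result = await withWhisper(whisper, "base.en", (w) => w.transcribe("audio.wav"));
```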