Diarization and recognition audio module using the faster-whisper and pyannote Python modules. Inspired by whisperX and whisper-node.
Since the main part of the library is Python, the main goal is to make it as friendly as possible for Node.js users.
- Python 3.9 or greater
Unlike openai-whisper, FFmpeg does not need to be installed on the system. The audio is decoded with the Python library PyAV which bundles the FFmpeg libraries in its package.
(c) faster-whisper
- First of all, you need to install Python.
a) For macOS and Linux users, it is best practice to create a virtual environment:
python -m venv ~/py_envs
Remember this path (~/py_envs) and install all packages there. Execute this command before every pip install:
. ~/py_envs/bin/activate
b) For Windows users the commands are slightly different:
python -m venv C:\py_envs
C:\py_envs\Scripts\activate.bat
The path here is C:\py_envs.
- Install the faster-whisper module:
pip install faster-whisper
- Install the pyannote module.
a) Install pyannote.audio with
pip install pyannote.audio
b) Accept the pyannote/segmentation-3.0 user conditions
c) Accept the pyannote/speaker-diarization-3.1 user conditions
d) Create an access token at hf.co/settings/tokens.
(c) pyannote
- Finally, install the package itself:
npm i node-diarization
Ok, we are done! Good job.
import WhisperDiarization from 'node-diarization';
// the only required option is the pyannote HF auth token
const options = {
  diarization: {
    pyannote_auth_token: 'YOUR_TOKEN'
  }
};
(async () => {
  // only works with .wav files
  const wd = new WhisperDiarization('PATH/TO/WAV', options);
  wd.init().then(r => {
    r.union?.forEach(s => {
      console.log(`${s.speaker} [${s.info.start} - ${s.info.end}]: ${s.text}`);
    });
    // errors from the shell .py scripts
    console.log(r.errors);
  }).catch(err => console.log(err));
  // event listener for stream transcription,
  // works only if both tasks are provided (recognition and diarization)
  wd.on('data', segments => {
    segments.forEach(s => {
      console.log(`${s.speaker} [${s.info.start} - ${s.info.end}]: ${s.text}`);
    });
  });
  wd.on('end', _ => {
    console.log('Done');
  });
})();
Or, with CommonJS:
const { WhisperDiarization } = require('node-diarization');
As a result, you will get a diarized JSON output or an error string.
{
  "errors": "[]",
  "rawDiarization": "[Array]",
  "rawRecognition": "[Array]",
  "result": [
    {
      "info": "[Object]",
      "words": "[Array]",
      "speaker": "SPEAKER_00",
      "text": "..."
    },
    {
      "info": "[Object]",
      "words": "[Array]",
      "speaker": "SPEAKER_01",
      "text": "..."
    }
  ]
}
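For example, here is a minimal sketch that flattens the result array into a plain transcript, assuming r is the resolved object from wd.init() above and has the shape shown:
const transcript = r.result
  .map(s => `${s.speaker}: ${s.text}`)
  .join('\n');
console.log(transcript);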
You can also run a single task, like recognition or diarization alone, but then you need to set raw in the options.
const options = {
  recognition: {
    raw: true,
  },
  tasks: ['recognition'],
}
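Similarly, a sketch of a diarization-only configuration (the token value is a placeholder):
const options = {
  diarization: {
    raw: true,
    pyannote_auth_token: 'YOUR_TOKEN',
  },
  tasks: ['diarization'],
};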
For recognition models you have three choices:
- Use a standard Whisper model name (tiny is the default). The available standard Whisper models are: tiny.en, tiny, base.en, base, small.en, small, medium.en, medium, large-v1, large-v2, large-v3, large, distil-large-v2, distil-medium.en, distil-small.en, distil-large-v3, large-v3-turbo, turbo.
- Use a path to the directory where your model.bin file is located. This and the step above are checked by Node.js.
- Set the option parameter hfRepoModel to true. Then model is treated as the path to an HF repo model (namespace/repo_name). If the repo does not exist, the trace error will come from Python. See the sketch after this list.
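For example, a minimal sketch pointing recognition at an HF repo model (the repo id below is only an illustration):
const options = {
  recognition: {
    // skip the JS model-name/path checks and treat `model` as an HF repo id
    hfRepoModel: true,
    model: 'Systran/faster-whisper-small', // example repo id
    raw: true,
  },
  tasks: ['recognition'],
};
The full list of options with their defaults: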
const options = {
  python: {
    // venv path, i.e. "~/py_envs",
    // https://docs.python.org/3/library/venv.html,
    // default undefined
    venvPath: '~/py_envs',
    // python shell command, can be "python3", "py", etc.,
    // default "python"
    var: 'python',
  },
  diarization: {
    // pyannote HF auth token,
    // https://huggingface.co/settings/tokens,
    // required if the diarization task is set
    pyannote_auth_token: 'YOUR_TOKEN',
    // return the raw diarization object from the py script,
    // default false
    raw: false,
    // number of speakers, when known,
    // default undefined
    num_speakers: 1,
    // minimum number of speakers,
    // has no effect when `num_speakers` is provided,
    // default undefined
    min_speakers: 1,
    // maximum number of speakers,
    // has no effect when `num_speakers` is provided,
    // default undefined
    max_speakers: 1,
  },
  recognition: {
    // return the raw recognition object from the py script,
    // default false
    raw: false,
    // original Whisper model name,
    // or path to model.bin, i.e. /path/to/models where model.bin is located,
    // or namespace/repo_name for an HF model,
    // default "tiny"
    model: 'tiny',
    // bypass the JS check for a standard Whisper model name or a model.bin path;
    // if the repo does not exist, it will be a Python error,
    // default false
    hfRepoModel: false,
    // recognition option,
    // default 5
    beam_size: 5,
    // recognition option,
    // default undefined
    compute_type: 'float32',
  },
  checks: {
    // default checks for the availability of the Python vars and the py scripts
    // before running diarization and recognition,
    // default false
    proceed: false,
    // if proceed is false, the two flags below are ignored
    recognition: true,
    diarization: true,
  },
  // array of tasks,
  // can be ["recognition"], ["diarization"] or both,
  // if undefined, all tasks will be set,
  // default undefined
  tasks: [],
  shell: {
    // silence the shell console output,
    // default true
    silent: true,
  },
  // information text in console, default true
  consoleTextInfo: true,
}
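Putting it together, a sketch combining a venv, an explicit model, and a known speaker count (all values are placeholders):
const options = {
  python: { venvPath: '~/py_envs' },
  diarization: { pyannote_auth_token: 'YOUR_TOKEN', num_speakers: 2 },
  recognition: { model: 'small', beam_size: 5 },
};
const wd = new WhisperDiarization('PATH/TO/WAV', options);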
Before starting, you may want to check all the requirements, so there is a static check function. Note that the check option (checks.proceed) defaults to false in the main WhisperDiarization options.
import WhisperDiarization from 'node-diarization';
// options.python (if different from the default ones)
const options = {
  venvPath: '~/py_envs',
  var: 'python',
};
(async () => {
  await WhisperDiarization.check(options).catch(err => console.log(err));
})();
Also, you can enable the checks in the main function options, as shown below.
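A sketch of enabling the built-in checks in the main options (the token value is a placeholder):
const options = {
  diarization: { pyannote_auth_token: 'YOUR_TOKEN' },
  checks: {
    proceed: true,
    recognition: true,
    diarization: true,
  },
};
const wd = new WhisperDiarization('PATH/TO/WAV', options);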