Description
This package performs real-time speech-to-text (STT) on audio streams. It offers several STT pipelines: the open-source DeepSpeech architecture, the Amazon Transcribe API, and the Google Speech-to-Text API.
Currently, audio can be passed in as a stream of Buffer objects containing audio data encoded using one of the following
Installation
yarn add @bottlenose/rxtranscribe
npm i --save @bottlenose/rxtranscribe
DeepSpeech
To run the DeepSpeech pipeline, download the DeepSpeech model, unzip it, and pass the model directory to the toDeepSpeech operator like this: toDeepSpeech({modelDir: 'path/to/deepspeech-models-0.7.0'}).
AWS Transcribe
To run the AWS Transcribe pipeline, you'll need a valid ACCESS_KEY_ID and SECRET_ACCESS_KEY with permissions to run AWS Transcribe.
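As a minimal sketch of the credential requirement above, the helper below fails fast when either value is missing. It assumes the credentials are supplied as environment variables named ACCESS_KEY_ID and SECRET_ACCESS_KEY; the function name readAwsCredentials is hypothetical and not part of this package.

```javascript
// Hypothetical helper (not part of @bottlenose/rxtranscribe).
// Assumption: credentials live in environment variables with these names.
function readAwsCredentials(env = process.env) {
  const accessKeyId = env.ACCESS_KEY_ID;
  const secretAccessKey = env.SECRET_ACCESS_KEY;

  // Collect every missing name so the error reports them all at once.
  const missing = [];
  if (!accessKeyId) missing.push('ACCESS_KEY_ID');
  if (!secretAccessKey) missing.push('SECRET_ACCESS_KEY');
  if (missing.length > 0) {
    throw new Error(`Missing AWS credentials: ${missing.join(', ')}`);
  }

  return {accessKeyId, secretAccessKey};
}
```

Calling readAwsCredentials() once at startup surfaces a configuration problem before any audio is streamed. Whether the key pair actually has permission to run AWS Transcribe can only be confirmed by the service itself.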
GCP Speech-to-text
- To run the GCP speech-to-text pipeline, you'll need a valid JSON file containing GCP credentials.
- The project will need to have the speech-to-text API enabled.
- You may need to set the GOOGLE_APPLICATION_CREDENTIALS environment variable so that it contains the path of your credentials file.
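The checks in the list above can be sketched as two small helpers: one confirms that GOOGLE_APPLICATION_CREDENTIALS is set, the other confirms that the credentials text is a JSON object. Both function names are hypothetical and not part of this package, and reading the file from disk is left to the caller.

```javascript
// Hypothetical helpers (not part of @bottlenose/rxtranscribe).

// Returns the credentials file path from the environment, or throws if unset.
function gcpCredentialsPath(env = process.env) {
  const path = env.GOOGLE_APPLICATION_CREDENTIALS;
  if (!path) {
    throw new Error('GOOGLE_APPLICATION_CREDENTIALS is not set');
  }
  return path;
}

// Parses credentials text and rejects anything that is not a JSON object.
function parseCredentialsJson(text) {
  let parsed;
  try {
    parsed = JSON.parse(text);
  } catch (err) {
    throw new Error(`Credentials file is not valid JSON: ${err.message}`);
  }
  if (parsed === null || typeof parsed !== 'object' || Array.isArray(parsed)) {
    throw new Error('Credentials file must contain a JSON object');
  }
  return parsed;
}
```

These helpers only catch local configuration mistakes. They cannot tell whether the speech-to-text API is enabled for the project.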
Running yarn install installs all the required dependencies.
Compatibility
Platform | Support |
---|---|
Node.js (>12) | |
Browsers | |
React Native | |
Electron | |
Basic Usage
import {map} from 'rxjs/operators';
import {toDeepSpeech} from '@bottlenose/rxtranscribe';
// The pipeline takes a stream of .wav audio chunks (Buffer, String, Blob or Typed Array)
const buffer$ = pcmChunkEncodedAs16BitIntegers$.pipe(
  map(chunk => Buffer.from(chunk, 'base64')),
  toDeepSpeech({modelDir: '/path/to/deepspeech-models-0.7.0'})
);
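The map step in the example above turns base64 strings into Buffer objects. As a rough illustration of what such chunks might contain, the sketch below round-trips 16-bit PCM samples through base64 using only Node.js built-ins. The helper names are hypothetical, and little-endian sample order is an assumption rather than something this package documents.

```javascript
// Hypothetical helpers for illustration; not part of @bottlenose/rxtranscribe.
// Assumption: samples are signed 16-bit integers stored little-endian.

// Packs an array of 16-bit samples into a base64 string.
function pcmToBase64(samples) {
  const buffer = Buffer.alloc(samples.length * 2);
  samples.forEach((sample, i) => buffer.writeInt16LE(sample, i * 2));
  return buffer.toString('base64');
}

// Decodes a base64 chunk into a Buffer, mirroring the map() step above.
function base64ToBuffer(chunk) {
  return Buffer.from(chunk, 'base64');
}

// Reads the 16-bit samples back out of a Buffer for inspection.
function bufferToSamples(buffer) {
  const samples = [];
  for (let i = 0; i + 1 < buffer.length; i += 2) {
    samples.push(buffer.readInt16LE(i));
  }
  return samples;
}
```

Each sample occupies two bytes, so a decoded chunk with an odd byte length would indicate that a sample was split across chunk boundaries.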