serialize-stt-words
A module to serialize and deserialize words from STT in dpe format into arrays of each attribute.
This is a workaround for the Firebase 1 MB document limit.
For example, as a rough heuristic, mock8hours.json
is an 8-hour transcript of 9.6 MB.
This is the breakdown of file size for each attribute saved separately:
58K paragraphEndTimes.json
59K paragraphStartTimes.json
93K speakersLit.json
637K textList.json
637K wordEndTimes.json
653K wordStartTimes.json
Each file is well within the 1 MB Firebase document limit.
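To check the same thing for your own transcripts, a minimal sketch along these lines could measure each serialized array before saving it. The helper names `jsonByteSize` and `sizeReport` are hypothetical and not part of this module, and the limit is assumed to be 1 MiB (1,048,576 bytes). The JSON byte size is only an approximation of how a database counts document size.

```javascript
// Assumption: the document limit is 1 MiB (1,048,576 bytes).
const LIMIT_BYTES = 1024 * 1024;

// Hypothetical helper: size in bytes of a value once JSON-encoded as UTF-8.
function jsonByteSize(value) {
  return Buffer.byteLength(JSON.stringify(value), 'utf8');
}

// Hypothetical helper: for each attribute array, report its size
// and whether it fits under the limit.
function sizeReport(serialized, limit = LIMIT_BYTES) {
  const report = {};
  for (const [name, list] of Object.entries(serialized)) {
    const bytes = jsonByteSize(list);
    report[name] = { bytes, fits: bytes <= limit };
  }
  return report;
}

// Tiny made-up sample, only to show the shape of the report.
const sizeSample = {
  wordStartTimes: [0, 0.9, 1.13],
  textList: ['Media', 'will'],
};
console.log(sizeReport(sizeSample));
```

Any array reported with `fits: false` would need to be split further before saving.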
Setup
git clone git@github.com:pietrop/serialize-stt-words.git
cd serialize-stt-words
npm install
Usage
input transcript json
{
"words": [
{
"text": "Hello",
"start": 0,
"end": 0.88
},
....
],
"paragraphs": [
{
"speaker": "SPEAKER_B",
"start": 0,
"end": 1.24
},
...
]
}
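Serializing this input returns one array per attribute: wordStartTimes, wordEndTimes, textList, paragraphStartTimes, paragraphEndTimes and speakersLit.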
npm install @pietrop/serialize-stt-words
Serialize
const { serializeTranscript } = require('@pietrop/serialize-stt-words');
const { wordStartTimes, wordEndTimes, textList, paragraphStartTimes, paragraphEndTimes, speakersLit } = serializeTranscript(transcript);
output example
{
"wordStartTimes": [
0,
0.9,
1.13,
...
],
"wordEndTimes": [
0.88,
1.12,
...
],
"textList": [
"Media",
"will",
...
],
"paragraphStartTimes": [
0,
1.25,
...
],
"paragraphEndTimes": [
1.24,
4,
...
],
"speakersLit": [
"SPEAKER_B",
"SPEAKER_A",
...
]
}
The idea is that you can save each array separately in a database and recombine them later.
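As an illustration of the idea, here is a minimal sketch of how a transcript could be split into one array per attribute and each array stored under its own key. It is not the module's source: `sketchSerialize`, `saveSeparately` and the `Map` standing in for a database are all hypothetical, and the sample values are made up.

```javascript
// Hypothetical sketch, not the module's implementation:
// split a transcript into one flat array per attribute.
function sketchSerialize(transcript) {
  const { words, paragraphs } = transcript;
  return {
    wordStartTimes: words.map((w) => w.start),
    wordEndTimes: words.map((w) => w.end),
    textList: words.map((w) => w.text),
    paragraphStartTimes: paragraphs.map((p) => p.start),
    paragraphEndTimes: paragraphs.map((p) => p.end),
    speakersLit: paragraphs.map((p) => p.speaker),
  };
}

// Hypothetical stand-in for a database: each array is stored
// under its own key, as if it were a separate document.
function saveSeparately(db, transcriptId, serialized) {
  for (const [name, list] of Object.entries(serialized)) {
    db.set(`${transcriptId}/${name}`, list);
  }
}

// Made-up sample in the input shape shown under Usage.
const sampleTranscript = {
  words: [
    { text: 'Hello', start: 0, end: 0.88 },
    { text: 'world', start: 0.9, end: 1.24 },
  ],
  paragraphs: [{ speaker: 'SPEAKER_B', start: 0, end: 1.24 }],
};

const fakeDb = new Map();
saveSeparately(fakeDb, 'demo', sketchSerialize(sampleTranscript));
console.log([...fakeDb.keys()]);
```

Because every array keeps the original order, index i in wordStartTimes, wordEndTimes and textList always refers to the same word, which is what makes recombining possible.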
Deserialize
const { deserializeTranscript } = require('@pietrop/serialize-stt-words');
const desRes = deserializeTranscript({ wordStartTimes, wordEndTimes, textList, paragraphStartTimes, paragraphEndTimes, speakersLit });
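For intuition, recombining can be sketched as zipping the arrays back together by index. The function below is a hypothetical illustration, not the module's code, and it assumes the result mirrors the input transcript JSON with words and paragraphs; the actual shape of desRes may differ.

```javascript
// Hypothetical sketch, not the module's implementation:
// rebuild word and paragraph objects by pairing values at the same index.
function sketchDeserialize(serialized) {
  const {
    wordStartTimes,
    wordEndTimes,
    textList,
    paragraphStartTimes,
    paragraphEndTimes,
    speakersLit,
  } = serialized;

  const words = textList.map((text, i) => ({
    text,
    start: wordStartTimes[i],
    end: wordEndTimes[i],
  }));

  const paragraphs = speakersLit.map((speaker, i) => ({
    speaker,
    start: paragraphStartTimes[i],
    end: paragraphEndTimes[i],
  }));

  // Assumption: the output has the same shape as the input transcript.
  return { words, paragraphs };
}

// Made-up arrays in the serialized shape shown in the output example.
const rebuilt = sketchDeserialize({
  wordStartTimes: [0, 0.9],
  wordEndTimes: [0.88, 1.24],
  textList: ['Hello', 'world'],
  paragraphStartTimes: [0],
  paragraphEndTimes: [1.24],
  speakersLit: ['SPEAKER_B'],
});
console.log(JSON.stringify(rebuilt, null, 2));
```

This only works if the arrays belonging to the same entity have equal lengths, so it may be worth checking that before recombining data read back from a database.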
Documentation
There's a docs folder in this repository.
docs/notes contains draft developer notes on various aspects of the project. These would generally be converted into either ADRs or guides when ready.
Development env
- npm > 6.1.0
- Node 12
The Node version is set in the node version manager file .nvmrc
nvm use
Tests
npm test
Deployment
npm run publish:public