
Speech Recognition Testing

This project tests speech recognition. It takes sample audio files and expected transcriptions, and tests whether each audio file is transcribed correctly in real time.

Using the project

This project requires the Microsoft Speech service, audio files, and a corresponding transcriptions.txt file.

Values needed to run the test harness:

- AUDIO_FOLDER_PATH
- TRANSCRIPTION_FILE_PATH
- SPEECH_SUBSCRIPTION_KEY, or
    - CUSTOM_SPEECH_SUBSCRIPTION_KEY (you'll also need to supply SPEECH_ENDPOINT_ID)
- SERVICE_REGION (e.g., westus, westus2)

Optional:

- SPEECH_ENDPOINT_ID (necessary if using CUSTOM_SPEECH_SUBSCRIPTION_KEY)
- FAILED_TESTS_JSON_LOCATION

See .env.sample for reference.
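As a sketch, a .env file built from the variables listed above might look like this (all values are placeholders, not real keys):

```shell
# Required
AUDIO_FOLDER_PATH=./audio
TRANSCRIPTION_FILE_PATH=./audio/transcriptions.txt
SPEECH_SUBSCRIPTION_KEY=<your-speech-subscription-key>
SERVICE_REGION=westus

# Or, if using a custom speech model instead of SPEECH_SUBSCRIPTION_KEY:
# CUSTOM_SPEECH_SUBSCRIPTION_KEY=<your-custom-speech-key>
# SPEECH_ENDPOINT_ID=<your-endpoint-id>

# Optional
# FAILED_TESTS_JSON_LOCATION=./failed_tests.json
```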

If you have all of the above, install dependencies:

npm install

Finally, run the test harness:

node lib/cli.js runs the CLI.

The maximum number of calls to the service is currently capped at 20.

Flags needed to run the CLI:

flag  alias               value
-s    subscription-key    Microsoft Speech subscription key
-r    service-region      Speech service region
-d    audio-directory     path to a directory of .wav files
-e    endpoint-Id         custom speech endpoint ID
-t    transcription-file  transcription file path (.txt file)
-f    audio-file          a single .wav audio file; its transcription from the service is logged to the console
-o    out-file            [optional] test output file; saves a JSON array (defaults to ./test_results.json)
-c    concurrent-calls    number of concurrent service calls (defaults to 1)

Conflicts: -f conflicts with -d and -t. Providing a single file to transcribe results in a console log of the transcription returned by the service.

Note: if the transcription text file is edited in VS Code, VS Code may add a newline to the end of the file on save. This will affect how the tests are run; turn off that feature before saving.
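As a workaround, a trailing newline can also be stripped when the file is read. A minimal sketch (not the package's actual code; the function name is made up for illustration):

```typescript
// Strip a single trailing newline (LF or CRLF) that editors such as
// VS Code may append on save, then split the file contents into one
// entry per line, so the line count matches the number of audio files.
function toTranscriptionLines(raw: string): string[] {
  return raw.replace(/\r?\n$/, "").split(/\r?\n/);
}
```

Pass it the result of reading transcriptions.txt as a UTF-8 string; the trailing newline no longer produces a spurious empty final entry.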

Creating your Audio Data/Files

First, we must create the audio files that we wish to test, along with their expected transcriptions. Audio must be .wav files sampled at 16 kHz. My recommended approach for generating test audio is to use Audacity to record .wav files and downsample them to 16 kHz.

Using Audacity to create audio files for transcription:

  1. Install Audacity (free and cross-platform)
  2. Set your recording settings
    • Correct mic selected (in toolbar near top)
    • Mono selected (in toolbar near top)
    • Project rate (bottom left) defaults to 44100 Hz; the Speech service accepts 16000 Hz, so change this setting to match
  3. Record all the samples
    • Hit record and speak all of your samples, back-to-back, with 1 - 1.5 sec of silence in between (try to be consistent). When finished, press stop
    • Try not to pause for more than 1 sec within a single sample (such as for commas or periods)
    • If you mess up: stop recording, select the messed up part to the end, and hit backspace
    • If you need a break: stop the recording, select the silence at the end, and hit backspace
    • When you're ready to resume, make sure cursor is at the end and press record again
  4. Trim silence from beginning and end, if needed
  5. Down sample to 16000 Hz if left at 44100 Hz
    • Select the track by clicking in the track box to the left of the track
    • Tracks -> Resample -> 16000 Hz
    • Change the project rate at the bottom to 16000 Hz
  6. Split the track using labels
    • Select the track again
    • Analyze -> Silence Finder
    • Adjust "minimum duration of silence" based on how much you paused between recordings (I used 0.9s)
    • Adjust "label placement" based on how much silence you want before each recording (I used 0.4s)
    • This will create a label track with labels between each recording. Scroll through and make sure you don't have extra labels in the middle of recordings
    • Click each label's text field and rename it; then drag each label's right arrow (>) to the right to mark the end of the phrase
    • Ensure there's a label at the very beginning. If not, move to the beginning of the track ("skip to home" in top toolbar), Edit -> Labels -> Add Label at Selection (or Ctrl+B)
  7. Before exporting: Edit -> Preferences -> Import / Export -> uncheck "Show Metadata Tags editor before export"
  8. Export to multiple WAV files
    • File -> Export -> Export Multiple...
    • Choose a folder to export to:
      • location/of/audiodata/folder
    • Format: WAV signed 16-bit PCM
    • Split files based on: Labels
    • Choose to Export using Label/Track Name; tracks aren't numbered.
    • Press Export
    • Click OK

Creating the transcriptions.txt file

As you create your audio files, keep track of the expected transcriptions in a text file called transcriptions.txt. The structure of the .txt file is the same structure used for training a custom acoustic model. Each line of the transcription file should have the name of an audio file, followed by the corresponding transcription. The file name and transcription should be separated by a tab (\t).
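The per-line format described above (file name, a tab, then the transcription) can be sketched as a tiny parser. This is illustrative only, not the harness's internal code:

```typescript
// Parse one line of transcriptions.txt: "<file name>\t<transcription>".
// Splits on the first tab so the transcription itself may contain tabs.
function parseTranscriptionLine(line: string): { file: string; text: string } {
  const tab = line.indexOf("\t");
  if (tab < 0) {
    throw new Error(`No tab separator found in line: ${line}`);
  }
  return { file: line.slice(0, tab), text: line.slice(tab + 1) };
}
```

For example, a line such as `speech01.wav<TAB>turn the lights on` (file name hypothetical) parses to `{ file: "speech01.wav", text: "turn the lights on" }`.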

Important: the .txt file should be encoded as UTF-8 with BOM and should not contain any Unicode characters above U+00A1 (typically –, ‘, ‚, “, etc.). This harness tries to address this by cleaning your data.
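The kind of cleaning involved can be sketched as follows; this is an assumption about the approach, not the harness's actual implementation:

```typescript
// Replace common characters above U+00A1 (smart quotes and dashes)
// with plain ASCII equivalents before comparing transcriptions.
function cleanTranscription(text: string): string {
  return text
    .replace(/[\u2018\u2019\u201A]/g, "'") // ‘ ’ ‚  -> '
    .replace(/[\u201C\u201D\u201E]/g, '"') // “ ” „  -> "
    .replace(/[\u2013\u2014]/g, "-");      // – —    -> -
}
```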

Running this test harness

First build

Compile the project:

npm run build

Run the CLI with flags.

Run the test harness:

node lib/main.js runs the CLI; don't forget your flags.

example:

node lib/main.js -s "<subscription key>" -r "westus" -e "<CRIS endpoint ID >" -d "<audio directory with wav files>" -t "<transcription.txt file path>"

Results are stored in JSON format, tracking WER (word error rate).

Test results are stored as JSON in ./test_results.json by default; this location can be changed with the -o flag.
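WER is the word-level edit distance between the expected and actual transcriptions, divided by the number of expected words. A self-contained sketch of the standard calculation (not the harness's internal implementation):

```typescript
// Word error rate: Levenshtein distance over words (substitutions,
// insertions, deletions) divided by the reference word count.
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = reference.trim().split(/\s+/);
  const hyp = hypothesis.trim().split(/\s+/);
  // d[i][j] = edit distance between first i reference words
  // and first j hypothesis words.
  const d: number[][] = Array.from({ length: ref.length + 1 }, (_, i) =>
    Array.from({ length: hyp.length + 1 }, (_, j) =>
      i === 0 ? j : j === 0 ? i : 0
    )
  );
  for (let i = 1; i <= ref.length; i++) {
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      d[i][j] = Math.min(
        d[i - 1][j] + 1,     // deletion
        d[i][j - 1] + 1,     // insertion
        d[i - 1][j - 1] + cost // substitution or match
      );
    }
  }
  return d[ref.length][hyp.length] / ref.length;
}
```

For example, comparing "hello world" against "hello word" gives one substitution out of two reference words, i.e. a WER of 0.5.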
