Completely Automated Alignment and Vowel Extraction
Our automated system takes uploaded audio files and returns ASR transcriptions, alignments, and vowel formant measurements.
We recommend reading the discussion of the completely automated system's functionality and limitations before you begin.
Bed Word: Automated Interview Transcription via Deepgram
NEW! Bed Word is a tool that uses state-of-the-art ASR models to produce computer-generated transcriptions of audio files. It currently relies on a third-party transcription service called Deepgram. You will need to create your own Deepgram account and API key and provide them to Bed Word. Your uploaded audio is processed through Deepgram to get an initial transcription, and Bed Word then applies the final touches that make the transcription suitable for linguistic analysis. Your Deepgram account will be charged according to Deepgram's pricing, which is currently $0.75 per audio hour. (DARLA itself is a completely free service; the fee is Deepgram's own, and we do not make any money from this tool.)
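To budget for a batch of recordings, the per-hour rate quoted above can be turned into a rough estimate. The sketch below is only an illustration: the function name `estimate_deepgram_cost` is hypothetical, the default rate is the $0.75 per audio hour stated here, and Deepgram's actual pricing and billing granularity may differ or change.

```python
def estimate_deepgram_cost(durations_sec, rate_per_hour=0.75):
    """Rough cost in dollars for a list of recording durations (seconds).

    Assumption: billing is strictly proportional to audio length at
    rate_per_hour; check Deepgram's current pricing before relying on it.
    """
    if any(d < 0 for d in durations_sec):
        raise ValueError("durations must be non-negative")
    total_hours = sum(durations_sec) / 3600.0
    return total_hours * rate_per_hour


if __name__ == "__main__":
    # Three interviews of 45, 60, and 75 minutes (3 audio hours in total).
    interviews = [45 * 60, 60 * 60, 75 * 60]
    print(f"Estimated cost: ${estimate_deepgram_cost(interviews):.2f}")
```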
Audio with transcriptions provided by our in-house speech recognition
This is our legacy speech recognition system, which is less accurate than more recent ASR systems. For the best results with automatic transcription, we recommend using Bed Word. This system uses ASR built on the CMU Sphinx framework to transcribe your data, then runs the transcript through automated alignment and vowel extraction using the Montreal Forced Aligner and FAVE-Extract. It also lets you edit the transcripts produced by the speech recognizer and rerun the analysis.
ASR evaluation
Fully automated analysis requires a higher tolerance for noise in the alignment and formant extraction results. You can estimate this noise with our transcription evaluation tool, which takes a manual transcription of your recording along with the ASR transcription of the same recording and uses weighted Levenshtein distance to compute error rates for words, phonemes, and stressed vowels.
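As an illustration of the idea, here is a minimal sketch of a word-level error rate based on weighted Levenshtein distance. It is not the evaluation tool's actual code: the function names, the lowercasing and whitespace tokenization, and the default cost of 1.0 for each edit type are all assumptions made for this example. The same distance function could be applied to phoneme or stressed-vowel sequences instead of words.

```python
def weighted_levenshtein(ref, hyp, sub_cost=1.0, ins_cost=1.0, del_cost=1.0):
    """Minimum total edit cost to turn sequence `ref` into sequence `hyp`.

    The three costs are illustrative defaults, not the tool's real weights.
    """
    # prev[j] = cost of turning the first i-1 items of ref into hyp[:j]
    prev = [j * ins_cost for j in range(len(hyp) + 1)]
    for i, r in enumerate(ref, start=1):
        curr = [i * del_cost]  # delete every reference item seen so far
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + del_cost,                          # drop r
                curr[j - 1] + ins_cost,                      # insert h
                prev[j - 1] + (0.0 if r == h else sub_cost)  # match or substitute
            ))
        prev = curr
    return prev[-1]


def word_error_rate(manual, asr, **costs):
    """Edit cost between transcripts, normalized by manual transcript length."""
    ref = manual.lower().split()
    hyp = asr.lower().split()
    if not ref:
        raise ValueError("manual transcription is empty")
    return weighted_levenshtein(ref, hyp, **costs) / len(ref)


if __name__ == "__main__":
    manual = "the cat sat on the mat"
    asr = "the cat sat on a mat"
    print(f"WER: {word_error_rate(manual, asr):.3f}")
```

Lowering `sub_cost` below the insertion and deletion costs is one way to treat a misrecognized word as less severe than a dropped or spurious one; which weighting is appropriate depends on the analysis.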