GOOGLE LLC (20240296837). MASK-CONFORMER AUGMENTING CONFORMER WITH MASK-PREDICT DECODER UNIFYING SPEECH RECOGNITION AND RESCORING simplified abstract
MASK-CONFORMER AUGMENTING CONFORMER WITH MASK-PREDICT DECODER UNIFYING SPEECH RECOGNITION AND RESCORING
Organization Name
GOOGLE LLC
Inventor(s)
Andrew M. Rosenberg of Brooklyn NY (US)
Yosuke Higuchi of Mountain View CA (US)
Bhuvana Ramabhadran of Mt. Kisco NY (US)
MASK-CONFORMER AUGMENTING CONFORMER WITH MASK-PREDICT DECODER UNIFYING SPEECH RECOGNITION AND RESCORING - A simplified explanation of the abstract
This abstract first appeared for US patent application 20240296837, titled 'MASK-CONFORMER AUGMENTING CONFORMER WITH MASK-PREDICT DECODER UNIFYING SPEECH RECOGNITION AND RESCORING'.
The abstract describes a two-pass method for processing a sequence of acoustic frames that represents an utterance. In the first pass, an acoustic encoder generates audio encodings with a stack of mask-conformer blocks, a speech recognition decoder produces a transcription from those encodings, and a mask-predict decoder of the acoustic encoder produces a masked output sequence. In the second pass, the same stack of mask-conformer blocks performs cross-attention on the acoustic frames and the masked first-pass transcription to generate new audio encodings, from which the speech recognition decoder produces a revised transcription.
- The method utilizes a stack of mask-conformer blocks in an acoustic encoder to process acoustic frames.
- It generates audio encodings and a transcription in each of two passes.
- In the second pass, cross-attention over the acoustic frames and the masked first-pass transcription produces the second-pass audio encodings.
- The same acoustic encoder and speech recognition decoder serve both passes, which is how the method unifies speech recognition and rescoring in one model.
- By revisiting the utterance with the masked first-pass result as context, the method aims to improve the accuracy of the final transcription.
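The two-pass flow summarized above can be sketched as a small driver function. This is only an illustration of the order of operations, not the patented implementation: the function names, the `<mask>` token, the inputs given to the mask-predict decoder, and the toy stand-in components are all assumptions made for this example. The abstract does not specify how tokens are chosen for masking; the toy stand-in masks positions with a low encoding value purely for demonstration.

```python
from typing import Callable, List, Optional, Sequence, Tuple

MASK = "<mask>"  # hypothetical mask token, not named in the abstract

Frames = Sequence[Sequence[float]]


def two_pass_recognize(
    frames: Frames,
    encoder: Callable[[Frames, Optional[List[str]]], List[float]],
    asr_decoder: Callable[[List[float]], List[str]],
    mask_predict_decoder: Callable[[List[float], List[str]], List[str]],
) -> Tuple[List[str], List[str], List[str]]:
    """Run the two passes in the order the abstract lists them."""
    # First pass: encode the audio alone, decode a transcription,
    # then produce the masked output sequence.
    first_enc = encoder(frames, None)
    first_text = asr_decoder(first_enc)
    masked_text = mask_predict_decoder(first_enc, first_text)

    # Second pass: the same encoder sees the frames together with the
    # masked first-pass transcription; the same decoder is reused.
    second_enc = encoder(frames, masked_text)
    second_text = asr_decoder(second_enc)
    return first_text, masked_text, second_text


# --- Toy stand-ins (assumptions, for demonstration only) ---
def toy_encoder(frames: Frames, text_context: Optional[List[str]]) -> List[float]:
    # One number per frame; text context adds a fixed bonus to mimic refinement.
    bonus = 0.0 if text_context is None else 1.0
    return [sum(frame) + bonus for frame in frames]


def toy_asr_decoder(encodings: List[float]) -> List[str]:
    return ["hi" if e >= 1.0 else "lo" for e in encodings]


def toy_mask_predict_decoder(encodings: List[float], text: List[str]) -> List[str]:
    # Pretend that a low encoding value means a low-confidence token.
    return [MASK if e < 1.0 else tok for e, tok in zip(encodings, text)]


frames = [[0.2, 0.3], [0.9, 0.4]]
first, masked, second = two_pass_recognize(
    frames, toy_encoder, toy_asr_decoder, toy_mask_predict_decoder
)
print(first, masked, second)
```

Passing the components in as callables keeps the driver independent of any particular model: the encoder and decoder are each supplied once and reused in both passes, mirroring the abstract's reuse of the same mask-conformer stack and speech recognition decoder.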
Potential Applications
- Speech recognition systems
- Voice-controlled devices
- Transcription services
Problems Solved
- Enhancing the accuracy of speech recognition systems
- Improving the quality of transcriptions
- Optimizing acoustic processing techniques
Benefits
- Higher accuracy in transcribing spoken language
- Improved performance of voice-controlled devices
- Enhanced user experience in speech recognition applications
Commercial Applications
Title: Advanced Speech Recognition Technology for Enhanced Transcription Services
This technology can be applied in various industries, such as:
- Call centers, for transcribing customer interactions
- Legal firms, for transcribing court proceedings
- Healthcare, for transcribing patient consultations
Questions about the technology
1. How does the method improve the accuracy of speech recognition systems?
2. What are the key advantages of using a two-pass approach in processing acoustic frames?
Original Abstract Submitted
A method includes receiving a sequence of acoustic frames characterizing an utterance. During a first pass, the method includes generating first-pass audio encodings based on the sequence of acoustic frames using a stack of mask-conformer blocks of an acoustic encoder, generating a first-pass transcription of the utterance based on the first-pass audio encodings using a speech recognition decoder, and generating a first-pass masked output sequence using a mask-predict decoder of the acoustic encoder. During a second pass, the method includes generating second-pass audio encodings by performing cross-attention on the sequence of acoustic frames and the masked first-pass transcription using the stack of mask-conformer blocks of the acoustic encoder and generating a second-pass transcription of the utterance based on the second-pass audio encodings using the speech recognition decoder.
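For readers unfamiliar with the cross-attention step named in the abstract, the following is a minimal sketch of standard scaled dot-product cross-attention in plain Python. It is not taken from the patent application. It assumes that queries come from acoustic frame features and that keys and values come from embeddings of the masked transcription tokens; the abstract does not say which side plays which role. Learned projection matrices and multiple heads, which a real attention layer would normally have, are left out for brevity, and all names and numbers are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    Each query row (assumed here: one acoustic frame) receives a weighted
    average of the value rows (assumed here: masked-transcript token embeddings).
    """
    d = len(keys[0])
    width = len(values[0])
    output: Matrix = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys
        ]
        weights = softmax(scores)
        output.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(width)]
        )
    return output


# Illustrative inputs: two acoustic frames, three text tokens
# (the last row stands in for a hypothetical mask-token embedding).
frame_features = [[1.0, 0.0], [0.0, 1.0]]
token_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]

attended = cross_attention(frame_features, token_embeddings, token_embeddings)
for row in attended:
    print([round(x, 3) for x in row])
```

Each output row has the same width as the value vectors and is a convex combination of them, so a frame draws most heavily on the transcript tokens whose embeddings align best with it.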