GOOGLE LLC (20240296837). MASK-CONFORMER AUGMENTING CONFORMER WITH MASK-PREDICT DECODER UNIFYING SPEECH RECOGNITION AND RESCORING simplified abstract

From WikiPatents

MASK-CONFORMER AUGMENTING CONFORMER WITH MASK-PREDICT DECODER UNIFYING SPEECH RECOGNITION AND RESCORING

Organization Name

GOOGLE LLC

Inventor(s)

Andrew M. Rosenberg of Brooklyn NY (US)

Yosuke Higuchi of Mountain View CA (US)

Bhuvana Ramabhadran of Mt. Kisco NY (US)

MASK-CONFORMER AUGMENTING CONFORMER WITH MASK-PREDICT DECODER UNIFYING SPEECH RECOGNITION AND RESCORING - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240296837, titled 'MASK-CONFORMER AUGMENTING CONFORMER WITH MASK-PREDICT DECODER UNIFYING SPEECH RECOGNITION AND RESCORING'.

The abstract describes a method that processes a sequence of acoustic frames characterizing an utterance in two passes. In the first pass, a stack of mask-conformer blocks in an acoustic encoder generates audio encodings, a speech recognition decoder generates a transcription from those encodings, and a mask-predict decoder of the acoustic encoder generates a masked output sequence. In the second pass, the same stack of mask-conformer blocks performs cross-attention on the acoustic frames and the masked first-pass transcription to generate new audio encodings, from which the speech recognition decoder generates a second-pass transcription.

  • An acoustic encoder processes the acoustic frames with a stack of mask-conformer blocks.
  • Audio encodings and a transcription are generated in each of the two passes.
  • A mask-predict decoder of the acoustic encoder produces a masked output sequence in the first pass.
  • In the second pass, cross-attention over the acoustic frames and the masked first-pass transcription produces the second-pass audio encodings.
  • The same encoder stack and speech recognition decoder serve both passes, so the second pass revises the first-pass transcription with the aim of improving recognition accuracy.
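
The two-pass flow can be sketched in Python as plain control flow. This is only an illustration under stated assumptions, not the patent's implementation: the function names, the callable signatures, and the `<mask>` token are hypothetical, the masked output sequence is assumed to serve as the masked first-pass transcription, and the toy components stand in for neural networks that the abstract does not specify.

```python
from typing import Callable, List, Optional, Sequence

Frame = List[float]
Encoding = List[float]
MASK = "<mask>"  # hypothetical mask token, not named in the abstract


def two_pass_recognize(
    frames: Sequence[Frame],
    encode: Callable[[Sequence[Frame], Optional[List[str]]], List[Encoding]],
    decode: Callable[[List[Encoding]], List[str]],
    mask_predict: Callable[[List[Encoding], List[str]], List[str]],
) -> List[str]:
    """Run two passes with caller-supplied (hypothetical) components."""
    # First pass: encode the frames alone, then decode a transcription.
    first_encodings = encode(frames, None)
    first_transcript = decode(first_encodings)
    # Assumption: the mask-predict decoder's masked output sequence is
    # used as the masked first-pass transcription.
    masked = mask_predict(first_encodings, first_transcript)
    # Second pass: the same encoder, now also given the masked text,
    # followed by the same decoder.
    second_encodings = encode(frames, masked)
    return decode(second_encodings)


def toy_encode(frames, masked_text=None):
    # Stand-in encoder: one scalar per frame; text conditioning adds 1.0.
    shift = 0.0 if masked_text is None else 1.0
    return [[sum(frame) / len(frame) + shift] for frame in frames]


def toy_decode(encodings):
    # Stand-in decoder: one token per encoding, chosen by a threshold.
    return ["b" if enc[0] >= 0.5 else "a" for enc in encodings]


def toy_mask_predict(encodings, transcript):
    # Stand-in mask-predict decoder: masks every second token.
    return [MASK if i % 2 else tok for i, tok in enumerate(transcript)]


frames = [[0.2, 0.4], [0.9, 0.7]]
print(two_pass_recognize(frames, toy_encode, toy_decode, toy_mask_predict))
```

Passing the components in as callables keeps the sketch independent of any particular model; the point is only the order of operations and the reuse of one encoder and one decoder across both passes.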

Potential Applications:
  • Speech recognition systems
  • Voice-controlled devices
  • Transcription services

Problems Solved:
  • Enhancing the accuracy of speech recognition systems
  • Improving the quality of transcriptions
  • Optimizing acoustic processing techniques

Benefits:
  • Higher accuracy in transcribing spoken language
  • Improved performance of voice-controlled devices
  • Enhanced user experience in speech recognition applications

Commercial Applications: Advanced Speech Recognition Technology for Enhanced Transcription Services

This technology can be applied in industries such as:
  • Call centers, for transcribing customer interactions
  • Legal firms, for transcribing court proceedings
  • Healthcare, for transcribing patient consultations

Questions about the technology:
  1. How does the method improve the accuracy of speech recognition systems?
  2. What are the key advantages of using a two-pass approach in processing acoustic frames?


Original Abstract Submitted

A method includes receiving a sequence of acoustic frames characterizing an utterance. During a first pass, the method includes generating first-pass audio encodings based on the sequence of acoustic frames using a stack of mask-conformer blocks of an acoustic encoder, generating a first-pass transcription of the utterance based on the first-pass audio encodings using a speech recognition decoder, and generating a first-pass masked output sequence using a mask-predict decoder of the acoustic encoder. During a second pass, the method includes generating second-pass audio encodings by performing cross-attention on the sequence of acoustic frames and the masked first-pass transcription using the stack of mask-conformer blocks of the acoustic encoder and generating a second-pass transcription of the utterance based on the second-pass audio encodings using the speech recognition decoder.
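
The cross-attention step named in the abstract can be illustrated with generic scaled dot-product attention. This is a minimal sketch and makes several assumptions the abstract does not confirm: queries come from frame-derived vectors, keys and values come from embeddings of the masked first-pass transcription, and there are no learned projections or multiple heads. All names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum for numerical stability before exponentiating.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(
    queries: List[Vector], keys: List[Vector], values: List[Vector]
) -> List[Vector]:
    """Scaled dot-product attention: each query attends over all keys."""
    dim = len(keys[0])
    out_dim = len(values[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dim).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(out_dim)]
        )
    return outputs


# Assumed roles: two frame-derived query vectors, and two token embeddings
# where the all-zero vector stands in for a masked token.
frame_vectors = [[1.0, 0.0], [0.0, 1.0]]
token_vectors = [[1.0, 0.0], [0.0, 0.0]]
attended = cross_attention(frame_vectors, token_vectors, token_vectors)
print(attended)
```

Each output row is a weighted average of the token vectors, so the result has one vector per acoustic frame, which is the shape a second-pass audio encoding would need under these assumptions.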