Google LLC (20240290321). CHUNK-WISE ATTENTION FOR LONGFORM ASR simplified abstract


CHUNK-WISE ATTENTION FOR LONGFORM ASR

Organization Name

Google LLC

Inventor(s)

Yongqiang Wang of Kirkland WA (US)

Yu Zhang of Mountain View CA (US)

Wei Han of Mountain View CA (US)

Parisa Haghani of Mountain View CA (US)

Pedro J. Moreno Mengibar of Jersey City NJ (US)

CHUNK-WISE ATTENTION FOR LONGFORM ASR - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240290321, titled 'CHUNK-WISE ATTENTION FOR LONGFORM ASR'.

The method described in the abstract involves receiving training data that includes corpora of multilingual unspoken textual utterances, un-transcribed non-synthetic speech utterances, and transcribed non-synthetic speech utterances. The key steps are:

  • Generating target quantized vector tokens and target token indexes for un-transcribed non-synthetic speech utterances.
  • Generating contrastive context vectors from masked audio features and deriving a contrastive loss term.
  • Generating alignment outputs and probability distributions over possible speech recognition hypotheses.
  • Pre-training an audio encoder based on the contrastive loss, alignment output loss, and non-synthetic speech loss terms (a sketch of how these terms might be combined appears after this list).
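
The final step combines the three loss terms into a single pre-training objective. Below is a minimal sketch of that combination; the function name and the equal default weights are assumptions for illustration, since the application only states that pre-training is based on all three terms.

```python
def combine_pretraining_losses(contrastive: float,
                               alignment_output: float,
                               non_synthetic_speech: float,
                               w_contrastive: float = 1.0,
                               w_align: float = 1.0,
                               w_speech: float = 1.0) -> float:
    """Weighted sum of the three loss terms used to pre-train the audio encoder.

    The equal default weights are illustrative assumptions; the application
    does not specify how the terms are weighted.
    """
    return (w_contrastive * contrastive
            + w_align * alignment_output
            + w_speech * non_synthetic_speech)


# Example with made-up per-batch loss values.
total_loss = combine_pretraining_losses(contrastive=2.31,
                                        alignment_output=0.87,
                                        non_synthetic_speech=1.42)
```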
1. Potential Applications:

This technology could be applied in speech recognition systems, language translation tools, and voice-controlled devices.

2. Problems Solved:

This technology addresses the challenges of processing multilingual unspoken textual utterances and non-synthetic speech data for training audio encoders.

3. Benefits:

The method improves the accuracy and efficiency of speech recognition systems, enhances multilingual capabilities, and enables better performance in voice-controlled applications.

4. Commercial Applications:

"Multilingual Speech Recognition and Training Method" could be utilized in smart speakers, virtual assistants, language learning platforms, and customer service automation tools.

5. Prior Art:

Researchers can explore prior studies on multilingual speech recognition, audio encoder training, and contrastive loss methods in the field of natural language processing.

6. Frequently Updated Research:

Stay updated on the latest advancements in multilingual speech recognition, audio encoder training techniques, and improvements in contrastive loss algorithms.

7. Questions about the Multilingual Speech Recognition and Training Method:

1. How does this method improve the accuracy of speech recognition systems?
2. What are the potential applications of this technology in the field of natural language processing?


Original Abstract Submitted

A method includes receiving training data including a corpus of multilingual unspoken textual utterances, a corpus of multilingual un-transcribed non-synthetic speech utterances, and a corpus of multilingual transcribed non-synthetic speech utterances. For each un-transcribed non-synthetic speech utterance, the method includes generating a target quantized vector token and a target token index, generating contrastive context vectors from corresponding masked audio features, and deriving a contrastive loss term. The method also includes generating an alignment output, generating a first probability distribution over possible speech recognition hypotheses for the alignment output, and determining an alignment output loss term. The method also includes generating a second probability distribution over possible speech recognition hypotheses and determining a non-synthetic speech loss term. The method also includes pre-training an audio encoder based on the contrastive loss term, the alignment output loss term, and the non-synthetic speech loss term.
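
To make the contrastive step more concrete, here is a minimal NumPy sketch of how a contrastive loss term over masked-position context vectors and target quantized vector tokens might be computed. The cosine-similarity scoring, the softmax temperature, and the use of the full codebook as candidates are assumptions for illustration; the application does not specify these details.

```python
import numpy as np


def contrastive_loss(context_vectors: np.ndarray,
                     quantized_targets: np.ndarray,
                     target_token_indexes: np.ndarray,
                     temperature: float = 0.1) -> float:
    """InfoNCE-style sketch of a contrastive loss over masked audio features.

    context_vectors:      (num_masked, dim) contrastive context vectors produced
                          by the audio encoder at masked positions.
    quantized_targets:    (codebook_size, dim) candidate quantized vector tokens.
    target_token_indexes: (num_masked,) index of the true quantized token for
                          each masked position.
    """
    # Cosine similarity between each context vector and every candidate token.
    c = context_vectors / np.linalg.norm(context_vectors, axis=1, keepdims=True)
    q = quantized_targets / np.linalg.norm(quantized_targets, axis=1, keepdims=True)
    logits = c @ q.T / temperature  # (num_masked, codebook_size)

    # Cross-entropy against the target token index at each masked position.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(c)), target_token_indexes].mean())


# Toy usage with random tensors (shapes chosen only for illustration).
rng = np.random.default_rng(0)
ctx = rng.normal(size=(4, 8))        # 4 masked positions, 8-dim features
codebook = rng.normal(size=(16, 8))  # 16 quantized vector tokens
targets = rng.integers(0, 16, size=4)
print(contrastive_loss(ctx, codebook, targets))
```

The design choice here follows common self-supervised speech pre-training practice, where the encoder is trained to identify the correct quantized target among candidates at each masked position; the actual candidate sampling and scoring used in the application may differ.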