Google LLC (20240296840). Text Injection For Training Auxiliary Tasks In Speech Recognition Models simplified abstract

From WikiPatents

Text Injection For Training Auxiliary Tasks In Speech Recognition Models

Organization Name

Google LLC

Inventor(s)

Shaan Jagdeep Patrick Bijwadia of San Francisco CA (US)

Shuo-yiin Chang of Sunnyvale CA (US)

Tara N. Sainath of Jersey City NJ (US)

Weiran Wang of San Jose CA (US)

Zhong Meng of Mountain View CA (US)

Text Injection For Training Auxiliary Tasks In Speech Recognition Models - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240296840, titled 'Text Injection For Training Auxiliary Tasks In Speech Recognition Models'.

The abstract describes a joint auxiliary-task and automatic speech recognition (ASR) model with three key elements: an encoder that processes acoustic frames into higher-order feature representations; a multi-output HAT (hybrid autoregressive transducer) decoder that generates speech recognition hypotheses and indicates auxiliary tokens; and a training process based on both paired and unpaired data sets.

  • Encoder processes acoustic frames to generate feature representations
  • Multi-output HAT decoder generates speech recognition hypotheses and auxiliary tokens
  • Training process involves paired audio data with ground-truth auxiliary tokens and unpaired textual utterances with auxiliary tokens
  • Model aims to improve ASR performance by incorporating auxiliary tasks
  • Innovative approach to training ASR models using both paired and unpaired data sets
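The training process in the bullets above can be sketched as a single combined objective: an ASR loss on paired audio-text examples, plus auxiliary-token losses on both the paired audio and the injected (unpaired) text. The function names, toy distributions, and the `aux_weight` value below are illustrative assumptions, not details taken from the patent.

```python
import math

def cross_entropy(probs, target):
    """Negative log-likelihood of the target index under a distribution."""
    return -math.log(probs[target])

def jeit_step_loss(asr_probs, asr_target, aux_probs, aux_target,
                   text_aux_probs, text_aux_target, aux_weight=0.3):
    """Combine the ASR loss on paired data with auxiliary-token losses on
    both paired audio and unpaired (text-injected) examples."""
    paired_asr = cross_entropy(asr_probs, asr_target)      # paired audio + transcript
    paired_aux = cross_entropy(aux_probs, aux_target)      # ground-truth auxiliary token
    unpaired_aux = cross_entropy(text_aux_probs, text_aux_target)  # injected text only
    return paired_asr + aux_weight * (paired_aux + unpaired_aux)

# Toy per-step distributions over 4 symbols (illustrative only)
asr = [0.1, 0.7, 0.1, 0.1]        # speech recognition hypothesis distribution
aux = [0.2, 0.2, 0.5, 0.1]        # auxiliary token on a paired audio example
text_aux = [0.6, 0.2, 0.1, 0.1]   # auxiliary token on an injected textual utterance

loss = jeit_step_loss(asr, 1, aux, 2, text_aux, 0)
print(round(loss, 4))
```

In this sketch the unpaired text contributes only through the auxiliary-token term, which is the sense in which text injection lets text-only data shape the model.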

Potential Applications:
  • Improving speech recognition accuracy
  • Enhancing performance of ASR systems in noisy environments
  • Facilitating the development of multi-task learning models in speech recognition

Problems Solved:
  • Addressing the challenge of improving ASR accuracy in various acoustic conditions
  • Integrating auxiliary tasks to enhance the performance of ASR models

Benefits:
  • Enhanced speech recognition accuracy
  • Improved robustness of ASR systems in challenging acoustic environments
  • Facilitation of multi-task learning in speech recognition research

Commercial Applications: This technology could be utilized in:
  • Voice-controlled devices and virtual assistants
  • Call center automation systems
  • Transcription services for noisy environments

Questions about the technology:
1. How does the model handle auxiliary tasks in the training process?
2. What are the potential implications of incorporating auxiliary tokens in speech recognition models?

Frequently Updated Research: Stay updated on advancements in multi-task learning models in speech recognition to further enhance the performance of the technology.


Original Abstract Submitted

a joint auxiliary task and asr model includes an encoder to receive a sequence of acoustic frames and generate, at each of a plurality of output steps, a higher-order feature representation for a corresponding acoustic frame. the model also includes a multi-output hat decoder to generate at each of the plurality of output steps a probability distribution over possible speech recognition hypotheses, and an indication of whether the output step corresponds to an auxiliary token associated with a particular auxiliary task. the model is trained by a jeit training process based on: a paired training data set including paired audio data and transcriptions, the transcriptions annotated with ground-truth auxiliary tokens associated with the particular auxiliary task; and an unpaired training data set including textual utterances not paired with any corresponding audio data, the textual utterances annotated with the ground-truth auxiliary tokens associated with the particular auxiliary task.
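As a rough illustration of the multi-output decoder the abstract describes, the sketch below produces, at a single output step, both a probability distribution over possible hypotheses and a probability that the step corresponds to an auxiliary token. All weights, dimensions, and function names are invented for illustration and are not from the patent.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(w, v):
    return sum(wi * vi for wi, vi in zip(w, v))

def decoder_step(feature, w_vocab, w_aux):
    """One output step of a hypothetical multi-output decoder: a distribution
    over speech recognition hypotheses, plus the probability that this step
    emits an auxiliary token for the particular auxiliary task."""
    vocab_dist = softmax([dot(row, feature) for row in w_vocab])
    aux_prob = sigmoid(dot(w_aux, feature))
    return vocab_dist, aux_prob

# Toy higher-order feature representation for one acoustic frame,
# with made-up weights for a 3-symbol vocabulary and the auxiliary head.
feature = [0.5, -1.2, 0.3, 0.8]
w_vocab = [[0.1, 0.4, -0.2, 0.3],
           [-0.3, 0.2, 0.5, -0.1],
           [0.2, -0.4, 0.1, 0.6]]
w_aux = [0.7, -0.1, 0.2, 0.4]

dist, aux_prob = decoder_step(feature, w_vocab, w_aux)
print([round(p, 3) for p in dist], round(aux_prob, 3))
```

The point of the two heads sharing one feature vector is that the auxiliary-token prediction can be trained from text-only examples while the hypothesis head continues to benefit from paired audio.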