GOOGLE LLC (20240296840). Text Injection For Training Auxiliary Tasks In Speech Recognition Models simplified abstract


Text Injection For Training Auxiliary Tasks In Speech Recognition Models

Organization Name

GOOGLE LLC

Inventor(s)

Shaan Jagdeep Patrick Bijwadia of San Francisco CA (US)

Shuo-yiin Chang of Sunnyvale CA (US)

Tara N. Sainath of Jersey City NJ (US)

Weiran Wang of San Jose CA (US)

Zhong Meng of Mountain View CA (US)

Text Injection For Training Auxiliary Tasks In Speech Recognition Models - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240296840, titled 'Text Injection For Training Auxiliary Tasks In Speech Recognition Models'.

The abstract describes a joint auxiliary task and automatic speech recognition (ASR) model comprising an encoder that processes acoustic frames into higher-order feature representations, a multi-output HAT decoder that generates speech recognition hypotheses and indicates auxiliary tokens, and a training process that uses both paired and unpaired data sets annotated with ground-truth auxiliary tokens; a minimal code sketch follows the list below.

  • Encoder processes acoustic frames to generate higher-order feature representations
  • Multi-output HAT decoder generates speech recognition hypotheses and indicates auxiliary tokens
  • Training process involves paired audio data with ground-truth auxiliary tokens and unpaired textual utterances with auxiliary tokens
  • Model is designed for joint auxiliary task and ASR applications
  • Model is trained with a JEIT training process
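
To make the architecture above concrete, here is a minimal Python (PyTorch) sketch of an encoder that turns acoustic frames into higher-order features, plus two output heads that emit a speech-token distribution and a per-step auxiliary-token indication. The class name, the LSTM encoder, the layer sizes, and the simple linear heads are illustrative assumptions; the patent's multi-output HAT decoder and JEIT training process are considerably more involved.

 # Minimal sketch (not the patented implementation): an encoder maps acoustic
 # frames to higher-order feature representations, and two output heads emit a
 # speech-token distribution plus an auxiliary-token indication per step.
 # All module names and dimensions here are illustrative assumptions.
 import torch
 import torch.nn as nn

 class JointAuxASRSketch(nn.Module):
     def __init__(self, n_mels=80, d_model=256, vocab_size=4096):
         super().__init__()
         # Encoder: one higher-order feature vector per acoustic frame.
         self.encoder = nn.LSTM(n_mels, d_model, num_layers=2, batch_first=True)
         # ASR head: probability distribution over speech recognition hypotheses.
         self.asr_head = nn.Linear(d_model, vocab_size)
         # Auxiliary head: does this output step correspond to an auxiliary token?
         self.aux_head = nn.Linear(d_model, 1)

     def forward(self, frames):                            # frames: (batch, time, n_mels)
         features, _ = self.encoder(frames)                # higher-order representations
         asr_logits = self.asr_head(features)              # (batch, time, vocab_size)
         aux_logits = self.aux_head(features).squeeze(-1)  # (batch, time)
         return asr_logits, aux_logits

 if __name__ == "__main__":
     model = JointAuxASRSketch()
     dummy_frames = torch.randn(2, 50, 80)                 # two utterances, 50 frames each
     asr_logits, aux_logits = model(dummy_frames)
     print(asr_logits.shape, aux_logits.shape)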

Potential Applications

  • Speech recognition systems
  • Language translation applications
  • Voice-controlled devices

Problems Solved

  • Improving the accuracy and efficiency of speech recognition
  • Enhancing the performance of auxiliary tasks in conjunction with ASR models

Benefits

  • Enhanced speech recognition capabilities
  • Improved accuracy in recognizing auxiliary task-related tokens
  • Increased efficiency in processing audio data

Commercial Applications

This technology can be utilized in various industries, such as:

  • Telecommunications
  • Customer service
  • Healthcare, for dictation and transcription purposes

Questions about the technology

1. How does the model handle unpaired textual utterances in the training process?

The model uses unpaired textual utterances annotated with ground-truth auxiliary tokens to improve its performance in recognizing auxiliary task-related tokens; a hedged sketch of this text-injection idea appears below.
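
One way to picture text injection: unpaired transcripts, with their ground-truth auxiliary tokens already inserted, train only a text-side prediction path of the decoder with a next-token objective, so no audio is needed. The `InternalLMSketch` class, the loss function, and the random token ids below are assumptions for illustration, not the patent's implementation.

 # Hypothetical text-injection sketch: train a text-only prediction path on
 # unpaired transcripts whose ground-truth auxiliary tokens are already inserted.
 import torch
 import torch.nn as nn
 import torch.nn.functional as F

 class InternalLMSketch(nn.Module):
     """Text-only path: next-token prediction over text plus auxiliary tokens."""
     def __init__(self, vocab_size=4096, d_model=256):
         super().__init__()
         self.embed = nn.Embedding(vocab_size, d_model)
         self.rnn = nn.LSTM(d_model, d_model, batch_first=True)
         self.out = nn.Linear(d_model, vocab_size)

     def forward(self, tokens):                  # tokens: (batch, seq_len)
         hidden, _ = self.rnn(self.embed(tokens))
         return self.out(hidden)                 # (batch, seq_len, vocab_size)

 def text_injection_loss(model, token_ids):
     """Next-token loss on unpaired text annotated with auxiliary tokens."""
     logits = model(token_ids[:, :-1])
     return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                            token_ids[:, 1:].reshape(-1))

 if __name__ == "__main__":
     lm = InternalLMSketch()
     unpaired_batch = torch.randint(0, 4096, (4, 20))  # stand-in for annotated text
     print(text_injection_loss(lm, unpaired_batch).item())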

2. What are the key components of the encoder's processing of acoustic frames?

The encoder receives a sequence of acoustic frames and generates a higher-order feature representation for each frame, enhancing the model's ability to recognize speech patterns. One common way such acoustic frames are produced is sketched below.
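
The patent does not specify how the acoustic frames are produced; a log-mel spectrogram front end is a common choice in ASR. The parameters below (16 kHz audio, 80 mel bins, 10 ms hop) are typical assumptions, shown with torchaudio.

 # Illustrative front end only: one common way to obtain the "sequence of
 # acoustic frames" an ASR encoder consumes. The patent does not specify this.
 import torch
 import torchaudio

 frontend = torchaudio.transforms.MelSpectrogram(
     sample_rate=16000, n_fft=400, hop_length=160, n_mels=80)

 waveform = torch.randn(1, 16000)                    # 1 second of dummy audio
 frames = frontend(waveform).clamp(min=1e-10).log()  # (1, n_mels, time)
 frames = frames.transpose(1, 2)                     # (1, time, n_mels): one frame per step
 print(frames.shape)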


Original Abstract Submitted

A joint auxiliary task and ASR model includes an encoder to receive a sequence of acoustic frames and generate, at each of a plurality of output steps, a higher-order feature representation for a corresponding acoustic frame. The model also includes a multi-output HAT decoder to generate at each of the plurality of output steps a probability distribution over possible speech recognition hypotheses, and an indication of whether the output step corresponds to an auxiliary token associated with a particular auxiliary task. The model is trained by a JEIT training process based on: a paired training data set including paired audio data and transcriptions, the transcriptions annotated with ground-truth auxiliary tokens associated with the particular auxiliary task; and an unpaired training data set including textual utterances not paired with any corresponding audio data, the textual utterances annotated with the ground-truth auxiliary tokens associated with the particular auxiliary task.
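
As a rough illustration of the training process described above, the sketch below mixes a supervised loss on the paired data set (audio with auxiliary-token-annotated transcriptions) and an unpaired text-injection loss. The frame-level cross-entropy and binary auxiliary loss stand in for the actual HAT/transducer and JEIT objectives, and the mixing weight is an assumption.

 # Hypothetical JEIT-style mixing of paired and unpaired losses; the loss terms
 # below are simplified stand-ins, not the patent's objectives.
 import torch
 import torch.nn.functional as F

 def paired_loss(asr_logits, aux_logits, tokens, aux_flags):
     """Supervised loss on paired audio and aux-annotated transcriptions."""
     asr = F.cross_entropy(asr_logits.reshape(-1, asr_logits.size(-1)),
                           tokens.reshape(-1))
     aux = F.binary_cross_entropy_with_logits(aux_logits.reshape(-1),
                                              aux_flags.reshape(-1).float())
     return asr + aux

 def jeit_style_loss(paired, text_injection, text_weight=0.3):
     """Blend the paired loss with the unpaired text-injection loss (weight assumed)."""
     return paired + text_weight * text_injection

 if __name__ == "__main__":
     B, T, V = 2, 50, 4096
     p = paired_loss(torch.randn(B, T, V), torch.randn(B, T),
                     torch.randint(0, V, (B, T)), torch.randint(0, 2, (B, T)))
     print(jeit_style_loss(p, torch.tensor(3.2)).item())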