18590918. Self-Training With Oracle And Top-Ranked Hypotheses simplified abstract (GOOGLE LLC)
Self-Training With Oracle And Top-Ranked Hypotheses
Organization Name
GOOGLE LLC
Inventor(s)
Andrew M. Rosenberg of Brooklyn NY (US)
Murali Karthick Baskar of Mountain View CA (US)
Bhuvana Ramabhadran of Mt. Kisco NY (US)
Self-Training With Oracle And Top-Ranked Hypotheses - A simplified explanation of the abstract
This abstract first appeared for US patent application 18590918 titled 'Self-Training With Oracle And Top-Ranked Hypotheses'.
The method described in the patent application uses an RNN-T (recurrent neural network transducer) model to process the sequence of acoustic frames for each speech recognition training sample, producing an n-best list of hypotheses. For each hypothesis in the list, it counts the word errors relative to the ground-truth transcription.
- The method calculates a first loss for the top-ranked hypothesis in the n-best list based on the ground-truth transcription.
- An oracle hypothesis is identified as the one with the fewest word errors in the n-best list, and a second loss is determined for this hypothesis based on the ground-truth transcription.
- A self-training combined loss is calculated based on the first and second losses, and the model is trained using this combined loss.
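The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation claimed in the application: the abstract does not say how word errors are counted or how the two losses are combined, so the sketch assumes a word-level edit distance and a weighted sum. The names `word_errors`, `select_top_and_oracle`, `combined_loss`, and the `weight` parameter are hypothetical, and the two per-hypothesis losses are taken as plain numbers supplied by the caller.

```python
from typing import List, Sequence, Tuple


def word_errors(hyp: str, ref: str) -> int:
    """Word-level edit distance (substitutions + insertions + deletions).

    Assumption: the abstract only says "number of word errors"; edit
    distance is the usual way to count them, but it is not specified.
    """
    h, r = hyp.split(), ref.split()
    prev = list(range(len(r) + 1))  # distance from empty hypothesis
    for i, hw in enumerate(h, start=1):
        curr = [i]
        for j, rw in enumerate(r, start=1):
            cost = 0 if hw == rw else 1
            curr.append(min(prev[j] + 1,          # extra word in hypothesis
                            curr[j - 1] + 1,      # missing word in hypothesis
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def select_top_and_oracle(nbest: Sequence[str], ref: str) -> Tuple[int, int, List[int]]:
    """Return (top index, oracle index, per-hypothesis word errors).

    Assumption: `nbest` is ordered by model score, best first, so the
    top-ranked hypothesis is at index 0. Ties for the oracle go to the
    higher-ranked hypothesis.
    """
    errors = [word_errors(h, ref) for h in nbest]
    oracle_idx = min(range(len(nbest)), key=errors.__getitem__)
    return 0, oracle_idx, errors


def combined_loss(first_loss: float, second_loss: float, weight: float = 0.5) -> float:
    """Hypothetical combination: a weighted sum of the two losses."""
    return weight * first_loss + (1.0 - weight) * second_loss


reference = "the cat sat on the mat"
nbest = [
    "the cat sat on a mat",    # top-ranked by the model, one word error
    "the cat sat on the mat",  # matches the reference exactly
    "a cat sat on the hat",    # two word errors
]
top_idx, oracle_idx, errors = select_top_and_oracle(nbest, reference)
print(top_idx, oracle_idx, errors)
```

In this toy n-best list the top-ranked hypothesis is not the one closest to the reference, which is the situation where a separate oracle term contributes a different training signal than the top-ranked term alone.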
Key Features and Innovation:
- Utilizes an RNN-T model for processing acoustic frames in speech recognition training.
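- Identifies an oracle hypothesis, the entry in the n-best list with the fewest word errors relative to the ground-truth transcription, and computes a separate loss for it.
- Combines the top-ranked and oracle losses into a single self-training combined loss that is used to train the model.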
Potential Applications:
- Speech recognition systems
- Language translation applications
- Voice-controlled devices
Problems Solved:
- Improving speech recognition accuracy
- Enhancing training efficiency for RNN-T models
Benefits:
- Higher accuracy in speech recognition
- More efficient model training process
- Enhanced performance in language translation tasks
Commercial Applications:
- Optimizing speech recognition software for improved user experience
- Developing advanced language translation tools for commercial use
Prior Art: Prior research in RNN-T models for speech recognition and language processing.
Frequently Updated Research: Ongoing studies on improving speech recognition systems using neural network models.
Questions about the Technology:
1. How does the use of an oracle hypothesis improve model training in speech recognition?
2. What are the potential implications of the self-training combined loss on speech recognition accuracy?
Original Abstract Submitted
A method includes, for each training sample of a plurality of training samples, processing, using an RNN-T model, a corresponding sequence of acoustic frames to obtain an n-best list of speech recognition hypotheses, and, for each speech recognition hypothesis of the n-best list, determining a corresponding number of word errors relative to a corresponding ground-truth transcription. For a top-ranked hypothesis from the n-best list, the method includes determining a first loss based on the corresponding ground-truth transcription. The method includes identifying, as an oracle hypothesis, the speech recognition hypothesis from the n-best list having the smallest corresponding number of word errors relative to the corresponding ground-truth transcription, and determining a second loss for the oracle hypothesis based on the corresponding ground-truth transcription. The method includes determining a corresponding self-training combined loss based on the first and second losses, and training the model based on the corresponding self-training combined loss.
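One way to write the abstract's quantities compactly is shown below. The notation is an assumption introduced here for illustration: y_1, ..., y_n denote the n-best hypotheses in rank order, y^* the ground-truth transcription, L_1 the first loss (for the top-ranked hypothesis y_1), and L_2 the second loss (for the oracle hypothesis). The abstract states only that the combined loss is "based on" the two losses; the interpolation weight λ is a hypothetical choice, not something the application specifies.

```latex
\begin{align}
  e_k &= \mathrm{WordErrors}(y_k, y^{*}), \quad k = 1, \dots, n \\
  k_{\text{oracle}} &= \arg\min_{k} \, e_k \\
  \mathcal{L}_{\text{combined}} &= \lambda \, \mathcal{L}_{1} + (1 - \lambda) \, \mathcal{L}_{2},
  \quad 0 \le \lambda \le 1
\end{align}
```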