Self-Training With Oracle And Top-Ranked Hypotheses

Organization Name

Inventor(s)

Murali Karthick Baskar of Mountain View CA (US)

Bhuvana Ramabhadran of Mt. Kisco NY (US)

Self-Training With Oracle And Top-Ranked Hypotheses - A simplified explanation of the abstract

This abstract first appeared for US patent application 18590918 titled 'Self-Training With Oracle And Top-Ranked Hypotheses'

The method described in the patent application uses an RNN-T (recurrent neural network transducer) model to process the sequence of acoustic frames for each speech recognition training sample, producing an n-best list of hypotheses. For each hypothesis in the list, the method determines the number of word errors relative to the ground-truth transcription.
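The abstract does not say how word errors are counted. A minimal sketch, assuming the common convention of word-level edit distance (substitutions, insertions, and deletions) on whitespace-split text; the function name `word_errors` is hypothetical:

```python
def word_errors(hypothesis: str, reference: str) -> int:
    """Count word errors as the word-level Levenshtein distance.

    Assumption: words are separated by whitespace and compared exactly;
    the patent abstract does not specify tokenization or normalization.
    """
    hyp = hypothesis.split()
    ref = reference.split()

    # prev[j] = distance between the reference words seen so far
    # (initially none) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # matching i reference words against an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]
```

Applying this to every entry of an n-best list gives the per-hypothesis error counts that the later steps rely on.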

The method calculates a first loss for the top-ranked hypothesis in the n-best list based on the ground-truth transcription.
An oracle hypothesis is identified as the one with the fewest word errors in the n-best list, and a second loss is determined for this hypothesis based on the ground-truth transcription.
A self-training combined loss is calculated based on the first and second losses, and the model is trained using this combined loss.
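The selection and combination steps can be sketched as follows. This is an illustration under stated assumptions, not the method claimed in the application: the abstract does not specify how the two losses are computed or combined, so the weighted sum and the `oracle_weight` parameter are assumptions, and the names `Hypothesis`, `select_top_and_oracle`, and `combined_loss` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Hypothesis:
    text: str
    score: float      # model score; assumed here that higher means better ranked
    word_errors: int  # word errors relative to the ground-truth transcription


def select_top_and_oracle(
    n_best: Sequence[Hypothesis],
) -> Tuple[Hypothesis, Hypothesis]:
    """Return (top-ranked hypothesis, oracle hypothesis) from an n-best list."""
    if not n_best:
        raise ValueError("n-best list must not be empty")
    top = max(n_best, key=lambda h: h.score)           # best by model score
    oracle = min(n_best, key=lambda h: h.word_errors)  # fewest word errors
    return top, oracle


def combined_loss(
    first_loss: float, second_loss: float, oracle_weight: float = 0.5
) -> float:
    """Combine the top-ranked loss and the oracle loss.

    Assumption: a convex combination. The abstract only says the combined
    loss is "based on" the first and second losses.
    """
    return (1.0 - oracle_weight) * first_loss + oracle_weight * second_loss
```

The top-ranked and oracle hypotheses can be the same entry, in which case both losses refer to one hypothesis. In a real training loop the two loss values would come from the model's loss function rather than being passed in as plain floats.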

Key Features and Innovation:

Utilizes an RNN-T model for processing acoustic frames in speech recognition training.
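Identifies an oracle hypothesis, the entry in the n-best list with the fewest word errors relative to the ground-truth transcription.
Combines the loss for the top-ranked hypothesis and the loss for the oracle hypothesis into a single self-training loss used to train the model.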

Potential Applications:

Speech recognition systems
Language translation applications
Voice-controlled devices

Problems Solved:

Improving speech recognition accuracy
Enhancing training efficiency for RNN-T models

Benefits:

Higher accuracy in speech recognition
More efficient model training process
Enhanced performance in language translation tasks

Commercial Applications:

Optimizing speech recognition software for improved user experience
Developing advanced language translation tools for commercial use

Prior Art: Prior research in RNN-T models for speech recognition and language processing.

Frequently Updated Research: Ongoing studies on improving speech recognition systems using neural network models.

Questions about the Technology:

1. How does the use of an oracle hypothesis improve model training in speech recognition?
2. What are the potential implications of the self-training combined loss on speech recognition accuracy?

Original Abstract Submitted

A method includes, for each training sample of a plurality of training samples, processing, using an RNN-T model, a corresponding sequence of acoustic frames to obtain an n-best list of speech recognition hypotheses, and, for each speech recognition hypothesis of the n-best list, determining a corresponding number of word errors relative to a corresponding ground-truth transcription. For a top-ranked hypothesis from the n-best list, the method includes determining a first loss based on the corresponding ground-truth transcription. The method includes identifying, as an oracle hypothesis, the speech recognition hypothesis from the n-best list having the smallest corresponding number of word errors relative to the corresponding ground-truth transcription, and determining a second loss for the oracle hypothesis based on the corresponding ground-truth transcription. The method includes determining a corresponding self-training combined loss based on the first and second losses, and training the model based on the corresponding self-training combined loss.