18515212. Monte Carlo Self-Training for Speech Recognition simplified abstract (GOOGLE LLC)


Monte Carlo Self-Training for Speech Recognition

Organization Name

GOOGLE LLC

Inventor(s)

Anshuman Tripathi of Mountain View CA (US)

Soheil Khorram of Redwood City CA (US)

Hasim Sak of Santa Clara CA (US)

Han Lu of Redwood WA (US)

Jaeyoung Kim of Cupertino CA (US)

Qian Zhang of Mountain View CA (US)

Monte Carlo Self-Training for Speech Recognition - A simplified explanation of the abstract

This abstract first appeared for US patent application 18515212, titled 'Monte Carlo Self-Training for Speech Recognition'.

Simplified Explanation

The method described in the abstract trains a sequence transduction model using an unsupervised subnetwork with teacher and student branches. The teacher branch processes unlabeled input features to predict probability distributions over possible output labels, samples one or more label sequences from those distributions, and derives a sequence of pseudo output labels from the samples. The student branch processes the same input features to predict its own probability distributions over output labels, computes a negative log likelihood term between its predictions and the pseudo labels, and updates the parameters of its encoder accordingly.

  • Sequence transduction model training method (see the sketch after this list):
    * Utilizes an unsupervised subnetwork with teacher and student branches
    * The teacher branch predicts probability distributions over output labels and derives pseudo labels from sampled label sequences
    * The student branch predicts probability distributions over output labels and updates its parameters based on a negative log likelihood term
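
The list above can be made concrete with a short sketch. This is a minimal illustration in PyTorch, assuming a simplified per-frame classifier rather than a full sequence transducer; the `teacher` and `student` modules, the frame-level labels, and the majority-vote pseudo-label rule are illustrative assumptions, not details taken from the patent.

```python
import torch
import torch.nn.functional as F

def self_training_step(teacher, student, optimizer, features, num_samples=8):
    """One unsupervised update on a batch of unlabeled features.

    features: (batch, time, feat_dim) tensor of unlabeled acoustic features.
    optimizer is assumed to hold only the student encoder's parameters.
    """
    # Teacher branch: predict label distributions without tracking gradients.
    with torch.no_grad():
        teacher_logits = teacher(features)                # (B, T, vocab)
        dist = torch.distributions.Categorical(logits=teacher_logits)
        # Monte Carlo step: draw several label sequences per utterance...
        samples = dist.sample((num_samples,))             # (S, B, T)
        # ...and reduce them to one pseudo-label sequence (here, the mode).
        pseudo_labels = samples.mode(dim=0).values        # (B, T)

    # Student branch: predict its own distributions over the same labels.
    log_probs = F.log_softmax(student(features), dim=-1)  # (B, T, vocab)

    # Negative log likelihood of the pseudo labels under the student.
    nll = F.nll_loss(log_probs.transpose(1, 2), pseudo_labels)

    # Update the student encoder's parameters only.
    optimizer.zero_grad()
    nll.backward()
    optimizer.step()
    return nll.item()
```

In comparable self-training setups the teacher is often a slowly updated (for example, exponential moving average) copy of the student, but the abstract does not specify how the two branches are related.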
Potential Applications

- Natural language processing
- Speech recognition
- Machine translation

Problems Solved

- Training sequence transduction models without labeled data
- Improving the accuracy of output label predictions

Benefits

- Enables training of models with limited labeled data
- Enhances performance on sequence transduction tasks

Potential Commercial Applications

- Development of advanced language processing systems
- Implementation in automated transcription services

Possible Prior Art

- Previous methods for training sequence transduction models using unsupervised learning techniques

Unanswered Questions

1. How does the method handle noisy input features during training?

The abstract does not provide information on how the method deals with noisy input features or their impact on the training process.

2. What computational resources are required to implement this training method effectively?

The abstract does not mention the computational resources needed to train the sequence transduction model using the described method.


Original Abstract Submitted

A method for training a sequence transduction model includes receiving a sequence of unlabeled input features extracted from unlabeled input samples. Using a teacher branch of an unsupervised subnetwork, the method includes processing the sequence of input features to predict probability distributions over possible teacher branch output labels, sampling one or more sequences of teacher branch output labels, and determining a sequence of pseudo output labels based on the one or more sequences of teacher branch output labels. Using a student branch that includes a student encoder of the unsupervised subnetwork, the method includes processing the sequence of input features to predict probability distributions over possible student branch output labels, determining a negative log likelihood term based on the predicted probability distributions over possible student branch output labels and the sequence of pseudo output labels, and updating parameters of the student encoder.
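
Although the abstract gives no equation, the negative log likelihood term it refers to could plausibly take the following form; the notation (x for the unlabeled feature sequence, ŷ for the pseudo labels, p_θ for the student branch's distribution) is our assumption, not the patent's.

```latex
% Hypothetical form of the negative log likelihood term:
% x is the sequence of unlabeled input features, \hat{y}_{1:T} the pseudo
% output labels derived from teacher samples, and p_\theta the student
% branch's predicted distribution.
\mathcal{L}_{\mathrm{NLL}}(\theta) = -\sum_{t=1}^{T} \log p_\theta\!\left(\hat{y}_t \mid x\right)
```

Minimizing this term with respect to θ is what "updating parameters of the student encoder" amounts to in the final step of the method.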