Google LLC (20240203406). Semi-Supervised Training Scheme For Speech Recognition: simplified abstract

From WikiPatents

Semi-Supervised Training Scheme For Speech Recognition

Organization Name

Google LLC

Inventor(s)

Soheil Khorram of Redwood City CA (US)

Anshuman Tripathi of Mountain View CA (US)

Kim Jaeyoung of Cupertino CA (US)

Han Lu of Redmond WA (US)

Qian Zhang of Mountain View CA (US)

Hasim Sak of Santa Clara CA (US)

Semi-Supervised Training Scheme For Speech Recognition - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240203406, titled 'Semi-Supervised Training Scheme For Speech Recognition'.

Simplified Explanation:

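The patent application describes a semi-supervised method for training a speech recognition model on unlabeled audio samples. A supervised audio encoder produces target representations, and an unsupervised audio encoder learns to predict those targets from augmented versions of the same audio.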

Key Features and Innovation:

  • Receiving a sequence of acoustic frames from unlabeled audio samples
  • Generating target higher order feature representations using a supervised audio encoder
  • Augmenting the acoustic frames and generating, with an unsupervised audio encoder, predicted higher order feature representations for the augmented frames
  • Determining an unsupervised loss term based on the target and predicted representations
  • Updating parameters of the speech recognition model based on the unsupervised loss term
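The steps listed above can be sketched as a single training step. This is a minimal NumPy sketch under stated assumptions, not the patented implementation: both encoders are hypothetical linear maps (`encode`), the augmentation is simple time masking (`augment`), the unsupervised loss term is mean squared error, and only the unsupervised encoder's weights are updated while the supervised encoder is held fixed. The abstract does not specify any of these choices.

```python
import numpy as np


def augment(frames, rng, max_mask=10):
    """Time masking (an assumed augmentation): zero out one random block of frames.

    The number of frames is unchanged, so each augmented frame still lines up
    with the target computed for the same position in the clean sequence.
    """
    augmented = frames.copy()
    num_frames = frames.shape[0]
    width = int(rng.integers(0, min(max_mask, num_frames) + 1))
    start = int(rng.integers(0, num_frames - width + 1))
    augmented[start:start + width] = 0.0
    return augmented


def encode(frames, weights):
    """Hypothetical toy linear encoder: one feature vector per acoustic frame."""
    return frames @ weights


def unsupervised_step(frames, w_sup, w_unsup, rng, lr=0.1):
    """One training step on unlabeled frames; returns (new w_unsup, loss)."""
    # Targets from the supervised encoder on the clean frames. w_sup is held
    # fixed here, so the targets act as constants during the update.
    targets = encode(frames, w_sup)
    # Predictions from the unsupervised encoder on the augmented frames.
    augmented = augment(frames, rng)
    predicted = encode(augmented, w_unsup)
    # Unsupervised loss term: mean squared error (assumed; not named in the abstract).
    diff = predicted - targets
    loss = float(np.mean(diff ** 2))
    # Exact MSE gradient for the linear encoder, followed by one SGD update.
    grad = 2.0 * augmented.T @ diff / diff.size
    return w_unsup - lr * grad, loss


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.normal(size=(50, 8))   # 50 unlabeled frames, 8 features each
    w_sup = rng.normal(size=(8, 4))     # stands in for a trained supervised encoder
    w_unsup = np.zeros((8, 4))          # unsupervised encoder starts untrained
    for step in range(200):
        w_unsup, loss = unsupervised_step(frames, w_sup, w_unsup, rng)
        if step in (0, 199):
            print(f"step {step}: unsupervised loss {loss:.3f}")
```

In a real model the encoders would be deep networks with context across frames, so the unsupervised encoder could learn to infer masked frames from their neighbors. The per-frame linear toy cannot do that, so its loss levels off above zero.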

Potential Applications: The method could be used to train speech recognition systems, such as those in voice-controlled devices, and other audio analysis models when large amounts of untranscribed audio are available.

Problems Solved: This technology addresses the challenge of improving speech recognition accuracy when labeled (transcribed) training data is limited, by also learning from unlabeled audio.

Benefits: The method lets a speech recognition model keep improving on audio for which no transcriptions are available, which can raise accuracy without the cost of transcribing additional training data.

Commercial Applications: Improving the accuracy of commercial speech recognition products by training them on untranscribed audio.

Prior Art: Relevant prior work includes studies of unsupervised and semi-supervised learning for speech recognition and audio processing.

Frequently Updated Research: Researchers are continually exploring new methods and algorithms to improve speech recognition accuracy through unsupervised learning techniques.

Questions about Speech Recognition Technology:
1. How does unsupervised learning improve speech recognition accuracy?
2. What are the potential limitations of using unsupervised learning in speech recognition systems?


Original Abstract Submitted

A method includes receiving a sequence of acoustic frames extracted from unlabeled audio samples that correspond to spoken utterances not paired with any corresponding transcriptions. The method also includes generating, using a supervised audio encoder, a target higher order feature representation for a corresponding acoustic frame. The method also includes augmenting the sequence of acoustic frames and generating, as output from an unsupervised audio encoder, a predicted higher order feature representation for a corresponding augmented acoustic frame in the sequence of augmented acoustic frames. The method also includes determining an unsupervised loss term based on the target higher order feature representation and the predicted higher order feature representation and updating parameters of the speech recognition model based on the unsupervised loss term.
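The abstract's steps can be written compactly as below. The notation is ours, not the application's: $x_{1:T}$ are the acoustic frames, $f_{\text{sup}}$ and $f_{\text{unsup}}$ the two encoders, $\theta$ the trainable parameters, and $\eta$ a learning rate. The per-frame distance $d$ is left abstract because the abstract does not name the loss, and the frame average and gradient-descent update are assumptions.

$$
\begin{aligned}
y_t &= f_{\text{sup}}(x_{1:T})_t && \text{(target higher order feature representation)}\\
\tilde{x}_{1:T} &= \operatorname{Aug}(x_{1:T}) && \text{(augmented acoustic frames)}\\
\hat{y}_t &= f_{\text{unsup}}(\tilde{x}_{1:T};\,\theta)_t && \text{(predicted higher order feature representation)}\\
\mathcal{L}_{\text{unsup}}(\theta) &= \frac{1}{T}\sum_{t=1}^{T} d\big(y_t,\, \hat{y}_t\big)\\
\theta &\leftarrow \theta - \eta\, \nabla_{\theta}\, \mathcal{L}_{\text{unsup}}(\theta)
\end{aligned}
$$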