20240013777. Unsupervised Data Selection via Discrete Speech Representation for Automatic Speech Recognition simplified abstract (Google LLC)

Unsupervised Data Selection via Discrete Speech Representation for Automatic Speech Recognition

Organization Name

Google LLC

Inventor(s)

Zhiyun Lu of Brooklyn NY (US)

Yu Zhang of Mountain View CA (US)

Wei Han of Redwood City CA (US)

Yongqiang Wang of Kirkland WA (US)

Parisa Haghani of Mountain View CA (US)

Zhehuai Chen of Edgewater NJ (US)

Unsupervised Data Selection via Discrete Speech Representation for Automatic Speech Recognition - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240013777, titled 'Unsupervised Data Selection via Discrete Speech Representation for Automatic Speech Recognition'.

Simplified Explanation

The abstract describes a method for training an automatic speech recognition (ASR) model on a domain-relevant subset of an unlabeled corpus. The method obtains a corpus of unlabeled spoken utterances, each including audio data that characterizes the utterance, and receives a target domain. A contrastive data selection model then selects the utterances in the corpus that correspond to the target domain, and the ASR model is trained on that subset.

  • The method involves obtaining a corpus of unlabeled training data consisting of spoken utterances with audio data.
  • A target domain is received.
  • A contrastive data selection model is used to select a subset of utterances from the corpus that correspond to the target domain.
  • An ASR model is trained on the selected subset of utterances.
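
The selection step can be illustrated with a small sketch. The abstract does not say how the contrastive data selection model scores utterances, so everything below is an assumption made for illustration: each utterance is taken to be already quantized into a sequence of discrete token IDs, a few target-domain examples are available, and an utterance is scored by its mean per-token log-likelihood ratio between a target-domain unigram model and a general unigram model. The function names (train_unigram, contrastive_score, select_subset) and the toy data are hypothetical and are not taken from the patent application.

```python
import math
from collections import Counter


def train_unigram(sequences, vocab_size):
    """Return add-one smoothed unigram log-probabilities, indexed by token ID."""
    counts = Counter(tok for seq in sequences for tok in seq)
    total = sum(counts.values())
    return [
        math.log((counts[tok] + 1) / (total + vocab_size))
        for tok in range(vocab_size)
    ]


def contrastive_score(seq, target_logp, general_logp):
    """Mean per-token log-likelihood ratio (target model minus general model)."""
    if not seq:
        return float("-inf")  # assumption: empty utterances are never selected
    return sum(target_logp[t] - general_logp[t] for t in seq) / len(seq)


def select_subset(corpus, target_examples, vocab_size, top_k):
    """Return the IDs of the top_k utterances that look most like the target domain.

    corpus: dict mapping utterance ID -> list of discrete token IDs (assumed input)
    target_examples: list of token-ID sequences from the target domain (assumed input)
    """
    general_logp = train_unigram(corpus.values(), vocab_size)
    target_logp = train_unigram(target_examples, vocab_size)
    ranked = sorted(
        corpus,
        key=lambda uid: contrastive_score(corpus[uid], target_logp, general_logp),
        reverse=True,
    )
    return ranked[:top_k]


# Toy data: tokens 2 and 3 are typical of the (hypothetical) target domain.
corpus = {
    "u1": [0, 0, 1, 0],
    "u2": [2, 3, 2, 3],
    "u3": [0, 1, 1, 0],
    "u4": [3, 3, 2, 2],
}
target_examples = [[2, 3, 3, 2]]

selected = select_subset(corpus, target_examples, vocab_size=4, top_k=2)
print(selected)
```

In this toy run the two utterances built from tokens 2 and 3 are ranked highest. In a full pipeline the selected utterance IDs would then determine which audio is used to train the ASR model; that training step is outside the scope of this sketch.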

Potential applications of this technology:

  • Improving automatic speech recognition systems.
  • Enhancing speech-to-text transcription services.
  • Enabling voice-controlled devices and virtual assistants to better understand user commands.

Problems solved by this technology:

  • Lack of labeled training data for specific target domains.
  • Difficulty in training ASR models for specific domains due to limited data availability.

Benefits of this technology:

  • More accurate and reliable speech recognition in specific domains.
  • Improved user experience with voice-controlled devices and virtual assistants.
  • Increased efficiency and productivity in speech-to-text transcription services.


Original Abstract Submitted

A method includes obtaining a corpus of unlabeled training data including a plurality of spoken utterances, each corresponding spoken utterance of the plurality of spoken utterances includes audio data characterizing the corresponding spoken utterance. The method also includes receiving a target domain. The method also includes selecting, using a contrastive data selection model, a subset of the utterances from the corpus of unlabeled training data that correspond to the target domain. The method includes training an automatic speech recognition (ASR) model on the subset of utterances.