Google LLC (20240242712). Contrastive Siamese Network for Semi-supervised Speech Recognition simplified abstract

From WikiPatents

Contrastive Siamese Network for Semi-supervised Speech Recognition

Organization Name

Google LLC

Inventor(s)

Jaeyoung Kim of Cupertino CA (US)

Soheil Khorram of Redwood City CA (US)

Hasim Sak of Santa Clara CA (US)

Anshuman Tripathi of Mountain View CA (US)

Han Lu of Redmond WA (US)

Qian Zhang of Mountain View CA (US)

Contrastive Siamese Network for Semi-supervised Speech Recognition - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240242712 titled 'Contrastive Siamese Network for Semi-supervised Speech Recognition'.

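Simplified Explanation: The patent application describes a method for training an audio encoder on unlabeled audio samples of spoken utterances, that is, utterances without paired transcriptions. A contrastive siamese network with a target branch and an augmentation branch supplies an unsupervised loss term that is used to update the encoder's parameters.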

Key Features and Innovation:

  • Utilizes a contrastive siamese network to process unlabeled audio samples.
  • Produces a sequence of encoder outputs for the audio samples.
  • Modifies time characteristics of the encoder outputs to generate target branch outputs.
  • Performs augmentation on the audio samples at an augmentation branch.
  • Generates, at the augmentation branch, predictions of the target branch outputs.
  • Determines an unsupervised loss term for updating parameters of the audio encoder.
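The data flow through the two branches listed above can be sketched in a few lines of Python. This is only an illustrative sketch, not the method claimed in the application: the abstract does not say how the encoder is built, how time characteristics are modified, or which augmentation is used. The linear `encode` function, the frame-averaging `modify_time`, the `time_mask` augmentation, and the `predictor` weights are all hypothetical stand-ins chosen to keep the example small.

```python
def encode(frames, weights):
    # Hypothetical encoder: one linear projection applied to every frame.
    # `weights` is a list of output rows, each as long as an input frame.
    return [[sum(x * w for x, w in zip(frame, row)) for row in weights]
            for frame in frames]


def modify_time(outputs, stride=2):
    # Assumed time modification: average non-overlapping groups of
    # `stride` adjacent frames (the abstract does not name the operation).
    return [
        [sum(column) / stride for column in zip(*outputs[i:i + stride])]
        for i in range(0, len(outputs) - stride + 1, stride)
    ]


def time_mask(frames, start, width):
    # Assumed augmentation: zero out a contiguous span of input frames.
    return [
        [0.0] * len(frame) if start <= t < start + width else list(frame)
        for t, frame in enumerate(frames)
    ]


def target_branch(frames, weights):
    # Encoder outputs, then the time modification -> target branch outputs.
    return modify_time(encode(frames, weights))


def augmentation_branch(frames, weights, predictor):
    # Augment the audio, encode it with the same encoder weights, then
    # predict the target branch outputs. The prediction step here (pool to
    # the target frame rate, then project) is a hypothetical choice.
    augmented_outputs = encode(time_mask(frames, start=1, width=1), weights)
    return encode(modify_time(augmented_outputs), predictor)


identity = [[1.0, 0.0], [0.0, 1.0]]
audio = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
targets = target_branch(audio, identity)
predictions = augmentation_branch(audio, identity, identity)
```

Both branches share the encoder weights, which is what makes the network siamese; the two output sequences have the same length so they can be compared frame by frame when the loss is computed.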

Potential Applications:

  • Speech recognition technology
  • Language learning tools
  • Audio data processing systems

Problems Solved:

  • Training speech recognition models on spoken utterances that lack paired transcriptions
  • Enhancing the accuracy of speech recognition systems
  • Improving the efficiency of language learning tools

Benefits:

  • Enables unlabeled audio samples to be used for training
  • Enhances the performance of speech recognition systems
  • Facilitates the development of more accurate language learning tools

Commercial Applications: The technology can be applied in industries such as:

  • Speech recognition software development
  • Language learning platforms
  • Audio transcription services

Prior Art: Researchers can explore prior studies on:

  • Contrastive siamese networks in audio processing
  • Unsupervised learning methods for speech recognition

Frequently Updated Research: Stay informed about the latest advancements in:

  • Contrastive siamese networks for audio analysis
  • Unsupervised learning techniques in speech processing

Questions about Audio Processing Technology:

  1. How does the contrastive siamese network improve transcription accuracy?
  2. What are the potential limitations of using unsupervised learning in speech recognition systems?


Original Abstract Submitted

A method includes receiving a plurality of unlabeled audio samples corresponding to spoken utterances not paired with corresponding transcriptions. At a target branch of a contrastive siamese network, the method also includes generating a sequence of encoder outputs for the plurality of unlabeled audio samples and modifying time characteristics of the encoder outputs to generate a sequence of target branch outputs. At an augmentation branch of a contrastive siamese network, the method also includes performing augmentation on the unlabeled audio samples, generating a sequence of augmented encoder outputs for the augmented unlabeled audio samples, and generating predictions of the sequence of target branch outputs generated at the target branch. The method also includes determining an unsupervised loss term based on target branch outputs and predictions of the sequence of target branch outputs. The method also includes updating parameters of the audio encoder based on the unsupervised loss term.
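The abstract does not give a formula for the unsupervised loss term. One common choice for a contrastive objective is an InfoNCE-style loss, sketched below under that assumption: each predicted frame is scored against every target frame, the target at the same time step is treated as the positive, and the other frames act as negatives. The function names, the cosine similarity, and the `temperature` value are all hypothetical and are not taken from the application.

```python
import math


def cosine(u, v):
    # Cosine similarity between two vectors; 0.0 if either has zero norm.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def contrastive_loss(predictions, targets, temperature=0.1):
    # Assumed InfoNCE-style loss: for frame t, targets[t] is the positive
    # and every other target frame in the sequence is a negative.
    total = 0.0
    for t, prediction in enumerate(predictions):
        logits = [cosine(prediction, target) / temperature for target in targets]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_norm = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_norm - logits[t]
    return total / len(predictions)


targets = [[1.0, 0.0], [0.0, 1.0]]
aligned = contrastive_loss(targets, targets)        # predictions match targets
swapped = contrastive_loss(targets[::-1], targets)  # predictions at wrong frames
```

In this toy case the loss is close to zero when each prediction matches its own target frame and large when the frames are swapped. In a real training setup the encoder parameters would be updated by backpropagating such a loss through an automatic differentiation framework, which this sketch leaves out.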