18619684. Contrastive Siamese Network for Semi-supervised Speech Recognition simplified abstract (GOOGLE LLC)


Contrastive Siamese Network for Semi-supervised Speech Recognition

Organization Name

GOOGLE LLC

Inventor(s)

Jaeyoung Kim of Cupertino CA (US)

Soheil Khorram of Redwood City CA (US)

Hasim Sak of Santa Clara CA (US)

Anshuman Tripathi of Mountain View CA (US)

Han Lu of Redmond WA (US)

Qian Zhang of Mountain View CA (US)

Contrastive Siamese Network for Semi-supervised Speech Recognition - A simplified explanation of the abstract

This abstract first appeared for US patent application 18619684, titled 'Contrastive Siamese Network for Semi-supervised Speech Recognition'.

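Simplified Explanation: The patent application describes a method that trains an audio encoder on unlabeled audio samples of spoken utterances using a contrastive Siamese network. A target branch produces target outputs from the original audio, an augmentation branch predicts those targets from augmented audio, and the resulting unsupervised loss is used to update the encoder's parameters.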

Key Features and Innovation:

  • Utilizes a contrastive Siamese network to process unlabeled audio samples.
  • Modifies time characteristics of encoder outputs to generate target branch outputs.
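  • Performs augmentation on the unlabeled audio samples in a separate augmentation branch, which then generates predictions of the target branch outputs.
  • Determines an unsupervised loss term based on the target branch outputs and the predictions of them.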
  • Updates parameters of the audio encoder based on the unsupervised loss term.
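
The data flow through the two branches listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method claimed in the application: the abstract does not say how the encoder is built, which augmentation is used, or how time characteristics are modified. Here `encode` is a toy scaling function standing in for the shared audio encoder, `time_mask` is a hypothetical augmentation, `pool_pairs` is a hypothetical time modification, and all function names are invented for the example.

```python
from typing import List

Frame = List[float]


def encode(frames: List[Frame], weight: float) -> List[Frame]:
    """Toy stand-in for the shared audio encoder: scales every feature."""
    return [[weight * x for x in frame] for frame in frames]


def time_mask(frames: List[Frame], start: int, width: int) -> List[Frame]:
    """Hypothetical augmentation: zero out `width` frames beginning at `start`."""
    return [
        [0.0] * len(frame) if start <= t < start + width else list(frame)
        for t, frame in enumerate(frames)
    ]


def pool_pairs(frames: List[Frame]) -> List[Frame]:
    """Hypothetical time modification: average each pair of adjacent frames.

    A trailing unpaired frame is dropped.
    """
    pooled = []
    for t in range(0, len(frames) - 1, 2):
        pooled.append([(a + b) / 2.0 for a, b in zip(frames[t], frames[t + 1])])
    return pooled


def target_branch(audio: List[Frame], weight: float) -> List[Frame]:
    """Encode the original audio, then modify its time characteristics."""
    return pool_pairs(encode(audio, weight))


def augmentation_branch(
    audio: List[Frame], weight: float, mask_start: int, mask_width: int
) -> List[Frame]:
    """Augment the audio, encode it, and predict the target branch outputs."""
    augmented = time_mask(audio, mask_start, mask_width)
    encoded = encode(augmented, weight)
    # Assumption: the prediction step is reduced to the same pooling so that
    # predictions and targets have equal length; a real system would use a
    # learned prediction network here.
    return pool_pairs(encoded)


audio = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
targets = target_branch(audio, weight=0.5)
predictions = augmentation_branch(audio, weight=0.5, mask_start=0, mask_width=2)
```

Both branches call `encode` with the same `weight`, which mirrors the Siamese idea of sharing one encoder between branches. An unsupervised loss would then compare `predictions` with `targets`, and the encoder parameters would be updated to reduce it.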

Potential Applications: This technology could be applied in speech recognition systems, language learning tools, and audio data analysis platforms.

Problems Solved: Supervised training of speech recognizers requires audio paired with transcriptions. This method addresses how to train an audio encoder on unlabeled audio samples that have no corresponding transcriptions.

Benefits: Lets an audio encoder learn from spoken utterances without manual transcription, which may reduce the amount of labeled data needed to train a speech recognizer.

Commercial Applications: The method could be used in developing speech recognition software for industries such as customer service, healthcare, and education.

Prior Art: Researchers in the field of machine learning and audio processing have explored similar techniques for processing unlabeled audio data using neural networks.

Frequently Updated Research: Stay informed about advancements in contrastive Siamese networks and unsupervised learning methods for audio data processing.

Questions about the contrastive Siamese network method:

  1. How does this method improve the processing of unlabeled audio samples compared to traditional techniques?
  2. What are the potential limitations of using a contrastive Siamese network for audio data processing?


Original Abstract Submitted

A method includes receiving a plurality of unlabeled audio samples corresponding to spoken utterances not paired with corresponding transcriptions. At a target branch of a contrastive Siamese network, the method also includes generating a sequence of encoder outputs for the plurality of unlabeled audio samples and modifying time characteristics of the encoder outputs to generate a sequence of target branch outputs. At an augmentation branch of a contrastive Siamese network, the method also includes performing augmentation on the unlabeled audio samples, generating a sequence of augmented encoder outputs for the augmented unlabeled audio samples, and generating predictions of the sequence of target branch outputs generated at the target branch. The method also includes determining an unsupervised loss term based on target branch outputs and predictions of the sequence of target branch outputs. The method also includes updating parameters of the audio encoder based on the unsupervised loss term.
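
The abstract does not give a formula for the unsupervised loss term. As an illustration only, the following sketch assumes an InfoNCE-style contrastive loss: for each time step, the prediction should be more similar to the target branch output at the same step than to the target outputs at other steps. The function names, the use of cosine similarity, and the `temperature` parameter are assumptions made for this example and are not taken from the application.

```python
import math
from typing import List

Frame = List[float]


def cosine(a: Frame, b: Frame) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0.0 else 0.0


def contrastive_loss(
    predictions: List[Frame], targets: List[Frame], temperature: float = 0.1
) -> float:
    """Assumed InfoNCE-style loss averaged over time steps.

    The positive for step t is targets[t]; the targets at all other
    steps of the same sequence act as negatives.
    """
    if not predictions or len(predictions) != len(targets):
        raise ValueError("predictions and targets must be non-empty and equal length")
    total = 0.0
    for t, pred in enumerate(predictions):
        logits = [cosine(pred, tgt) / temperature for tgt in targets]
        # Numerically stable log-sum-exp over all candidate targets.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[t]
    return total / len(predictions)


targets = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], targets)
swapped_loss = contrastive_loss([[0.0, 1.0], [1.0, 0.0]], targets)
```

In this toy case `aligned_loss` is close to zero and `swapped_loss` is much larger, because the swapped predictions match the wrong time steps. In a training loop, the gradient of such a loss with respect to the encoder parameters would drive the final update step described in the abstract.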