GOOGLE LLC (20240242712). Contrastive Siamese Network for Semi-supervised Speech Recognition simplified abstract


Contrastive Siamese Network for Semi-supervised Speech Recognition

Organization Name

GOOGLE LLC

Inventor(s)

Jaeyoung Kim of Cupertino CA (US)

Soheil Khorram of Redwood City CA (US)

Hasim Sak of Santa Clara CA (US)

Anshuman Tripathi of Mountain View CA (US)

Han Lu of Redmond WA (US)

Qian Zhang of Mountain View CA (US)

Contrastive Siamese Network for Semi-supervised Speech Recognition - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240242712, titled 'Contrastive Siamese Network for Semi-supervised Speech Recognition'.

Simplified Explanation: The patent application describes a method for training an audio encoder on unlabeled audio samples of spoken utterances (audio with no paired transcriptions) using a contrastive siamese network, with the aim of improving transcription accuracy.

Key Features and Innovation:

  • Utilizes a contrastive siamese network to process unlabeled audio samples.
  • Produces encoder outputs for the audio samples in a target branch and modifies their time characteristics to form the target branch outputs.
  • Includes an augmentation branch for performing data augmentation on the audio samples.
  • Generates predictions of the target branch outputs from the encoder outputs of the augmented samples.
  • Updates parameters of the audio encoder based on an unsupervised loss term computed from the target branch outputs and the predictions.
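
The data flow through the two branches can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the patented implementation: the abstract does not say how time characteristics are modified or which augmentation is used, so average pooling over frames and zero-masking of a span of frames are assumptions here, and every name (`encode`, `modify_time`, `time_mask`, `predict`) and every shape is hypothetical.

```python
import random

def encode(frames, weights):
    # Toy stand-in for the shared audio encoder: one linear map per frame.
    return [[sum(x * w for x, w in zip(frame, row)) for row in weights]
            for frame in frames]

def modify_time(outputs, stride=2):
    # Target branch (assumed form): change the time characteristics by
    # averaging every `stride` consecutive frames.
    return [[sum(col) / stride for col in zip(*outputs[i:i + stride])]
            for i in range(0, len(outputs) - stride + 1, stride)]

def time_mask(frames, start, width):
    # Augmentation branch (assumed form): zero out a contiguous run of frames.
    return [[0.0] * len(f) if start <= t < start + width else list(f)
            for t, f in enumerate(frames)]

def predict(aug_outputs, head, stride=2):
    # Hypothetical prediction head: a linear projection followed by the same
    # pooling, so predictions line up one-to-one with the target outputs.
    return modify_time(encode(aug_outputs, head), stride)

random.seed(0)
frames = [[random.gauss(0, 1) for _ in range(4)] for _ in range(8)]     # 8 frames, 4 features
encoder_w = [[random.gauss(0, 1) for _ in range(4)] for _ in range(3)]  # 4 -> 3 dims
head_w = [[random.gauss(0, 1) for _ in range(3)] for _ in range(3)]     # 3 -> 3 dims

targets = modify_time(encode(frames, encoder_w))             # target branch
augmented = time_mask(frames, start=2, width=3)              # augmentation
predictions = predict(encode(augmented, encoder_w), head_w)  # augmentation branch
print(len(targets), len(targets[0]), len(predictions), len(predictions[0]))  # 4 3 4 3
```

Both branches share `encoder_w`, which is what makes the network siamese; only the input (clean versus augmented) and the post-processing differ.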

Potential Applications: This technology can be applied in speech recognition systems, language translation tools, and voice-controlled devices.

Problems Solved: Addresses the challenge of accurately transcribing spoken utterances without paired transcriptions, improving the performance of speech processing systems.

Benefits:

  • Enhances transcription accuracy of spoken utterances.
  • Enables better performance of speech recognition systems.
  • Facilitates the development of more efficient language processing tools.

Commercial Applications: The technology can be utilized in developing advanced speech recognition software for various industries, including customer service, healthcare, and education.

Prior Art: Researchers in the field of speech processing and machine learning have explored similar methods for improving transcription accuracy using neural networks and data augmentation techniques.

Frequently Updated Research: Stay updated on advancements in contrastive siamese networks, unsupervised learning methods, and speech processing technologies to enhance the performance of transcription systems.

Questions about the Technology:

  1. How does the method improve transcription accuracy without labeled data?
  2. What are the potential limitations of using a contrastive siamese network for processing audio samples?


Original Abstract Submitted

A method includes receiving a plurality of unlabeled audio samples corresponding to spoken utterances not paired with corresponding transcriptions. At a target branch of a contrastive siamese network, the method also includes generating a sequence of encoder outputs for the plurality of unlabeled audio samples and modifying time characteristics of the encoder outputs to generate a sequence of target branch outputs. At an augmentation branch of a contrastive siamese network, the method also includes performing augmentation on the unlabeled audio samples, generating a sequence of augmented encoder outputs for the augmented unlabeled audio samples, and generating predictions of the sequence of target branch outputs generated at the target branch. The method also includes determining an unsupervised loss term based on target branch outputs and predictions of the sequence of target branch outputs. The method also includes updating parameters of the audio encoder based on the unsupervised loss term.
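
The abstract does not give the form of the unsupervised loss term. One common choice for contrastive training is an InfoNCE-style loss, in which the prediction at each time step should be most similar to the target branch output at the same step, with the other steps acting as negatives. The sketch below assumes that form; the function names, the cosine similarity, and the `temperature` value are all assumptions made for illustration, not details from the application.

```python
import math

def cosine(u, v):
    # Cosine similarity between two vectors; the small constant avoids
    # division by zero for all-zero vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)

def contrastive_loss(predictions, targets, temperature=0.1):
    # Assumed InfoNCE-style loss: for step t, the positive is targets[t] and
    # every other target in the sequence serves as a negative.
    total = 0.0
    for t, pred in enumerate(predictions):
        logits = [cosine(pred, tgt) / temperature for tgt in targets]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[t]
    return total / len(predictions)

targets = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
aligned = contrastive_loss(targets, targets)           # predictions match targets
misaligned = contrastive_loss(targets[::-1], targets)  # time order reversed
print(aligned < misaligned)  # True
```

In an actual training loop, the gradient of this loss with respect to the audio encoder's parameters would be computed by an automatic differentiation framework and used for the parameter update; that step is omitted here.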