Contrastive Siamese Network for Semi-supervised Speech Recognition

Organization Name

Google LLC

Inventor(s)

Jaeyoung Kim of Cupertino CA (US)

Soheil Khorram of Redwood City CA (US)

Hasim Sak of Santa Clara CA (US)

Anshuman Tripathi of Mountain View CA (US)

Han Lu of Redmond WA (US)

Qian Zhang of Mountain View CA (US)

Contrastive Siamese Network for Semi-supervised Speech Recognition - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240242712 titled 'Contrastive Siamese Network for Semi-supervised Speech Recognition'.

Simplified Explanation: The patent application describes a method for training an audio encoder on unlabeled audio samples of spoken utterances, meaning audio that has no paired transcriptions, using a contrastive siamese network with a target branch and an augmentation branch.

Key Features and Innovation:

Utilizes a contrastive siamese network to process unlabeled audio samples.
Produces a sequence of encoder outputs for the audio samples.
Modifies time characteristics of the encoder outputs to generate target branch outputs.
Performs augmentation on the audio samples at an augmentation branch.
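Generates, at the augmentation branch, predictions of the target branch outputs from the encoded augmented samples.
Determines an unsupervised loss term from the target branch outputs and their predictions, and uses it to update parameters of the audio encoder.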
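
The features listed above can be sketched as a two-branch forward pass. This is a minimal illustration in plain Python, not the implementation described in the application: the linear "encoder", the window-averaging time modification, the time-masking augmentation, the linear prediction layer, and all function names are assumptions chosen only to make the data flow concrete.

```python
import random

def encode(frames, weights):
    """Toy stand-in for the audio encoder: one linear projection per frame."""
    return [[sum(w * x for w, x in zip(row, frame)) for row in weights]
            for frame in frames]

def modify_time(outputs, window=2):
    """Assumed time modification: average each output with the frames that follow it."""
    modified = []
    for t in range(len(outputs)):
        chunk = outputs[t:t + window]
        modified.append([sum(col) / len(chunk) for col in zip(*chunk)])
    return modified

def augment(frames, mask_start, mask_len):
    """Assumed augmentation: zero out a contiguous span of frames (time masking)."""
    return [[0.0] * len(f) if mask_start <= t < mask_start + mask_len else list(f)
            for t, f in enumerate(frames)]

def siamese_forward(frames, enc_w, pred_w, mask_start=1, mask_len=2):
    # Target branch: encode the clean audio, then modify time characteristics.
    targets = modify_time(encode(frames, enc_w))
    # Augmentation branch: augment, encode with the same encoder weights,
    # then predict the target branch outputs with a (hypothetical) linear layer.
    augmented_out = encode(augment(frames, mask_start, mask_len), enc_w)
    predictions = encode(augmented_out, pred_w)
    return targets, predictions

rng = random.Random(0)
frames = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(6)]  # 6 frames, 4 features
enc_w = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]   # encoder: 4 -> 3 dims
pred_w = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]  # predictor: 3 -> 3 dims
targets, predictions = siamese_forward(frames, enc_w, pred_w)
```

Both branches share the encoder weights, so a loss that compares `predictions` with `targets` can drive updates to a single audio encoder.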

Potential Applications:

Speech recognition technology
Language learning tools
Audio data processing systems

Problems Solved:

Training speech recognition models on spoken utterances that lack paired transcriptions
Enhancing the accuracy of speech recognition systems
Improving the efficiency of language learning tools

Benefits:

Enables learning from unlabeled audio samples
Enhances the performance of speech recognition systems
Facilitates the development of more accurate language learning tools

Commercial Applications: The technology can be applied in industries such as:

Speech recognition software development
Language learning platforms
Audio transcription services

Prior Art: Researchers can explore prior studies on:

Contrastive siamese networks in audio processing
Unsupervised learning methods for speech recognition

Frequently Updated Research: Stay informed about the latest advancements in:

Contrastive siamese networks for audio analysis
Unsupervised learning techniques in speech processing

Questions about Audio Processing Technology:

1. How does the contrastive siamese network improve transcription accuracy?
2. What are the potential limitations of using unsupervised learning in speech recognition systems?

Original Abstract Submitted

A method includes receiving a plurality of unlabeled audio samples corresponding to spoken utterances not paired with corresponding transcriptions. At a target branch of a contrastive siamese network, the method also includes generating a sequence of encoder outputs for the plurality of unlabeled audio samples and modifying time characteristics of the encoder outputs to generate a sequence of target branch outputs. At an augmentation branch of a contrastive siamese network, the method also includes performing augmentation on the unlabeled audio samples, generating a sequence of augmented encoder outputs for the augmented unlabeled audio samples, and generating predictions of the sequence of target branch outputs generated at the target branch. The method also includes determining an unsupervised loss term based on target branch outputs and predictions of the sequence of target branch outputs. The method also includes updating parameters of the audio encoder based on the unsupervised loss term.
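
The abstract does not spell out the form of the unsupervised loss term. One common choice for a contrastive objective is an InfoNCE-style loss, sketched below as an assumption: each prediction is scored against every target branch output, the output at the same time step is treated as the positive, and outputs at other time steps act as negatives. The function names, the cosine similarity, and the temperature value are all illustrative.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def contrastive_loss(predictions, targets, temperature=0.1):
    """InfoNCE-style loss averaged over time steps (an assumed form of the loss)."""
    total = 0.0
    for t, pred in enumerate(predictions):
        logits = [cosine(pred, tgt) / temperature for tgt in targets]
        # Numerically stable log-sum-exp over all candidate targets.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Negative log-probability of picking the target at the same time step.
        total += log_denom - logits[t]
    return total / len(predictions)

# Toy target branch outputs and two sets of predictions.
targets = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
aligned = [[0.9, 0.1], [0.1, 0.9], [-0.9, 0.1]]   # each close to its own target
shuffled = [aligned[1], aligned[2], aligned[0]]   # each close to the wrong target

loss_aligned = contrastive_loss(aligned, targets)
loss_shuffled = contrastive_loss(shuffled, targets)
```

Predictions that line up with their own time step give a lower loss than misaligned ones, which is the signal a gradient-based update of the audio encoder parameters would follow.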
