Google LLC (20240233707). Knowledge Distillation with Domain Mismatch For Speech Recognition simplified abstract


Knowledge Distillation with Domain Mismatch For Speech Recognition

Organization Name

Google LLC

Inventor(s)

Tien-Ju Yang of Mountain View, CA (US)

You-Chi Cheng of Mountain View, CA (US)

Shankar Kumar of New York, NY (US)

Jared Lichtarge of Mountain View, CA (US)

Ehsan Amid of Mountain View, CA (US)

Yuxin Ding of Mountain View, CA (US)

Rajiv Mathews of Sunnyvale, CA (US)

Mingqing Chen of Saratoga, CA (US)

Knowledge Distillation with Domain Mismatch For Speech Recognition - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240233707, titled 'Knowledge Distillation with Domain Mismatch For Speech Recognition'.

Simplified Explanation:

The method described in the patent application distills a student automatic speech recognition (ASR) model from a teacher ASR model, training the student on augmented out-of-domain utterances paired with pseudo-labels that the teacher generates (a code sketch follows the list below).

  • The method receives distillation data with out-of-domain training utterances.
  • For each out-of-domain training utterance, an augmented version is generated.
  • A teacher ASR model trained on target-domain data generates a pseudo-label for each augmented utterance.
  • The student ASR model is distilled from the teacher model using the augmented utterances and corresponding pseudo-labels.
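
In code, the first half of this pipeline, augmentation followed by teacher pseudo-labeling, might look like the NumPy sketch below. The noise-based augment function and the teacher_transcribe callable are illustrative assumptions; the application does not fix a specific augmentation or model interface.

    import numpy as np

    def augment(utterance, rng):
        """Illustrative augmentation: mix low-level Gaussian noise into the
        waveform. Noise mixing or SpecAugment-style masking are common
        choices; the application does not name one."""
        return utterance + rng.normal(scale=0.005, size=utterance.shape)

    def build_distillation_pairs(teacher_transcribe, utterances, rng):
        """Pair each augmented out-of-domain utterance with the pseudo-label
        the teacher produces for it. teacher_transcribe is a hypothetical
        callable wrapping a teacher ASR model trained on target-domain data
        (waveform -> transcript)."""
        pairs = []
        for utt in utterances:
            aug = augment(utt, rng)
            pairs.append((aug, teacher_transcribe(aug)))  # teacher labels the augmented audio
        return pairs

    # Toy usage with placeholder audio and a dummy teacher:
    rng = np.random.default_rng(0)
    utterances = [rng.standard_normal(16000) for _ in range(3)]  # ~1 s at 16 kHz
    pairs = build_distillation_pairs(lambda wav: "<pseudo-label>", utterances, rng)

Note that the teacher labels the augmented audio, not the clean original, so the student trains on exactly the waveform its label was produced from.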

Key Features and Innovation:

  • Generation of augmented out-of-domain training utterances.
  • Utilization of a teacher ASR model to generate pseudo-labels.
  • Distillation of a student ASR model from the teacher model (see the loss sketch below).
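
The abstract does not name a particular distillation objective. Training the student on the teacher's soft output distribution via a temperature-scaled KL divergence is one standard instantiation, sketched below as an assumption rather than the patent's stated loss.

    import numpy as np

    def log_softmax(logits, temperature=1.0):
        # Numerically stable log-softmax over the last axis.
        z = logits / temperature
        z = z - z.max(axis=-1, keepdims=True)
        return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

    def distillation_loss(student_logits, teacher_logits, temperature=2.0):
        """KL(teacher || student), averaged over frames. Inputs are
        (num_frames, vocab_size) logits; a higher temperature softens the
        teacher's distribution."""
        t_logp = log_softmax(teacher_logits, temperature)
        s_logp = log_softmax(student_logits, temperature)
        return float((np.exp(t_logp) * (t_logp - s_logp)).sum(axis=-1).mean())

    # Example: two models scoring 50 frames over a 32-token vocabulary.
    rng = np.random.default_rng(1)
    loss = distillation_loss(rng.standard_normal((50, 32)),
                             rng.standard_normal((50, 32)))

A hard-label alternative would treat the decoded pseudo-label text as ground truth under the student's usual ASR loss; the soft KL form preserves more of the teacher's uncertainty.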

Potential Applications:

  • Improving the accuracy and performance of ASR models.
  • Enhancing the training process for ASR technology.
  • Adapting ASR models to different domains efficiently.

Problems Solved:

  • Addressing the challenge of training ASR models with out-of-domain data.
  • Improving the transferability of ASR models across different domains.
  • Enhancing the scalability of ASR training processes.

Benefits:

  • Increased accuracy and robustness of ASR models.
  • Efficient adaptation of ASR technology to new domains.
  • Streamlined training processes for ASR models.

Commercial Applications:

The technology can be applied in industries such as:

  • Speech recognition software development.
  • Customer service automation.
  • Voice-controlled devices and applications.

Questions about ASR Distillation:

1. How does the use of augmented out-of-domain training utterances improve the training process for ASR models?
2. What are the advantages of distilling a student ASR model from a teacher model in terms of performance and efficiency?


Original Abstract Submitted

A method includes receiving distillation data including a plurality of out-of-domain training utterances. For each particular out-of-domain training utterance of the distillation data, the method includes generating a corresponding augmented out-of-domain training utterance, and generating, using a teacher ASR model trained on training data corresponding to a target domain, a pseudo-label corresponding to the corresponding augmented out-of-domain training utterance. The method also includes distilling a student ASR model from the teacher ASR model by training the student ASR model using the corresponding augmented out-of-domain training utterances paired with the corresponding pseudo-labels generated by the teacher ASR model.
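
Read as pseudocode, the abstract reduces to three steps: augment each out-of-domain utterance, pseudo-label the augmented audio with the target-domain teacher, and train the student on the resulting pairs. A self-contained sketch, with the callables augment_fn, teacher_transcribe, and student_train_step as hypothetical stand-ins for components the abstract leaves unspecified:

    def distill_student(utterances, augment_fn, teacher_transcribe,
                        student_train_step, epochs=3):
        """Build (augmented utterance, pseudo-label) pairs once, then train
        the student ASR model on them for a few epochs. All three callables
        are assumptions; the application does not fix their form."""
        pairs = [(aug, teacher_transcribe(aug))
                 for aug in map(augment_fn, utterances)]
        for _ in range(epochs):
            for aug_audio, pseudo_label in pairs:
                student_train_step(aug_audio, pseudo_label)
        return pairs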

