US Patent Application 18078460. METHOD OF TRAINING SPEECH RECOGNITION MODEL, ELECTRONIC DEVICE AND STORAGE MEDIUM simplified abstract
METHOD OF TRAINING SPEECH RECOGNITION MODEL, ELECTRONIC DEVICE AND STORAGE MEDIUM
Organization Name
BEIJING XIAOMI MOBILE SOFTWARE CO., LTD.
Inventor(s)
Mingshuang Luo of Beijing (CN)
This abstract first appeared for US patent application 18078460 titled 'METHOD OF TRAINING SPEECH RECOGNITION MODEL, ELECTRONIC DEVICE AND STORAGE MEDIUM'.
Simplified Explanation
The patent application describes a method for training a speech recognition model. Here are the key points:
- The method involves inputting speech data from multiple training samples into a teacher model and a to-be-trained speech recognition model separately.
- The teacher model and the to-be-trained speech recognition model generate an embedding and encoded data, respectively.
- Multi-codebook quantization is applied to the teacher model's embedding to obtain quantized codebook data.
- A loss is calculated based on the encoded data, quantized codebook data, and text data in the training sample.
- Training of the to-be-trained speech recognition model stops when the loss is less than or equal to a preset loss threshold and/or the number of times the model has been trained exceeds a preset number.
- The result is the trained speech recognition model.
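The quantization step in the list above can be illustrated with a small sketch. The abstract does not say how the multi-codebook quantization is carried out, so the scheme below is an assumption chosen for illustration: the embedding is split into equal sub-vectors and each sub-vector is mapped to the index of its nearest entry in its own codebook (a product-quantization style lookup). The names `quantize` and `squared_distance` and the toy codebooks are hypothetical and do not come from the application.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(embedding: Vector, codebooks: List[List[Vector]]) -> List[int]:
    """Map one embedding frame to one index per codebook.

    Assumption (not from the patent abstract): the embedding is cut into
    len(codebooks) equal chunks, and each chunk is replaced by the index
    of the closest entry in the corresponding codebook.
    """
    num_codebooks = len(codebooks)
    if num_codebooks == 0 or len(embedding) % num_codebooks != 0:
        raise ValueError("embedding length must be a multiple of the codebook count")
    chunk = len(embedding) // num_codebooks
    indexes = []
    for i, codebook in enumerate(codebooks):
        sub_vector = embedding[i * chunk:(i + 1) * chunk]
        # Pick the codebook entry with the smallest squared distance.
        nearest = min(
            range(len(codebook)),
            key=lambda k: squared_distance(sub_vector, codebook[k]),
        )
        indexes.append(nearest)
    return indexes


# Toy data: a 4-dimensional "teacher embedding" and two codebooks
# with two 2-dimensional entries each (all values are made up).
toy_codebooks = [
    [[0.0, 0.0], [0.0, 1.0]],
    [[2.0, -1.0], [-2.0, 1.0]],
]
toy_embedding = [0.1, 0.9, 2.0, -1.0]
print(quantize(toy_embedding, toy_codebooks))  # prints [1, 0]
```

The resulting list of indexes plays the role of the "quantized codebook data": a compact, discrete summary of the teacher embedding that the to-be-trained model can be asked to predict.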
Original Abstract Submitted
A method of training a speech recognition model is provided. The method includes that: speech data of each of a plurality of training samples is inputted into a teacher model and a to-be-trained speech recognition model separately. Additionally, an embedding outputted by the teacher model and encoded data outputted by the to-be-trained speech recognition model are obtained. Furthermore, quantized codebook data is obtained by performing a multi-codebook quantization on the embedding. A loss is calculated based on the encoded data, the quantized codebook data, and text data in the training sample. Moreover, a trained speech recognition model is obtained by stopping training the to-be-trained speech recognition model when the loss is less than or equal to a preset loss threshold and/or trained times is greater than preset trained times.
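The loss and stopping rule described in the abstract can be sketched as follows. The abstract only states which inputs the loss depends on, not how they are combined, so two things here are assumptions for illustration: the distillation term is a cross-entropy between the student's predicted scores over codebook entries and the teacher's codebook indexes, and the total is a weighted sum of that term and a recognition loss computed against the text. The function names and the `scale` parameter are hypothetical. The "and/or" condition is implemented as a logical "or".

```python
import math
from typing import Sequence


def codebook_cross_entropy(
    logits: Sequence[Sequence[float]], targets: Sequence[int]
) -> float:
    """Mean cross-entropy of predicted codebook scores against teacher indexes.

    logits[i] holds the student's unnormalized scores over the entries of
    codebook i; targets[i] is the index the teacher's embedding was
    quantized to. (This pairing is an assumption, not the patent's text.)
    """
    if not targets or len(logits) != len(targets):
        raise ValueError("logits and targets must be non-empty and equal in length")
    total = 0.0
    for row, target in zip(logits, targets):
        peak = max(row)  # subtract the max for numerical stability
        log_norm = peak + math.log(sum(math.exp(v - peak) for v in row))
        total += log_norm - row[target]
    return total / len(targets)


def combined_loss(
    recognition_loss: float, distillation_loss: float, scale: float = 0.5
) -> float:
    """Weighted sum of the text-based loss and the codebook-based loss."""
    return recognition_loss + scale * distillation_loss


def should_stop(
    loss: float, trained_times: int, loss_threshold: float, max_trained_times: int
) -> bool:
    """Stop when loss <= threshold or trained times > the preset count."""
    return loss <= loss_threshold or trained_times > max_trained_times


# Toy usage with made-up numbers.
distill = codebook_cross_entropy([[0.0, 0.0]], [0])  # equals ln(2)
loss = combined_loss(recognition_loss=1.0, distillation_loss=distill)
print(should_stop(loss, trained_times=3, loss_threshold=0.1, max_trained_times=10))
```

Note the inequalities follow the abstract's wording: the loss test is inclusive (less than or equal to the threshold), while the count test is strict (greater than the preset number of times).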
- BEIJING XIAOMI MOBILE SOFTWARE CO., LTD.
- Zengwei Yao of Beijing (CN)
- Liyong Guo of Beijing (CN)
- POVEY Daniel of Beijing (CN)
- Long Lin of Beijing (CN)
- Fangjun Kuang of Beijing (CN)
- Wei Kang of Beijing (CN)
- Mingshuang Luo of Beijing (CN)
- Quandong Wang of Beijing (CN)
- Yuxiang Kong of Beijing (CN)
- G10L15/06
- G10L15/22
- G10L15/16
- G10L19/032