18418246. MULTI-DIALECT AND MULTILINGUAL SPEECH RECOGNITION simplified abstract (Google LLC)
Contents
- 1 MULTI-DIALECT AND MULTILINGUAL SPEECH RECOGNITION
- 1.1 Organization Name
- 1.2 Inventor(s)
- 1.3 MULTI-DIALECT AND MULTILINGUAL SPEECH RECOGNITION - A simplified explanation of the abstract
- 1.4 Simplified Explanation
- 1.5 Potential Applications
- 1.6 Problems Solved
- 1.7 Benefits
- 1.8 Potential Commercial Applications
- 1.9 Possible Prior Art
- 1.10 Unanswered Questions
- 1.11 Original Abstract Submitted
MULTI-DIALECT AND MULTILINGUAL SPEECH RECOGNITION
Organization Name
Google LLC
Inventor(s)
Zhifeng Chen of Sunnyvale CA (US)
Bo Li of Santa Clara CA (US)
Eugene Weinstein of New York NY (US)
Yonghui Wu of Fremont CA (US)
Pedro J. Moreno Mengibar of Jersey City NJ (US)
Ron J. Weiss of New York NY (US)
Khe Chai Sim of Dublin CA (US)
Tara N. Sainath of Jersey City NJ (US)
Patrick An Phu Nguyen of Palo Alto CA (US)
MULTI-DIALECT AND MULTILINGUAL SPEECH RECOGNITION - A simplified explanation of the abstract
This abstract first appeared for US patent application 18418246 titled 'MULTI-DIALECT AND MULTILINGUAL SPEECH RECOGNITION'.
Simplified Explanation
The patent application describes methods, systems, and apparatus for speech recognition using multi-dialect and multilingual models. The technology involves receiving audio data of an utterance, determining input features based on the audio data, providing these features to a speech recognition model trained to output scores for linguistic units in multiple languages or dialects, and generating a transcription of the utterance based on the model's output.
- Speech recognition technology using multi-dialect and multilingual models
- Input features determined from audio data provided to a trained speech recognition model
- Model outputs scores indicating likelihood of linguistic units in different languages or dialects
- Transcription of utterance generated based on model output
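The pipeline described above (audio data → input features → multilingual scoring → transcription) can be sketched as follows. This is a minimal illustration, not the patented implementation: the unit inventory, the toy log-energy features, the single linear scoring layer, and the CTC-style greedy decoder are all assumptions chosen to keep the example self-contained.

```python
import numpy as np

# Hypothetical joint inventory of linguistic units (e.g. graphemes)
# spanning several languages/dialects; the entries are illustrative.
UNITS = ["a", "b", "c", "ñ", "é", "<space>", "<blank>"]

rng = np.random.default_rng(0)

def extract_features(audio, frame_size=160):
    """Toy feature extraction: split the waveform into frames and take
    per-frame log energy (a stand-in for log-mel filterbanks)."""
    n_frames = len(audio) // frame_size
    frames = audio[: n_frames * frame_size].reshape(n_frames, frame_size)
    return np.log(np.sum(frames ** 2, axis=1) + 1e-8)[:, None]

def score_units(features, weights):
    """Stand-in for the trained model: a single linear layer plus
    softmax produces, for every frame, a score for each unit in the
    joint multilingual inventory."""
    logits = features @ weights  # (frames, units)
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)

def transcribe(scores):
    """Greedy decoding: pick the best-scoring unit per frame, collapse
    repeats, and drop blanks (CTC-style, purely for illustration)."""
    best = [UNITS[i] for i in scores.argmax(axis=1)]
    out, prev = [], None
    for u in best:
        if u != prev and u != "<blank>":
            out.append(u)
        prev = u
    return "".join(u if u != "<space>" else " " for u in out)

audio = rng.standard_normal(16000)            # one second of fake audio
weights = rng.standard_normal((1, len(UNITS)))
scores = score_units(extract_features(audio), weights)
transcript = transcribe(scores)
```

Because the model scores one shared inventory of units covering all supported languages and dialects, a single set of weights can transcribe utterances in any of them.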
Potential Applications
This technology can be applied in various fields such as language learning apps, virtual assistants, call center automation, and transcription services.
Problems Solved
This technology addresses the challenge of accurately recognizing speech in different languages and dialects, improving communication and accessibility for users worldwide.
Benefits
The benefits of this technology include improved accuracy in speech recognition, enhanced user experience, increased efficiency in transcribing audio data, and better accessibility for multilingual users.
Potential Commercial Applications
A potential commercial application of this technology could be in the development of multilingual virtual assistants for businesses operating in diverse linguistic environments.
Possible Prior Art
One possible prior art in this field is the use of language models in speech recognition systems to improve accuracy and performance.
Unanswered Questions
How does this technology handle accents and regional variations in speech recognition?
The article does not provide specific details on how the technology addresses accents and regional variations in speech recognition.
What are the limitations of using cluster adaptive training in speech recognition models?
The article does not discuss any potential limitations or challenges associated with using cluster adaptive training in speech recognition models.
Original Abstract Submitted
Methods, systems, and apparatus, including computer programs encoded on a computer-readable media, for speech recognition using multi-dialect and multilingual models. In some implementations, audio data indicating audio characteristics of an utterance is received. Input features determined based on the audio data are provided to a speech recognition model that has been trained to output scores indicating the likelihood of linguistic units for each of multiple different languages or dialects. The speech recognition model can be one that has been trained using cluster adaptive training. Output that the speech recognition model generated in response to receiving the input features determined based on the audio data is received. A transcription of the utterance generated based on the output of the speech recognition model is provided.
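The abstract mentions cluster adaptive training, in which a layer's weight matrix is an interpolation of shared cluster "bases" and each language or dialect contributes only a small vector of interpolation weights. The sketch below shows that core idea; the shapes, the two-language setup, and the randomly initialized weights are illustrative assumptions, not details from the patent.

```python
import numpy as np

rng = np.random.default_rng(1)

# Shared cluster bases: n_clusters weight matrices of shape (d_in, d_out),
# learned jointly across all languages.
n_clusters, d_in, d_out = 4, 8, 6
bases = rng.standard_normal((n_clusters, d_in, d_out))

# Per-language interpolation weights (learned during training; here random).
lang_weights = {
    "en-US": rng.standard_normal(n_clusters),
    "es-ES": rng.standard_normal(n_clusters),
}

def cat_layer(features, language):
    """Apply a cluster-adaptive layer: combine the shared bases with the
    language-specific interpolation weights, then project the features."""
    w = np.tensordot(lang_weights[language], bases, axes=1)  # (d_in, d_out)
    return features @ w

x = rng.standard_normal((5, d_in))        # 5 frames of input features
projected = cat_layer(x, "en-US")         # shape (5, d_out)
```

The appeal of this parameterization is that supporting a new dialect only requires learning a small interpolation vector, while the bulk of the parameters (the cluster bases) stays shared.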
CPC Classification
- G10L15/00
- G10L15/07
- G10L15/16