Google LLC (20240161732). MULTI-DIALECT AND MULTILINGUAL SPEECH RECOGNITION simplified abstract

From WikiPatents

MULTI-DIALECT AND MULTILINGUAL SPEECH RECOGNITION

Organization Name

Google LLC

Inventor(s)

Zhifeng Chen of Sunnyvale CA (US)

Bo Li of Santa Clara CA (US)

Eugene Weinstein of New York NY (US)

Yonghui Wu of Fremont CA (US)

Pedro J. Moreno Mengibar of Jersey City NJ (US)

Ron J. Weiss of New York NY (US)

Khe Chai Sim of Dublin CA (US)

Tara N. Sainath of Jersey City NJ (US)

Patrick An Phu Nguyen of Palo Alto CA (US)

MULTI-DIALECT AND MULTILINGUAL SPEECH RECOGNITION - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240161732, titled 'MULTI-DIALECT AND MULTILINGUAL SPEECH RECOGNITION'.

Simplified Explanation

The patent application describes methods, systems, and apparatus for speech recognition using multi-dialect and multilingual models. In simple terms, the technology involves recognizing speech in different languages and dialects using a trained model.

  • Audio data of an utterance is received.
  • Input features based on the audio data are provided to a speech recognition model trained to recognize multiple languages or dialects.
  • The speech recognition model outputs scores indicating the likelihood of linguistic units for each language or dialect.
  • The model may have been trained using cluster adaptive training.
  • A transcription of the utterance is generated based on the output of the speech recognition model.
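The steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions, not the implementation described in the application: the language tags, the unit inventory, the toy features, the randomly initialized "model", and the rule for picking a language before decoding are all hypothetical stand-ins.

```python
import math
import random

# Hypothetical language/dialect tags and linguistic units ("_" is a blank symbol).
LANGUAGES = ["en-US", "en-GB", "es-ES"]
UNITS = ["a", "b", "c", "_"]


def extract_features(audio, frame_size=4):
    """Split raw samples into frames and compute two toy features per frame."""
    frames = [audio[i:i + frame_size]
              for i in range(0, len(audio) - frame_size + 1, frame_size)]
    return [[math.log(sum(s * s for s in f) + 1e-8), sum(f) / len(f)]
            for f in frames]


def softmax(xs):
    """Turn raw scores into a probability distribution."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class ToyMultiDialectModel:
    """Stand-in for a trained model: random weights, one output head per language."""

    def __init__(self, n_features=2, seed=0):
        rng = random.Random(seed)
        self.weights = {
            lang: [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in UNITS]
            for lang in LANGUAGES
        }

    def score(self, features):
        """Return, for each language, per-frame likelihoods over the units."""
        return {
            lang: [softmax([sum(w * x for w, x in zip(row, frame)) for row in head])
                   for frame in features]
            for lang, head in self.weights.items()
        }


def transcribe(scores):
    """Assumed decoding rule: pick the most confident language, then decode greedily."""
    def avg_log_best(frames):
        return sum(math.log(max(f)) for f in frames) / len(frames)

    best_lang = max(scores, key=lambda lang: avg_log_best(scores[lang]))
    text, prev = [], None
    for frame in scores[best_lang]:
        unit = UNITS[frame.index(max(frame))]
        # Collapse repeated units and drop the blank symbol.
        if unit != prev and unit != "_":
            text.append(unit)
        prev = unit
    return best_lang, "".join(text)


audio = [0.1 * i for i in range(16)]      # placeholder for received audio data
features = extract_features(audio)        # input features based on the audio
scores = ToyMultiDialectModel().score(features)
language, transcription = transcribe(scores)
```

A real system would replace the toy features with acoustic features such as log-mel filterbanks and the random weights with a trained network; the sketch only mirrors the order of the steps.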

Potential Applications

This technology can be applied in various fields such as language translation, voice-controlled devices, and transcription services.

Problems Solved

1. Overcoming language barriers in speech recognition.
2. Improving accuracy and efficiency in transcribing multilingual content.

Benefits

1. Enhanced communication across different languages.
2. Increased accessibility for non-native speakers.
3. Improved transcription accuracy for multilingual content.

Potential Commercial Applications

Optimizing customer service chatbots for multilingual support.

Possible Prior Art

Possible prior art includes the use of language models in speech recognition systems to improve accuracy and performance.

Unanswered Questions

How does the technology handle accents and regional dialects?

The abstract states that the model is trained to recognize multiple languages or dialects, but it does not describe how accents and regional variations in speech are handled in practice.

What is the computational complexity of the multi-dialect and multilingual models?

The abstract does not provide information on the computational resources required to implement the technology.


Original Abstract Submitted

Methods, systems, and apparatus, including computer programs encoded on a computer-readable media, for speech recognition using multi-dialect and multilingual models. In some implementations, audio data indicating audio characteristics of an utterance is received. Input features determined based on the audio data are provided to a speech recognition model that has been trained to output score indicating the likelihood of linguistic units for each of multiple different language or dialects. The speech recognition model can be one that has been trained using cluster adaptive training. Output that the speech recognition model generated in response to receiving the input features determined based on the audio data is received. A transcription of the utterance generated based on the output of the speech recognition model is provided.
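
The abstract names cluster adaptive training without describing it. In the general literature, the technique expresses a model's parameters as a weighted combination of shared cluster bases, with a small set of interpolation weights per speaker, dialect, or language. The sketch below illustrates that general idea only; whether the application uses this exact form is not stated in the abstract, and the cluster matrices, dialect tags, and interpolation weights are hypothetical values chosen for illustration.

```python
def cat_layer_weights(cluster_weights, interpolation):
    """Combine cluster weight matrices into one: W = sum_c lambda_c * W_c."""
    rows = len(cluster_weights[0])
    cols = len(cluster_weights[0][0])
    return [
        [sum(lam * w[r][c] for lam, w in zip(interpolation, cluster_weights))
         for c in range(cols)]
        for r in range(rows)
    ]


def apply_layer(weights, x):
    """Apply a linear layer (no bias) to the input vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


# Two hypothetical 2x2 cluster bases shared by all dialects.
clusters = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 2.0], [2.0, 0.0]],
]

# Hypothetical per-dialect interpolation weights (one weight per cluster).
dialect_lambda = {"en-US": [1.0, 0.0], "en-GB": [0.5, 0.5]}

x = [1.0, 3.0]
y_us = apply_layer(cat_layer_weights(clusters, dialect_lambda["en-US"]), x)
y_gb = apply_layer(cat_layer_weights(clusters, dialect_lambda["en-GB"]), x)
# y_us == [1.0, 3.0]; y_gb == [3.5, 2.5]
```

In this form, only the interpolation weights differ between dialects, while the cluster bases are shared.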