18418246. MULTI-DIALECT AND MULTILINGUAL SPEECH RECOGNITION simplified abstract (Google LLC)

From WikiPatents
Revision as of 08:06, 24 May 2024 by Wikipatents (Creating a new page)

MULTI-DIALECT AND MULTILINGUAL SPEECH RECOGNITION

Organization Name

Google LLC

Inventor(s)

Zhifeng Chen of Sunnyvale CA (US)

Bo Li of Santa Clara CA (US)

Eugene Weinstein of New York NY (US)

Yonghui Wu of Fremont CA (US)

Pedro J. Moreno Mengibar of Jersey City NJ (US)

Ron J. Weiss of New York NY (US)

Khe Chai Sim of Dublin CA (US)

Tara N. Sainath of Jersey City NJ (US)

Patrick An Phu Nguyen of Palo Alto CA (US)

MULTI-DIALECT AND MULTILINGUAL SPEECH RECOGNITION - A simplified explanation of the abstract

This abstract first appeared for US patent application 18418246, titled 'MULTI-DIALECT AND MULTILINGUAL SPEECH RECOGNITION'.

Simplified Explanation

The patent application describes methods, systems, and apparatus for speech recognition using multi-dialect and multilingual models. The technology involves receiving audio data of an utterance, determining input features from that audio data, providing those features to a speech recognition model trained to output scores for linguistic units across multiple languages or dialects, and generating a transcription of the utterance from the model's output.

  • Speech recognition technology using multi-dialect and multilingual models
  • Input features determined from audio data provided to a trained speech recognition model
  • Model outputs scores indicating likelihood of linguistic units in different languages or dialects
  • Transcription of utterance generated based on model output
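The steps above can be sketched as a toy pipeline. This is a hypothetical illustration only: the function names, the fixed-frame "feature extraction", and the linear stand-in for the trained model are all assumptions, not the patented implementation. The one structural point it mirrors is a single output layer whose units are pooled across languages and dialects.

```python
import numpy as np

# Hypothetical pooled vocabulary: linguistic units from several languages
# share one output layer, as in a single multilingual model.
UNITS = ["a", "b", "ã", "ß", "<blank>"]

def extract_features(audio, frame_size):
    """Split raw samples into fixed-size frames (a stand-in for acoustic features)."""
    audio = np.asarray(audio, dtype=float)
    n = len(audio) // frame_size
    return np.reshape(audio[: n * frame_size], (n, frame_size))

def score_units(features, weights):
    """Toy linear 'model': per-frame softmax scores over all linguistic units."""
    logits = features @ weights                      # (frames, units)
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

def transcribe(audio, weights):
    """Greedy decode: best unit per frame, dropping blanks and repeats."""
    feats = extract_features(audio, weights.shape[0])
    probs = score_units(feats, weights)
    out, prev = [], None
    for i in probs.argmax(axis=1):
        if i != prev and UNITS[i] != "<blank>":
            out.append(UNITS[i])
        prev = i
    return "".join(out)
```

With one-hot "audio" frames and an identity-like weight matrix, each frame selects one unit, so `transcribe([1,0,0,0, 0,1,0,0], np.eye(4, 5))` yields `"ab"`.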

Potential Applications

This technology can be applied in various fields such as language learning apps, virtual assistants, call center automation, and transcription services.

Problems Solved

This technology addresses the challenge of accurately recognizing speech in different languages and dialects, improving communication and accessibility for users worldwide.

Benefits

The benefits of this technology include improved accuracy in speech recognition, enhanced user experience, increased efficiency in transcribing audio data, and better accessibility for multilingual users.

Potential Commercial Applications

One commercial application could be the development of multilingual virtual assistants for businesses operating in diverse linguistic environments.

Possible Prior Art

One possible piece of prior art in this field is the use of language models in speech recognition systems to improve accuracy and performance.

Unanswered Questions

How does this technology handle accents and regional variations in speech recognition?

The article does not provide specific details on how the technology addresses accents and regional variations in speech recognition.

What are the limitations of using cluster adaptive training in speech recognition models?

The article does not discuss any potential limitations or challenges associated with using cluster adaptive training in speech recognition models.


Original Abstract Submitted

Methods, systems, and apparatus, including computer programs encoded on a computer-readable media, for speech recognition using multi-dialect and multilingual models. In some implementations, audio data indicating audio characteristics of an utterance is received. Input features determined based on the audio data are provided to a speech recognition model that has been trained to output score indicating the likelihood of linguistic units for each of multiple different language or dialects. The speech recognition model can be one that has been trained using cluster adaptive training. Output that the speech recognition model generated in response to receiving the input features determined based on the audio data is received. A transcription of the utterance generated based on the output of the speech recognition model is provided.
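The abstract's mention of cluster adaptive training can be illustrated with a minimal sketch. In cluster adaptive training as commonly described, model weights for a given language or dialect are an interpolation of shared cluster basis matrices, selected by a small per-dialect coefficient vector. The shapes and names below are assumptions for illustration, not details from the patent.

```python
import numpy as np

def cat_weights(bases, coeffs):
    """Interpolate shared cluster bases into one dialect-specific weight matrix.

    bases:  (n_clusters, in_dim, out_dim) shared basis matrices
    coeffs: (n_clusters,) per-dialect interpolation coefficients
    """
    return np.tensordot(coeffs, bases, axes=1)

# Two toy cluster bases; each dialect's coefficients mix them differently.
bases = np.stack([np.zeros((3, 2)), np.ones((3, 2))])
dialect_a = cat_weights(bases, np.array([1.0, 0.0]))  # uses cluster 0 only
dialect_b = cat_weights(bases, np.array([0.3, 0.7]))  # a learned mixture
```

Only the small coefficient vector is dialect-specific, so adapting to a new dialect means estimating a few coefficients rather than retraining the full model.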