Google LLC (20240331700). On-Device Multilingual Speech Recognition simplified abstract

From WikiPatents

On-Device Multilingual Speech Recognition

Organization Name

Google LLC

Inventor(s)

Yang Yu of Millburn NJ (US)

Quan Wang of Hoboken NJ (US)

Ignacio Lopez Moreno of Brooklyn NY (US)

On-Device Multilingual Speech Recognition - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240331700, titled 'On-Device Multilingual Speech Recognition'.

  • Simplified Explanation:

The patent application describes a method for language identification and speech recognition that involves predicting languages, switching between language packs, and generating transcriptions based on the identified languages.

  • Key Features and Innovation:

- Receiving and processing input audio frames to determine predicted languages
- Switching between language packs based on language switches in the input audio
- Rewinding audio data to generate accurate transcriptions using the appropriate language pack

  • Potential Applications:

- Multilingual speech recognition systems
- Language switching in real-time transcription services
- Language prediction for improved accuracy in speech recognition

  • Problems Solved:

- Efficient language identification in multilingual environments
- Seamless language switching for accurate transcriptions
- Enhanced speech recognition performance through language prediction

  • Benefits:

- Improved accuracy in transcribing multilingual audio
- Seamless language switching for better user experience
- Enhanced performance of speech recognition systems

  • Commercial Applications:

Title: Multilingual Speech Recognition System

Description: This technology can be utilized in transcription services, language learning apps, and multilingual customer service platforms to enhance communication and user experience.

  • Questions about the Technology:

1. How does the method determine language switches in the input audio?
2. What are the potential challenges in implementing this technology in real-time applications?

  • Frequently Updated Research:

There may be ongoing research in the field of multilingual speech recognition and language prediction algorithms that could further enhance the capabilities of this technology.


Original Abstract Submitted

A method includes receiving a sequence of input audio frames and processing each corresponding input audio frame to determine a language ID event that indicates a predicted language. The method also includes obtaining speech recognition events, each including a respective speech recognition result determined by a first language pack. Based on determining that the utterance includes a language switch from the first language to a second language, the method also includes loading a second language pack onto the client device and rewinding the input audio data buffered by an audio buffer to a time of the corresponding input audio frame associated with the language ID event that first indicated the second language as the predicted language. The method also includes emitting a first transcription and processing, using the second language pack loaded onto the client device, the rewound buffered audio data to generate a second transcription.
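
The flow in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `LanguageIdEvent`, `LanguagePack`, `first_event_for`, and `transcribe_with_switch` are hypothetical, audio frames are modeled as plain strings, a language pack is modeled as a function from frames to text, and the rule used to decide that a switch occurred (the latest language ID event disagrees with the first language) is an assumption, since the abstract does not say how the switch is determined.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Frame = str  # toy stand-in for one buffered audio frame (real frames would be audio samples)
LanguagePack = Callable[[List[Frame]], str]  # hypothetical: decodes frames into text


@dataclass
class LanguageIdEvent:
    frame_index: int  # index of the input audio frame that produced this event
    language: str     # predicted language for that frame


def first_event_for(events: List[LanguageIdEvent], language: str) -> Optional[LanguageIdEvent]:
    """Return the earliest event that predicted `language`, or None if there is none."""
    for event in events:
        if event.language == language:
            return event
    return None


def transcribe_with_switch(
    frames: List[Frame],
    events: List[LanguageIdEvent],
    packs: Dict[str, LanguagePack],
    first_language: str,
) -> Tuple[str, Optional[str]]:
    """Return (first transcription, second transcription or None if no switch)."""
    # Assumption: a switch is declared when the latest event disagrees
    # with the first language. The abstract leaves this rule unspecified.
    if not events or events[-1].language == first_language:
        return packs[first_language](frames), None

    second_language = events[-1].language
    # Rewind point: the frame whose event FIRST indicated the second language.
    switch_event = first_event_for(events, second_language)
    assert switch_event is not None  # guaranteed: events[-1] has this language
    rewind_to = switch_event.frame_index

    # Dictionary lookup stands in for loading the second pack onto the device.
    second_pack = packs[second_language]

    # First transcription: audio before the rewind point, first language pack.
    first = packs[first_language](frames[:rewind_to])
    # Second transcription: rewound buffered audio, second language pack.
    second = second_pack(frames[rewind_to:])
    return first, second
```

With toy frames `["hello", "world", "hola", "mundo"]`, language ID events that predict `"en"` for the first two frames and `"es"` for the last two, and packs that simply prefix their language code, the function decodes the first two frames with the first pack and re-decodes the last two with the second pack. A real system would also have to bound the audio buffer and cover the time needed to load the second pack, which this sketch ignores.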