Google LLC (20240135923). Universal Monolingual Output Layer for Multilingual Speech Recognition simplified abstract

From WikiPatents
Revision as of 04:23, 26 April 2024 by Wikipatents

Universal Monolingual Output Layer for Multilingual Speech Recognition

Organization Name

Google LLC

Inventor(s)

Chao Zhang of Mountain View CA (US)

Bo Li of Santa Clara CA (US)

Tara N. Sainath of Jersey City NJ (US)

Trevor Strohman of Mountain View CA (US)

Shuo-yiin Chang of Sunnyvale CA (US)

Universal Monolingual Output Layer for Multilingual Speech Recognition - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240135923, titled 'Universal Monolingual Output Layer for Multilingual Speech Recognition'.

Simplified Explanation

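The abstract describes a method built around a multilingual automated speech recognition (ASR) model that recognizes speech in several supported languages. The model receives a sequence of acoustic frames, and its audio encoder generates a higher order feature representation for each frame. A language identification (LID) predictor then generates a language prediction representation from each of those feature representations. Finally, a decoder generates a probability distribution over possible speech recognition results from three inputs: the higher order feature representation, a sequence of non-blank symbols, and the language prediction representation. The decoder includes a monolingual output layer whose output nodes each share several language-specific wordpiece models.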

Explanation of the patent:

  • Receiving a sequence of acoustic frames as input
  • Generating a higher order feature representation for each frame with an audio encoder
  • Generating a language prediction representation for each feature representation with a language identification (LID) predictor
  • Generating, with a decoder, a probability distribution over possible speech recognition results
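
The steps listed above can be sketched as a small data-flow example. This is only an illustration under stated assumptions, not the implementation from the patent application: the class name `ToyMultilingualASR`, the layer sizes, the random untrained weights, the use of a softmax over languages as the "language prediction representation", and the averaging of label embeddings to summarize the non-blank history are all hypothetical choices made to keep the example short.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, vec):
    """Multiply a (rows x cols) weight matrix by a vector of length cols."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def random_matrix(rows, cols, rng):
    """Untrained placeholder weights; a real model would learn these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class ToyMultilingualASR:
    """Hypothetical stand-in for the encoder / LID predictor / decoder pipeline."""

    def __init__(self, frame_dim=4, hidden_dim=6, num_languages=3,
                 num_outputs=5, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.enc_w = random_matrix(hidden_dim, frame_dim, rng)
        self.lid_w = random_matrix(num_languages, hidden_dim, rng)
        self.label_emb = random_matrix(num_outputs, hidden_dim, rng)
        # The decoder sees: encoder feature + LID representation + label history.
        self.dec_w = random_matrix(num_outputs, 2 * hidden_dim + num_languages, rng)

    def encode(self, frame):
        """Audio encoder: acoustic frame -> higher order feature representation."""
        return [math.tanh(x) for x in linear(self.enc_w, frame)]

    def predict_language(self, feature):
        """LID predictor: feature -> language prediction representation (assumed softmax)."""
        return softmax(linear(self.lid_w, feature))

    def decode(self, feature, lid_repr, non_blank_history):
        """Decoder: distribution over output nodes from all three inputs."""
        history = [0.0] * self.hidden_dim
        for label in non_blank_history:
            for i, value in enumerate(self.label_emb[label]):
                history[i] += value / len(non_blank_history)
        return softmax(linear(self.dec_w, feature + lid_repr + history))

    def step(self, frame, non_blank_history):
        feature = self.encode(frame)
        lid_repr = self.predict_language(feature)
        return lid_repr, self.decode(feature, lid_repr, non_blank_history)


model = ToyMultilingualASR()
frames = [[0.1, -0.2, 0.3, 0.0], [0.4, 0.1, -0.1, 0.2]]
for frame in frames:
    lid_repr, probs = model.step(frame, non_blank_history=[1, 3])
    print(len(lid_repr), len(probs), round(sum(probs), 6))
```

The point of the sketch is the wiring: the LID predictor consumes the encoder output, and the decoder conditions on the encoder output, the previously emitted non-blank symbols, and the language prediction together.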

Potential Applications

This technology can be applied in multilingual speech recognition systems, language translation services, voice-controlled devices, and language learning tools.

Problems Solved

1. Language identification in multilingual speech recognition
2. Efficient speech recognition across different languages

Benefits

1. Improved accuracy in recognizing speech in multiple languages
2. Enhanced user experience in multilingual applications
3. Increased efficiency in language processing tasks

Potential Commercial Applications

1. Multilingual speech recognition technology in language translation services

Possible Prior Art

There are existing multilingual speech recognition systems that use language identification techniques to improve accuracy in recognizing speech in different languages.

Unanswered Questions

How does this technology handle accents and dialects in speech recognition?

The abstract does not provide information on how the multilingual ASR model deals with accents and dialects in speech recognition.

What is the computational complexity of the proposed method compared to existing multilingual ASR models?

The abstract does not mention the computational complexity of the method and how it compares to other multilingual ASR models in terms of efficiency.


Original Abstract Submitted

A method includes receiving a sequence of acoustic frames as input to a multilingual automated speech recognition (ASR) model configured to recognize speech in a plurality of different supported languages and generating, by an audio encoder of the multilingual ASR, a higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The method also includes generating, by a language identification (LID) predictor of the multilingual ASR, a language prediction representation for a corresponding higher order feature representation. The method also includes generating, by a decoder of the multilingual ASR, a probability distribution over possible speech recognition results based on the corresponding higher order feature representation, a sequence of non-blank symbols, and a corresponding language prediction representation. The decoder includes monolingual output layer having a plurality of output nodes each sharing a plurality of language-specific wordpiece models.
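
One way to read the final sentence of the abstract is that a single set of output nodes is reused across languages, with each node mapped to a different wordpiece depending on which language-specific wordpiece model is in effect. The sketch below illustrates that reading only. The vocabularies, the language codes, and the helper names `interpret_node` and `greedy_token` are invented for illustration and do not come from the patent application.

```python
# Hypothetical language-specific wordpiece models, all the same size so that
# one output layer with five nodes can serve every language.
WORDPIECE_MODELS = {
    "en": ["<blank>", "_the", "_cat", "s", "ing"],
    "fr": ["<blank>", "_le", "_chat", "s", "ment"],
    "de": ["<blank>", "_der", "_hund", "e", "en"],
}


def interpret_node(node_index, language):
    """Map a shared output node to the wordpiece it stands for in one language."""
    return WORDPIECE_MODELS[language][node_index]


def greedy_token(node_probs, lid_probs, languages):
    """Pick the most likely language and node, then look up the wordpiece.

    Assumes lid_probs is a distribution over `languages`, in the same order.
    """
    best_lang = languages[max(range(len(lid_probs)), key=lid_probs.__getitem__)]
    best_node = max(range(len(node_probs)), key=node_probs.__getitem__)
    return best_lang, interpret_node(best_node, best_lang)


languages = ["en", "fr", "de"]
node_probs = [0.10, 0.20, 0.60, 0.05, 0.05]  # distribution over shared nodes
lid_probs = [0.10, 0.80, 0.10]               # language prediction
print(greedy_token(node_probs, lid_probs, languages))
```

Under this reading, the output layer stays the size of a single monolingual vocabulary no matter how many languages are supported, and the language prediction decides how each node is interpreted.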