Google LLC (20240304185). MIXTURE-OF-EXPERT CONFORMER FOR STREAMING MULTILINGUAL ASR simplified abstract
MIXTURE-OF-EXPERT CONFORMER FOR STREAMING MULTILINGUAL ASR
Organization Name
Google LLC
Inventor(s)
Tara N. Sainath of Jersey City, NJ (US)
Yu Zhang of Mountain View, CA (US)
Francoise Beaufays of Mountain View, CA (US)
MIXTURE-OF-EXPERT CONFORMER FOR STREAMING MULTILINGUAL ASR - A simplified explanation of the abstract
This abstract first appeared for US patent application 20240304185, titled 'MIXTURE-OF-EXPERT CONFORMER FOR STREAMING MULTILINGUAL ASR'.
The abstract of the patent application describes a method for a multilingual automatic speech recognition (ASR) model that involves processing acoustic frames of speech utterances through a series of encoders and decoders.
- The method includes generating higher order feature representations using multi-head attention layers in the encoders.
- A first decoder generates a probability distribution over speech recognition hypotheses based on previous symbols and the higher order feature representations.
- Gating layers dynamically route outputs from previous layers to expert networks in a mixture-of-experts (MoE) architecture.
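The cascaded-encoder flow described above can be sketched in numpy. This is a minimal illustration, not the patented architecture: the attention is parameter-free, the sizes (10 frames, 8-dim features, 5-symbol vocabulary) are arbitrary, and the decoder is reduced to a single softmax projection.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, num_heads=2):
    """Parameter-free sketch of multi-head self-attention: split the
    feature dimension into heads, attend within each, concatenate."""
    T, d = x.shape
    dh = d // num_heads
    outs = []
    for h in range(num_heads):
        q = k = v = x[:, h * dh:(h + 1) * dh]
        att = softmax(q @ k.T / np.sqrt(dh))  # (T, T) attention weights
        outs.append(att @ v)
    return np.concatenate(outs, axis=-1)

# Hypothetical sizes: 10 acoustic frames, 8-dim features, 5-symbol vocab.
rng = np.random.default_rng(0)
frames = rng.normal(size=(10, 8))
first = multi_head_self_attention(frames)    # first encoder output
second = multi_head_self_attention(first)    # second (cascaded) encoder output
w_out = rng.normal(size=(8, 5))
probs = softmax(second @ w_out)              # per-step distribution over hypotheses
```

Each row of `probs` is a valid probability distribution over the toy vocabulary, mirroring the first decoder's per-output-step distribution over speech recognition hypotheses.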
Potential Applications:
- Multilingual speech recognition systems
- Language translation applications
- Voice-controlled devices and virtual assistants
Problems Solved:
- Improving the accuracy and efficiency of multilingual ASR models
- Enhancing speech recognition performance across languages
Benefits:
- Enhanced accuracy in recognizing speech in multiple languages
- Improved efficiency in processing and translating speech
- Versatile application in different language settings
Commercial Applications: This technology can be utilized in industries such as:
- Translation services
- Customer service centers
- Language learning platforms
Prior Art: Prior research in the field of multilingual ASR models and MOE architectures can provide insights into similar approaches and techniques used in speech recognition technology.
Frequently Updated Research: Stay updated on advancements in multi-head attention mechanisms, MOE architectures, and language processing algorithms to enhance the performance of multilingual ASR models.
Questions about Multilingual ASR Technology:
1. How does the use of multi-head attention layers improve the feature representation in the ASR model?
2. What are the advantages of using an MoE architecture in speech recognition systems?
Original Abstract Submitted
A method of a multilingual ASR model includes receiving a sequence of acoustic frames characterizing an utterance of speech. At a plurality of output steps, the method further includes generating a first higher order feature representation for an acoustic frame by a first encoder that includes a first plurality of multi-head attention layers; generating a second higher order feature representation for a corresponding first higher order feature representation by a second encoder that includes a second plurality of multi-head attention layers; and generating, by a first decoder, a first probability distribution over possible speech recognition hypotheses based on the second higher order feature representation and a sequence of N previous non-blank symbols. A gating layer of each respective MoE layer is configured to dynamically route an output from a previous multi-head attention layer at each of the plurality of output steps to a respective pair of feed-forward expert networks.
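The abstract's gating layer, which routes each output step to a pair of feed-forward expert networks, can be sketched as top-2 MoE routing. The sizes and weight initializations below are hypothetical, chosen only to make the routing mechanics concrete.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class MoELayer:
    """Sketch of an MoE layer: a gating network scores the experts and
    routes each frame's hidden vector to its top-2 feed-forward experts."""
    def __init__(self, d_model=8, d_ff=16, num_experts=4, seed=0):
        rng = np.random.default_rng(seed)
        self.w_gate = rng.normal(size=(d_model, num_experts)) * 0.1
        self.w1 = rng.normal(size=(num_experts, d_model, d_ff)) * 0.1
        self.w2 = rng.normal(size=(num_experts, d_ff, d_model)) * 0.1

    def __call__(self, h):
        # h: (T, d_model) — outputs of the previous multi-head attention layer
        gates = softmax(h @ self.w_gate)            # (T, num_experts) gate scores
        top2 = np.argsort(gates, axis=-1)[:, -2:]   # the 2 best experts per step
        out = np.zeros_like(h)
        for t in range(h.shape[0]):
            pair = top2[t]
            w = gates[t, pair] / gates[t, pair].sum()  # renormalize over the pair
            for e, we in zip(pair, w):
                # ReLU feed-forward expert: d_model -> d_ff -> d_model
                ff = np.maximum(h[t] @ self.w1[e], 0.0) @ self.w2[e]
                out[t] += we * ff
        return out
```

Because only two experts run per step, the layer's per-step compute stays roughly constant as the expert count grows, which is the usual motivation for gated routing in streaming models.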