US Patent Application 18336211. EFFICIENT STREAMING NON-RECURRENT ON-DEVICE END-TO-END MODEL simplified abstract

From WikiPatents

EFFICIENT STREAMING NON-RECURRENT ON-DEVICE END-TO-END MODEL

Organization Name

Google LLC


Inventor(s)

Tara Sainath of Jersey City NJ (US)


Arun Narayanan of Milpitas CA (US)


Rami Botros of Mountain View CA (US)


Yanzhang He of Mountain View CA (US)


Ehsan Variani of Mountain View CA (US)


Cyril Allauzen of Mountain View CA (US)


David Rybach of Aachen (DE)


Ruoming Pang of New York NY (US)


Trevor Strohman of Mountain View CA (US)


EFFICIENT STREAMING NON-RECURRENT ON-DEVICE END-TO-END MODEL - A simplified explanation of the abstract

  • This abstract appeared for US patent application number 18336211, titled 'EFFICIENT STREAMING NON-RECURRENT ON-DEVICE END-TO-END MODEL'

Simplified Explanation

The abstract describes an Automatic Speech Recognition (ASR) model that consists of four main components: a first encoder, a second encoder, a decoder, and a language model.

The first encoder takes in a sequence of acoustic frames (representing audio input) and generates a first higher order feature representation for each frame.

The second encoder then takes the higher order feature representation generated by the first encoder and generates a second higher order feature representation for each frame.

The decoder receives the second higher order feature representation and generates a probability distribution over possible speech recognition hypotheses.
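The data flow through the two encoders and the decoder can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not the patented implementation: the abstract does not specify layer types, dimensions, or vocabulary, so the random weight matrices, the tanh projections, the sizes, and the helper names (make_layer, apply_layer, recognize) are all hypothetical. Each frame is also processed in isolation here, whereas a real encoder would typically look at neighbouring frames as well.

```python
import math
import random


def make_layer(in_dim, out_dim, rng):
    """Hypothetical stand-in for trained weights: a random out_dim x in_dim matrix."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]


def apply_layer(weights, vector):
    """Stateless projection followed by tanh (no recurrent state is carried over)."""
    return [math.tanh(sum(w * x for w, x in zip(row, vector))) for row in weights]


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def recognize(frames, enc1, enc2, dec):
    """Run every acoustic frame through encoder 1, encoder 2, then the decoder."""
    distributions = []
    for frame in frames:
        first = apply_layer(enc1, frame)    # first higher order representation
        second = apply_layer(enc2, first)   # second higher order representation
        logits = [sum(w * x for w, x in zip(row, second)) for row in dec]
        distributions.append(softmax(logits))  # first probability distribution
    return distributions


# Assumed toy sizes; none of these numbers come from the abstract.
FEATURE_DIM, HIDDEN_DIM, VOCAB_SIZE = 4, 6, 5
rng = random.Random(0)
enc1 = make_layer(FEATURE_DIM, HIDDEN_DIM, rng)
enc2 = make_layer(HIDDEN_DIM, HIDDEN_DIM, rng)
dec = make_layer(HIDDEN_DIM, VOCAB_SIZE, rng)
frames = [[rng.uniform(-1, 1) for _ in range(FEATURE_DIM)] for _ in range(3)]

distributions = recognize(frames, enc1, enc2, dec)
for step, dist in enumerate(distributions):
    print(step, [round(p, 3) for p in dist])
```

Each output step yields one distribution over the assumed vocabulary of five symbols, and each distribution sums to 1.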

Finally, a language model takes the probability distribution and generates a rescored probability distribution.
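The abstract does not say how the language model computes the rescored distribution. One common approach, assumed here purely for illustration, is to add a weighted language model log-probability to each hypothesis score and renormalize. The function name rescore, the weight of 0.3, and the example hypotheses and probabilities below are all made up for the sketch.

```python
import math


def rescore(asr_probs, lm_probs, lm_weight=0.3):
    """Combine ASR and language model scores in the log domain, then renormalize.

    Assumes every hypothesis has a non-zero probability under both models.
    """
    scores = {
        hyp: math.log(p) + lm_weight * math.log(lm_probs[hyp])
        for hyp, p in asr_probs.items()
    }
    peak = max(scores.values())
    exps = {hyp: math.exp(s - peak) for hyp, s in scores.items()}
    total = sum(exps.values())
    return {hyp: e / total for hyp, e in exps.items()}


# Hypothetical first probability distribution from the decoder.
asr_probs = {"recognize speech": 0.45, "wreck a nice beach": 0.55}
# Hypothetical language model probabilities for the same hypotheses.
lm_probs = {"recognize speech": 0.9, "wreck a nice beach": 0.1}

rescored = rescore(asr_probs, lm_probs)
best = max(rescored, key=rescored.get)
print(best)
```

In this made-up example the decoder slightly prefers the acoustically similar but implausible phrase, and the language model term shifts the preference to "recognize speech".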

In simpler terms, the ASR model processes audio input, extracts important features, generates probabilities for different speech recognition options, and further refines those probabilities using a language model.


Original Abstract Submitted

An ASR model includes a first encoder configured to receive a sequence of acoustic frames and generate a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The ASR model also includes a second encoder configured to receive the first higher order feature representation generated by the first encoder at each of the plurality of output steps and generate a second higher order feature representation for a corresponding first higher order feature frame. The ASR model also includes a decoder configured to receive the second higher order feature representation generated by the second encoder at each of the plurality of output steps and generate a first probability distribution over possible speech recognition hypothesis. The ASR model also includes a language model configured to receive the first probability distribution over possible speech hypothesis and generate a rescored probability distribution.