US Patent Application 18182925. Unified Cascaded Encoder ASR model for Dynamic Model Sizes simplified abstract

From WikiPatents

Unified Cascaded Encoder ASR model for Dynamic Model Sizes

Organization Name

Google LLC


Inventor(s)

Shaojin Ding of Mountain View CA (US)


Yangzhang He of Mountain View CA (US)


Xin Wang of Mountain View CA (US)


Weiran Wang of Palo Alto CA (US)


Trevor Strohman of Mountain View CA (US)


Tara N. Sainath of Jersey City NJ (US)


Rohit Parkash Prabhavalkar of Palo Alto CA (US)


Robert David of Mountain View CA (US)


Rina Panigrahy of Mountain View CA (US)


Rami Botros of Mountain View CA (US)


Qiao Liang of Mountain View CA (US)


Ian Mcgraw of Mountain View CA (US)


Ding Zhao of Mountain View CA (US)


Dongseong Hwang of Mountain View CA (US)


Unified Cascaded Encoder ASR model for Dynamic Model Sizes - A simplified explanation of the abstract

  • This abstract appeared in US patent application number 18182925, titled 'Unified Cascaded Encoder ASR model for Dynamic Model Sizes'.

Simplified Explanation

The abstract describes an automated speech recognition (ASR) model made up of two encoders and two decoders. The first encoder takes a sequence of acoustic frames as input and generates a higher-order feature representation for each frame. The first decoder uses this representation to generate a probability distribution over possible speech recognition hypotheses. The second encoder takes the first encoder's higher-order feature representation, rather than the raw acoustic frames, and generates a second higher-order representation for each of those features. The second decoder then uses this second representation to generate another probability distribution over speech recognition hypotheses.
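The wiring described above can be sketched in plain Python. This is a minimal toy under stated assumptions, not Google's implementation: the layer sizes, the single-matrix "encoders", the ReLU activation, the output vocabulary, and the per-frame softmax "decoders" are all hypothetical stand-ins (a real ASR decoder would typically score whole label sequences, not single frames). What the sketch does take from the abstract is the dataflow: each encoder feeds its own decoder, and the second encoder consumes the first encoder's output.

```python
import math
import random

# Hypothetical toy sizes; the abstract gives no dimensions.
FEAT_DIM = 4                    # features per acoustic frame
HIDDEN_DIM = 3                  # size of each higher-order feature
VOCAB = ["<blank>", "a", "b"]   # hypothetical output units


def linear(weights, vec):
    """Multiply a (rows x cols) weight matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def relu(vec):
    return [max(0.0, v) for v in vec]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Random weights standing in for trained parameters."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class CascadedEncoderASR:
    """Two encoders and two decoders wired as the abstract describes."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.enc1 = random_matrix(HIDDEN_DIM, FEAT_DIM, rng)
        self.dec1 = random_matrix(len(VOCAB), HIDDEN_DIM, rng)
        self.enc2 = random_matrix(HIDDEN_DIM, HIDDEN_DIM, rng)
        self.dec2 = random_matrix(len(VOCAB), HIDDEN_DIM, rng)

    def forward(self, frames):
        # First encoder: one higher-order feature per acoustic frame.
        first = [relu(linear(self.enc1, f)) for f in frames]
        # First decoder: a distribution from the first encoder's features.
        first_probs = [softmax(linear(self.dec1, h)) for h in first]
        # Second encoder reads the first encoder's output, not the raw frames.
        second = [relu(linear(self.enc2, h)) for h in first]
        # Second decoder: a distribution from the second encoder's features.
        second_probs = [softmax(linear(self.dec2, h)) for h in second]
        return first_probs, second_probs


model = CascadedEncoderASR(seed=0)
frames = [[0.1, 0.2, 0.3, 0.4], [0.5, -0.1, 0.0, 0.2]]
first_probs, second_probs = model.forward(frames)
```

Because the second encoder reuses the first encoder's features instead of re-reading the acoustic frames, the first encoder's computation is shared by both output distributions.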


Original Abstract Submitted

An automated speech recognition (ASR) model includes a first encoder, a first decoder, a second encoder, and a second decoder. The first encoder receives, as input, a sequence of acoustic frames, and generates, at each of a plurality of output steps, a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The first decoder receives, as input, the first higher order feature representation generated by the first encoder, and generates a first probability distribution over possible speech recognition hypotheses. The second encoder receives, as input, the first higher order feature representation generated by the first encoder, and generates a second higher order feature representation for a corresponding first higher order feature frame. The second decoder receives, as input, the second higher order feature representation generated by the second encoder, and generates a second probability distribution over possible speech recognition hypotheses.
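The abstract does not say how the "dynamic model sizes" in the title are obtained. One reading, assumed here and not confirmed by the abstract, is that the first encoder and first decoder can run on their own as a smaller model, while cascading the second encoder and second decoder on top yields a larger model that shares the first encoder's weights. The hypothetical `UnifiedCascadedModel` class below sketches that switch and counts the parameters each configuration exercises; the dimensions and bare linear layers are illustrative only.

```python
import math
import random


def dense(rng, rows, cols):
    """Random (rows x cols) weight matrix; stands in for a trained layer."""
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


def apply(weights, vec):
    """Matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]


def num_params(weights):
    return sum(len(row) for row in weights)


class UnifiedCascadedModel:
    """Hypothetical model whose size depends on whether the second pass runs."""

    def __init__(self, feat_dim=4, hidden_dim=3, vocab_size=3, seed=0):
        rng = random.Random(seed)
        self.first_encoder = dense(rng, hidden_dim, feat_dim)
        self.first_decoder = dense(rng, vocab_size, hidden_dim)
        self.second_encoder = dense(rng, hidden_dim, hidden_dim)
        self.second_decoder = dense(rng, vocab_size, hidden_dim)

    def size(self, use_second_pass):
        """Number of weights exercised by the chosen configuration."""
        n = num_params(self.first_encoder) + num_params(self.first_decoder)
        if use_second_pass:
            n += num_params(self.second_encoder) + num_params(self.second_decoder)
        return n

    def recognize(self, frames, use_second_pass):
        # The first encoder always runs; its weights serve both configurations.
        first = [apply(self.first_encoder, f) for f in frames]
        if not use_second_pass:
            # Small configuration: stop after the first decoder.
            return [softmax(apply(self.first_decoder, h)) for h in first]
        # Large configuration: cascade the second encoder on the first's output.
        second = [apply(self.second_encoder, h) for h in first]
        return [softmax(apply(self.second_decoder, h)) for h in second]


model = UnifiedCascadedModel()
frames = [[0.1, 0.2, 0.3, 0.4], [0.5, -0.1, 0.0, 0.2]]
small_out = model.recognize(frames, use_second_pass=False)
large_out = model.recognize(frames, use_second_pass=True)
print(model.size(False), model.size(True))  # 21 39
```

With the default toy sizes, the small configuration uses 12 + 9 = 21 weights and the large one adds 9 + 9 more for 39. Under this reading, a single set of trained weights can serve either size, since the larger model only appends layers rather than replacing the first encoder.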