TWO-PASS END TO END SPEECH RECOGNITION

Organization Name

GOOGLE LLC

Inventor(s)

Tara N. Sainath of Jersey City NJ (US)

Yanzhang He of Palo Alto CA (US)

Bo Li of Fremont CA (US)

Arun Narayanan of Milpitas CA (US)

Ruoming Pang of New York NY (US)

Antoine Jean Bruguier of Milpitas CA (US)

Shuo-yiin Chang of Mountain View CA (US)

Wei Li of Fremont CA (US)

This abstract first appeared for US patent application 18815537, titled 'TWO-PASS END TO END SPEECH RECOGNITION'.

Original Abstract Submitted

Two-pass automatic speech recognition (ASR) models can be used to perform streaming on-device ASR to generate a text representation of an utterance captured in audio data. Various implementations include a first-pass portion of the ASR model used to generate streaming candidate recognition(s) of an utterance captured in audio data. For example, the first-pass portion can include a recurrent neural network transducer (RNN-T) decoder. Various implementations include a second-pass portion of the ASR model used to revise the streaming candidate recognition(s) of the utterance and generate a text representation of the utterance. For example, the second-pass portion can include a Listen, Attend and Spell (LAS) decoder. Various implementations include an encoder shared between the RNN-T decoder and the LAS decoder.
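The data flow described in the abstract can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not the claimed implementation: the class TwoPassRecognizer and the functions toy_encoder, toy_first_pass and toy_second_pass are hypothetical names, "frames" are plain strings rather than audio features, and the two decoders are trivial stand-ins for an RNN-T decoder and an LAS decoder. The sketch only shows that the encoder output is computed once, that the first pass emits streaming candidates, and that the second pass revises those candidates into the final text.

```python
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple

Hypothesis = Tuple[str, float]  # (candidate text, first-pass score)


class TwoPassRecognizer:
    """Toy two-pass pipeline: one shared encoder feeding two decoders."""

    def __init__(
        self,
        encoder: Callable[[str], str],
        first_pass: Callable[[Sequence[str]], List[Hypothesis]],
        second_pass: Callable[[Sequence[str], List[Hypothesis]], str],
    ) -> None:
        self.encoder = encoder
        self.first_pass = first_pass
        self.second_pass = second_pass

    def recognize(self, frames: Iterable[str]) -> Iterator[Tuple[str, str]]:
        encoded: List[str] = []
        candidates: List[Hypothesis] = []
        for frame in frames:
            # Shared encoder: each frame is encoded once and reused by both passes.
            encoded.append(self.encoder(frame))
            # First pass: streaming candidates. For simplicity this toy decodes
            # the whole prefix again; a real streaming decoder is incremental.
            candidates = self.first_pass(encoded)
            yield ("partial", candidates[0][0])
        # Second pass: revise the candidates using the full encoded utterance.
        final = self.second_pass(encoded, candidates) if candidates else ""
        yield ("final", final)


def toy_encoder(frame: str) -> str:
    # Hypothetical stand-in for the shared encoder.
    return frame.strip().lower()


def toy_first_pass(encoded: Sequence[str]) -> List[Hypothesis]:
    # Hypothetical stand-in for the first-pass decoder: it ranks a
    # deliberately wrong candidate ("too") above the correct one.
    text = " ".join(encoded)
    return [(text.replace("two", "too"), 0.6), (text, 0.4)]


def toy_second_pass(encoded: Sequence[str], candidates: List[Hypothesis]) -> str:
    # Hypothetical stand-in for the second-pass decoder: it rescores each
    # candidate by how many of its words match the encoded utterance.
    vocab = set(encoded)

    def score(hypothesis: Hypothesis) -> float:
        text, first_pass_score = hypothesis
        return sum(word in vocab for word in text.split()) + first_pass_score

    return max(candidates, key=score)[0]


recognizer = TwoPassRecognizer(toy_encoder, toy_first_pass, toy_second_pass)
events = list(recognizer.recognize(["Two", "Pass", "ASR"]))
```

In this toy run, the partial results carry the first-pass error ("too"), and the final result is the revised candidate chosen by the second pass. The rescoring rule here is an assumption made for the example; the abstract does not specify how the second pass revises the candidates.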