Google LLC (20250006217). Automatic Speech Recognition Accuracy With Multimodal Embeddings Search
Automatic Speech Recognition Accuracy With Multimodal Embeddings Search
Organization Name
Google LLC
Inventor(s)
Christopher Li of New York NY (US)
Kyle Scott Kastner of Waltham MA (US)
Zhehuai Chen of Edgewater NJ (US)
Andrew Maxwell Rosenberg of Brooklyn NY (US)
Leonid Aleksandrovich Velikovich of New York NY (US)
Patrick Maxim Rondon of New York NY (US)
Diamantino Antonio Caseiro of Philadelphia PA (US)
Zelin Wu of Jersey City NJ (US)
Automatic Speech Recognition Accuracy With Multimodal Embeddings Search
This abstract first appeared for US patent application 20250006217, titled 'Automatic Speech Recognition Accuracy With Multimodal Embeddings Search'.
Original Abstract Submitted
A method includes receiving training data that includes a set of transcribed speech utterances where each respective transcribed speech utterance is paired with a corresponding transcription. For each respective transcribed speech utterance, the method includes generating an encoded audio representation and an encoded textual representation, generating a higher order audio feature representation for a corresponding encoded audio representation, generating a higher order textual feature representation for a corresponding encoded textual representation, and determining a loss for the respective transcribed speech utterance based on the higher order audio feature representation and the higher order textual feature representation. The method also includes training a speech encoder and a text encoder of a correction model based on the loss determined for each transcribed speech utterance of the set of transcribed speech utterances.
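The loss step in the abstract can be illustrated with a small sketch. The abstract says only that the loss is based on the higher order audio and textual feature representations; it does not name the loss. The contrastive (symmetric cross-entropy) form below is therefore an assumption, as are the mean pooling, the temperature value, and every function and variable name. Random arrays stand in for the outputs of the speech and text encoders, and no gradient update is shown.

```python
import numpy as np

def pool_and_normalize(encoded):
    """Mean-pool a (time, dim) encoded sequence into one unit-length vector.

    Assumption: pooling stands in for whatever produces the
    'higher order feature representation' in the abstract.
    """
    pooled = encoded.mean(axis=0)
    return pooled / (np.linalg.norm(pooled) + 1e-8)

def contrastive_loss(audio_emb, text_emb, temperature=0.1):
    """Symmetric cross-entropy over a batch of paired embeddings.

    Assumption: a contrastive loss is one plausible choice; the
    abstract does not specify the loss function.
    """
    logits = audio_emb @ text_emb.T / temperature  # (batch, batch) similarities

    def xent_diag(l):
        # Cross-entropy where the correct class for row i is column i.
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the audio-to-text and text-to-audio directions.
    return 0.5 * (xent_diag(logits) + xent_diag(logits.T))

rng = np.random.default_rng(0)
batch, dim = 4, 8

# Hypothetical encoder outputs: each utterance and its transcription share
# an underlying vector, with different sequence lengths and small noise.
shared = rng.normal(size=(batch, dim))
audio_seqs = [shared[i] + 0.05 * rng.normal(size=(10 + i, dim)) for i in range(batch)]
text_seqs = [shared[i] + 0.05 * rng.normal(size=(3 + i, dim)) for i in range(batch)]

audio_emb = np.stack([pool_and_normalize(s) for s in audio_seqs])
text_emb = np.stack([pool_and_normalize(s) for s in text_seqs])

# Correctly paired utterances should score a lower loss than shifted pairs.
matched_loss = contrastive_loss(audio_emb, text_emb)
mismatched_loss = contrastive_loss(audio_emb, np.roll(text_emb, 1, axis=0))
print(matched_loss < mismatched_loss)
```

In a training loop, a loss of this kind would be minimized with respect to the parameters of both encoders, so that an utterance and its transcription map to nearby embeddings.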
- Google LLC
- Christopher Li of New York NY (US)
- Kyle Scott Kastner of Waltham MA (US)
- Yuan Wang of Hoboken NJ (US)
- Zhehuai Chen of Edgewater NJ (US)
- Andrew Maxwell Rosenberg of Brooklyn NY (US)
- Heng Su of Beijing (CN)
- Qian Chen of Beijing (CN)
- Leonid Aleksandrovich Velikovich of New York NY (US)
- Patrick Maxim Rondon of New York NY (US)
- Diamantino Antonio Caseiro of Philadelphia PA (US)
- Zelin Wu of Jersey City NJ (US)
- G10L25/30
- G10L15/26
- CPC G10L25/30