Google LLC (20250006217). Automatic Speech Recognition Accuracy With Multimodal Embeddings Search
This abstract first appeared for US patent application 20250006217 titled 'Automatic Speech Recognition Accuracy With Multimodal Embeddings Search'.
Original Abstract Submitted
A method includes receiving training data that includes a set of transcribed speech utterances, where each respective transcribed speech utterance is paired with a corresponding transcription. For each respective transcribed speech utterance, the method includes generating an encoded audio representation and an encoded textual representation, generating a higher order audio feature representation for a corresponding encoded audio representation, generating a higher order textual feature representation for a corresponding encoded textual representation, and determining a loss for the respective transcribed speech utterance based on the higher order audio feature representation and the higher order textual feature representation. The method also includes training a speech encoder and a text encoder of a correction model based on the loss determined for each transcribed speech utterance of the set of transcribed speech utterances.
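The abstract does not specify the encoders, how the higher order features are produced, or the form of the loss. The following minimal Python sketch illustrates the per-utterance training objective under stated assumptions: `encode_audio` (a linear projection), `encode_text` (an embedding lookup), `higher_order` (mean pooling) and a cosine-distance `utterance_loss` are all hypothetical stand-ins, not the method claimed in the application. For each transcribed utterance it builds a higher order audio feature and a higher order textual feature, scores how well they agree, and averages the per-utterance losses into the single scalar on which both encoders would be trained.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def encode_audio(frames: Sequence[Vector], projection: Matrix) -> List[Vector]:
    """Hypothetical speech encoder: a linear projection applied to every acoustic frame."""
    return [[sum(w * x for w, x in zip(row, frame)) for row in projection] for frame in frames]


def encode_text(token_ids: Sequence[int], embedding: Sequence[Vector]) -> List[Vector]:
    """Hypothetical text encoder: one embedding-table lookup per token id."""
    return [list(embedding[t]) for t in token_ids]


def higher_order(encoded: Sequence[Vector]) -> Vector:
    """Assumed stand-in for the higher order feature step: mean-pool the sequence."""
    dim = len(encoded[0])
    return [sum(vec[i] for vec in encoded) / len(encoded) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def utterance_loss(audio_feat: Vector, text_feat: Vector) -> float:
    """Assumed per-utterance loss: cosine distance between the two modalities."""
    return 1.0 - cosine(audio_feat, text_feat)


def training_loss(
    batch: Sequence[Tuple[Sequence[Vector], Sequence[int]]],
    projection: Matrix,
    embedding: Sequence[Vector],
) -> float:
    """Average the per-utterance losses; both encoders would be trained on this scalar."""
    losses = []
    for frames, token_ids in batch:
        audio_feat = higher_order(encode_audio(frames, projection))
        text_feat = higher_order(encode_text(token_ids, embedding))
        losses.append(utterance_loss(audio_feat, text_feat))
    return sum(losses) / len(losses)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]   # toy speech-encoder weights
    embedding = [[1.0, 0.0], [0.0, 1.0]]  # toy text-encoder table for token ids 0 and 1
    matched = ([[1.0, 0.0], [0.0, 1.0]], [0, 1])  # audio agrees with its transcription
    mismatched = ([[1.0, 0.0]], [1])              # audio points away from its transcription
    print(training_loss([matched, mismatched], identity, embedding))
```

In this toy batch the matched pair contributes a loss near 0 and the mismatched pair a loss of 1, so the printed average is about 0.5. In practice this scalar would be backpropagated through both encoders with an automatic differentiation framework; a contrastive loss with in-batch negatives is another common way to align audio and text embeddings, but the abstract does not say which loss the correction model uses.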
Organization Name
- Google LLC
Inventor(s)
- Christopher Li of New York NY US
- Kyle Scott Kastner of Waltham MA US
- Yuan Wang of Hoboken NJ US
- Zhehuai Chen of Edgewater NJ US
- Andrew Maxwell Rosenberg of Brooklyn NY US
- Heng Su of Beijing CN
- Qian Chen of Beijing CN
- Leonid Aleksandrovich Velikovich of New York NY US
- Patrick Maxim Rondon of New York NY US
- Diamantino Antonio Caseiro of Philadelphia PA US
- Zelin Wu of Jersey City NJ US
CPC Classification
- G10L25/30
- G10L15/26