20240029718. Flickering Reduction with Partial Hypothesis Re-ranking for Streaming ASR simplified abstract (GOOGLE LLC)
Flickering Reduction with Partial Hypothesis Re-ranking for Streaming ASR
Organization Name
GOOGLE LLC
Inventor(s)
Antoine Jean Bruguier of Milpitas CA (US)
Yangzhang He of Mountain View CA (US)
Trevor Strohman of Mountain View CA (US)
Flickering Reduction with Partial Hypothesis Re-ranking for Streaming ASR - A simplified explanation of the abstract
This abstract first appeared for US patent application 20240029718, titled 'Flickering Reduction with Partial Hypothesis Re-ranking for Streaming ASR'.
Simplified Explanation
The abstract describes a streaming speech recognition method aimed at reducing flickering, that is, already-produced words changing as the partial transcription is updated. The recognizer processes a first portion of the audio data to generate a first lattice and, from it, a first partial transcription for the utterance. It then processes a second portion of the data, building on the first lattice, to generate a second lattice that represents several partial hypotheses and their recognition scores. Each hypothesis receives a re-ranked score based on its recognition score and on whether it shares a prefix with the first partial transcription. The hypothesis with the highest re-ranked score becomes the second partial transcription.
- The method uses a speech recognizer to process audio data and generate partial transcriptions.
- It generates a first lattice and partial transcription based on a first portion of the data.
- It generates a second lattice for a second portion of the data, based on the first lattice, and gives each hypothesis a re-ranked score that depends on its recognition score and on whether it shares a prefix with the first partial transcription.
- It generates a second partial transcription by selecting the hypothesis with the highest re-ranked score.
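The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method claimed in the application: the abstract does not say how the prefix check enters the score, so the sketch assumes a fixed additive bonus, assumes that "shares a prefix" means the hypothesis begins with the previous partial transcription, and represents the lattice as a flat list of scored hypotheses. The names `Hypothesis`, `shares_prefix`, `rerank`, `next_partial`, and `bonus` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Hypothesis:
    """One partial hypothesis read off the second lattice (hypothetical structure)."""
    words: Tuple[str, ...]
    score: float  # recognizer score; assumed "higher is better" (e.g. log-probability)


def shares_prefix(words: Tuple[str, ...], previous: Tuple[str, ...]) -> bool:
    """Assumed reading of "shares a prefix": the hypothesis starts with `previous`."""
    return words[:len(previous)] == previous


def rerank(hypotheses: List[Hypothesis], previous: Tuple[str, ...],
           bonus: float = 1.0) -> List[Tuple[Hypothesis, float]]:
    """Return (hypothesis, re-ranked score) pairs, best first.

    The additive `bonus` is an assumption made for this sketch.
    """
    scored = [
        (h, h.score + (bonus if shares_prefix(h.words, previous) else 0.0))
        for h in hypotheses
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def next_partial(hypotheses: List[Hypothesis], previous: Tuple[str, ...],
                 bonus: float = 1.0) -> Tuple[str, ...]:
    """Select the hypothesis with the highest re-ranked score."""
    best, _ = rerank(hypotheses, previous, bonus)[0]
    return best.words


# Example: the first partial transcription was "play some".
previous = ("play", "some")
hypotheses = [
    Hypothesis(("play", "sum", "music"), -1.0),   # best raw score, but rewrites "some"
    Hypothesis(("play", "some", "music"), -1.4),  # extends the previous partial
]
print(next_partial(hypotheses, previous, bonus=1.0))
print(next_partial(hypotheses, previous, bonus=0.0))
```

With the bonus applied, the hypothesis that extends "play some" is selected even though its raw score is slightly lower. With `bonus=0.0` the selection falls back to the raw recognition score, and the displayed word "some" would change to "sum".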
Potential applications of this technology:
- Speech recognition systems and software
- Transcription services
- Voice-controlled devices and virtual assistants
Problems solved by this technology:
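- Reduces flickering in streaming speech recognition, where words of a partial transcription change as more audio arrives, by re-ranking the recognizer's hypotheses according to whether they share a prefix with the previous partial transcription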
- Enables better transcription of audio data
- Enhances the performance of voice-controlled systems
Benefits of this technology:
- More stable and reliable partial transcriptions during streaming speech recognition
- Improved transcription quality
- Enhanced user experience with voice-controlled devices and virtual assistants
Original Abstract Submitted
A method includes processing, using a speech recognizer, a first portion of audio data to generate a first lattice, and generating a first partial transcription for an utterance based on the first lattice. The method includes processing, using the recognizer, a second portion of the data to generate, based on the first lattice, a second lattice representing a plurality of partial speech recognition hypotheses for the utterance and a plurality of corresponding speech recognition scores. For each particular partial speech recognition hypothesis, the method includes generating a corresponding re-ranked score based on the corresponding speech recognition score and whether the particular partial speech recognition hypothesis shares a prefix with the first partial transcription. The method includes generating a second partial transcription for the utterance by selecting the partial speech recognition hypothesis of the second plurality of partial speech recognition hypotheses having the highest corresponding re-ranked score.
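The selection rule in the abstract can be written compactly. The abstract gives no formula, so the form below is an assumption for illustration only: $y_1$ is the first partial transcription, $\mathcal{H}_2$ the set of partial hypotheses represented by the second lattice, $s(h)$ the recognition score of hypothesis $h$, and $\lambda \ge 0$ a hypothetical weight on the prefix check (the notation assumes the `amsmath` package).

```latex
\[
\tilde{s}(h) = s(h) + \lambda \,\mathbf{1}\!\left[\, y_1 \text{ is a prefix of } h \,\right],
\qquad
y_2 = \operatorname*{arg\,max}_{h \in \mathcal{H}_2} \tilde{s}(h)
\]
```

Under this assumed form, $\lambda = 0$ reduces to picking the hypothesis with the best recognition score, while larger $\lambda$ makes the second partial transcription $y_2$ more likely to extend $y_1$ rather than rewrite it.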