20240029718. Flickering Reduction with Partial Hypothesis Re-ranking for Streaming ASR simplified abstract (GOOGLE LLC)

From WikiPatents

Flickering Reduction with Partial Hypothesis Re-ranking for Streaming ASR

Organization Name

GOOGLE LLC

Inventor(s)

Antoine Jean Bruguier of Milpitas CA (US)

David Qiu of Fremont CA (US)

Yangzhang He of Mountain View CA (US)

Trevor Strohman of Mountain View CA (US)

Flickering Reduction with Partial Hypothesis Re-ranking for Streaming ASR - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240029718, titled 'Flickering Reduction with Partial Hypothesis Re-ranking for Streaming ASR'.

Simplified Explanation

The abstract describes a method for streaming speech recognition in which a speech recognizer processes audio data in portions and produces partial transcriptions of an utterance. The recognizer processes a first portion of the audio to generate a first lattice, and a first partial transcription is generated from that lattice. It then processes a second portion to generate, based on the first lattice, a second lattice that represents multiple partial hypotheses with corresponding recognition scores. Each hypothesis receives a re-ranked score that depends on its recognition score and on whether it shares a prefix with the first partial transcription. The hypothesis with the highest re-ranked score becomes the second partial transcription.

  • The method uses a speech recognizer to process audio data and generate partial transcriptions.
  • It generates a first lattice and partial transcription based on a first portion of the data.
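  • It processes a second portion of the data to generate, based on the first lattice, a second lattice representing multiple partial hypotheses and their recognition scores.
  • It computes a re-ranked score for each hypothesis from its recognition score and from whether the hypothesis shares a prefix with the first partial transcription.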
  • It generates a second partial transcription by selecting the hypothesis with the highest re-ranked score.
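
The re-ranking step listed above can be sketched in Python. This is a minimal illustration under stated assumptions, not the method claimed in the application: the abstract does not say how the prefix condition is combined with the recognition score, so the additive `prefix_bonus`, the `Hypothesis` container, the reading of "shares a prefix" as "begins with the previous partial transcription", and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Hypothesis:
    """One partial hypothesis read off a lattice (hypothetical container)."""
    words: Tuple[str, ...]
    score: float  # recognizer score, e.g. a log-probability; higher is better


def shares_prefix(hyp_words: Sequence[str], previous_words: Sequence[str]) -> bool:
    """True if the hypothesis begins with the previous partial transcription.

    Assumption: this is only one possible reading of "shares a prefix".
    """
    n = len(previous_words)
    return tuple(hyp_words[:n]) == tuple(previous_words)


def rerank(hypotheses: List[Hypothesis],
           previous_words: Sequence[str],
           prefix_bonus: float = 1.0) -> List[Tuple[Hypothesis, float]]:
    """Pair each hypothesis with a re-ranked score.

    Assumption: the re-ranked score is the recognizer score plus a fixed
    bonus when the hypothesis extends the previous partial transcription.
    """
    return [
        (h, h.score + (prefix_bonus if shares_prefix(h.words, previous_words) else 0.0))
        for h in hypotheses
    ]


def next_partial(hypotheses: List[Hypothesis],
                 previous_words: Sequence[str],
                 prefix_bonus: float = 1.0) -> Tuple[str, ...]:
    """Select the hypothesis with the highest re-ranked score."""
    best, _ = max(rerank(hypotheses, previous_words, prefix_bonus),
                  key=lambda pair: pair[1])
    return best.words


# Made-up example: the first partial transcription and two hypotheses
# from the second lattice.
first_partial = ("play", "some")
second_hypotheses = [
    Hypothesis(("lay", "some", "music"), -1.0),   # best raw score, but rewrites "play"
    Hypothesis(("play", "some", "music"), -1.4),  # extends the first partial
]
stable = next_partial(second_hypotheses, first_partial, prefix_bonus=1.0)
unstable = next_partial(second_hypotheses, first_partial, prefix_bonus=0.0)
```

With the bonus enabled, the hypothesis that extends the words already shown is selected even though its raw score is slightly lower; with the bonus set to zero, the selection falls back to the raw recognizer ranking and the first word would change on screen.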

Potential applications of this technology:

  • Speech recognition systems and software
  • Transcription services
  • Voice-controlled devices and virtual assistants

Problems solved by this technology:

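  • Reduces flickering in streaming speech recognition, where words in a displayed partial transcription change as more audio arrives, by re-ranking hypotheses according to whether they share a prefix with the previous partial transcription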
  • Enables better transcription of audio data
  • Enhances the performance of voice-controlled systems

Benefits of this technology:

  • More stable and reliable partial transcriptions during streaming speech recognition
  • Improved transcription quality
  • Enhanced user experience with voice-controlled devices and virtual assistants


Original Abstract Submitted

A method includes processing, using a speech recognizer, a first portion of audio data to generate a first lattice, and generating a first partial transcription for an utterance based on the first lattice. The method includes processing, using the recognizer, a second portion of the data to generate, based on the first lattice, a second lattice representing a plurality of partial speech recognition hypotheses for the utterance and a plurality of corresponding speech recognition scores. For each particular partial speech recognition hypothesis, the method includes generating a corresponding re-ranked score based on the corresponding speech recognition score and whether the particular partial speech recognition hypothesis shares a prefix with the first partial transcription. The method includes generating a second partial transcription for the utterance by selecting the partial speech recognition hypothesis of the second plurality of partial speech recognition hypotheses having the highest corresponding re-ranked score.