20240029720. Context-aware Neural Confidence Estimation for Rare Word Speech Recognition simplified abstract (GOOGLE LLC)

Context-aware Neural Confidence Estimation for Rare Word Speech Recognition

Organization Name

GOOGLE LLC

Inventor(s)

David Qiu of Fremont CA (US)

Tsendsuren Munkhdalai of Mountain View CA (US)

Yangzhang He of Mountain View CA (US)

Khe Chai Sim of Dublin CA (US)

Context-aware Neural Confidence Estimation for Rare Word Speech Recognition - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240029720, titled 'Context-aware Neural Confidence Estimation for Rare Word Speech Recognition'.

Simplified Explanation

The abstract describes an automatic speech recognition (ASR) system that consists of an ASR model, a neural associative memory (NAM) biasing model, and a confidence estimation model (CEM). The ASR model encodes a sequence of audio frames representing a spoken utterance into higher-order feature representations and decodes them to produce a final speech recognition result. The NAM biasing model modifies the higher-order feature representations based on contextual information to generate biasing context vectors. The CEM computes the confidence of the final speech recognition result and is connected to the biasing context vectors generated by the NAM biasing model.

  • The ASR model encodes audio frames into higher-order feature representations and decodes them to produce speech recognition results.
  • The NAM biasing model modifies the higher-order feature representations based on contextual information to generate biasing context vectors.
  • The CEM computes the confidence of the final speech recognition result.
  • The CEM is connected to the biasing context vectors generated by the NAM biasing model (a minimal sketch of this dataflow follows the list).
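
The dataflow among these components can be made concrete with a small sketch. The Python code below is only an illustration of the pipeline summarized above, not the patented implementation: the layer sizes, the additive way the biasing context vectors modify the encoder features, the greedy decoder, and the sigmoid confidence head are all assumptions introduced for readability.

  import numpy as np

  rng = np.random.default_rng(0)

  def linear(x, out_dim):
      # Stand-in for a trained projection layer.
      w = rng.standard_normal((x.shape[-1], out_dim)) * 0.01
      return x @ w

  def softmax(x):
      e = np.exp(x - x.max(axis=-1, keepdims=True))
      return e / e.sum(axis=-1, keepdims=True)

  def audio_encoder(audio_frames):
      # Encode audio frames into higher-order feature representations.
      return linear(audio_frames, 256)                     # (T, 256)

  def nam_biasing_model(features, biasing_phrases):
      # Attend from acoustic features to embedded biasing phrases (the
      # contextual information) to produce biasing context vectors, then
      # modify the encoder features with them (additive fusion is an
      # assumption of this sketch).
      keys = linear(biasing_phrases, 256)                  # (P, 256)
      attn = softmax(features @ keys.T)                    # (T, P)
      context_vectors = attn @ keys                        # (T, 256)
      return features + context_vectors, context_vectors

  def decoder(biased_features):
      # Toy greedy decoder: one token id per frame over a 1000-token vocabulary.
      return linear(biased_features, 1000).argmax(-1)      # (T,)

  def confidence_estimation_model(biased_features, context_vectors, hypothesis):
      # The CEM sees the biasing context vectors in addition to the
      # decoder-side features and the hypothesis tokens.
      hyp_emb = linear(np.eye(1000)[hypothesis], 256)      # (T, 256)
      joint = np.concatenate([biased_features, context_vectors, hyp_emb], axis=-1)
      score = linear(joint, 1).mean()
      return 1.0 / (1.0 + np.exp(-score))                  # confidence in [0, 1]

  audio = rng.standard_normal((50, 80))     # 50 frames of 80-dim log-mel features
  phrases = rng.standard_normal((5, 128))   # 5 embedded rare-word biasing phrases
  feats = audio_encoder(audio)
  biased, ctx = nam_biasing_model(feats, phrases)
  hyp = decoder(biased)
  print("confidence:", confidence_estimation_model(biased, ctx, hyp))

Here the softmax over phrase embeddings plays the role of an associative-memory lookup; in the actual NAM biasing model the retrieval and fusion mechanisms are learned, which this sketch does not attempt to reproduce.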

Potential applications of this technology:

  • Speech recognition systems for various domains such as virtual assistants, transcription services, and voice-controlled devices.
  • Speech translation systems that convert spoken words into written text in another language in real time.
  • Accessibility tools for individuals with speech impairments or disabilities.

Problems solved by this technology:

  • Improves the accuracy and reliability of speech recognition systems by incorporating contextual information and confidence estimation.
  • Reduces errors and enhances the user experience in applications relying on speech recognition.

Benefits of this technology:

  • Enhanced speech recognition accuracy and performance.
  • Improved user experience with more reliable and context-aware speech recognition.
  • Increased efficiency in transcription services and voice-controlled devices.


Original Abstract Submitted

An automatic speech recognition (ASR) system that includes an ASR model, a neural associative memory (NAM) biasing model, and a confidence estimation model (CEM). The ASR model includes an audio encoder configured to encode a sequence of audio frames characterizing a spoken utterance into a sequence of higher-order feature representations, and a decoder configured to receive the sequence of higher-order feature representations and output a final speech recognition result. The NAM biasing model is configured to receive biasing contextual information and modify the sequence of higher-order feature representations based on the biasing contextual information to generate, as output, biasing context vectors. The CEM is configured to compute a confidence of the final speech recognition result output by the decoder. The CEM is connected to the biasing context vectors generated by the NAM biasing model.
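
The final sentence carries the distinctive element: the CEM is connected to the biasing context vectors rather than only to decoder-side features, so the rare-word context that biased the recognizer is also visible to the confidence estimator. The sketch below shows one way such a connection could be wired, assuming a dot-product attention from per-token confidence features to the biasing context vectors; the dimensions, the fusion, and the sigmoid head are hypothetical, not the patented design.

  import numpy as np

  rng = np.random.default_rng(1)

  def softmax(x):
      e = np.exp(x - x.max(axis=-1, keepdims=True))
      return e / e.sum(axis=-1, keepdims=True)

  def cem_with_biasing_connection(token_feats, biasing_context_vectors):
      # Per-token confidence: each decoded token's feature vector attends
      # over the biasing context vectors, and the attended summary is
      # fused back before a sigmoid confidence head (all hypothetical).
      d = token_feats.shape[-1]
      scores = token_feats @ biasing_context_vectors.T / np.sqrt(d)  # (U, T)
      attended = softmax(scores) @ biasing_context_vectors           # (U, d)
      fused = np.concatenate([token_feats, attended], axis=-1)       # (U, 2d)
      w = rng.standard_normal((2 * d, 1)) * 0.01                     # stand-in head
      conf = 1.0 / (1.0 + np.exp(-(fused @ w)))                      # sigmoid
      return conf.squeeze(-1)                                        # (U,) in [0, 1]

  tokens = rng.standard_normal((7, 256))   # features for 7 decoded wordpieces
  ctx = rng.standard_normal((50, 256))     # biasing context vectors, one per frame
  print(cem_with_biasing_connection(tokens, ctx).round(3))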