QUALCOMM Incorporated (20240274127). LATENCY REDUCTION FOR MULTI-STAGE SPEECH RECOGNITION simplified abstract

From WikiPatents

LATENCY REDUCTION FOR MULTI-STAGE SPEECH RECOGNITION

Organization Name

QUALCOMM Incorporated

Inventor(s)

Uday Reddy Thummaluri of Nalgonda (IN)

Prapulla Vuppu of Secunderabad (IN)

Sachin Raghunath Abdagire of San Diego CA (US)

Ritesh Garg of Hyderabad (IN)

LATENCY REDUCTION FOR MULTI-STAGE SPEECH RECOGNITION - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240274127, titled 'LATENCY REDUCTION FOR MULTI-STAGE SPEECH RECOGNITION'.

Simplified Explanation: The patent application describes a two-stage keyword detection system that reduces latency by skipping the second-stage model when the first-stage model is already confident.

  • The system receives audio samples in a first audio frame and uses a first keyword detection model to compute a keyword detection score for that frame.
  • If that score exceeds a first threshold, the first model also scores each of the additional audio frames that follow.
  • Each of those scores is compared to a second threshold that is greater than the first.
  • If the scores exceed the second threshold, the system skips processing the first and additional frames with a second keyword detection model.

Key Features and Innovation:

  • Uses a first keyword detection model to screen incoming audio frames.
  • Skips the second keyword detection model when first-stage scores exceed the higher, second threshold.
  • Reduces the latency of multi-stage speech recognition by avoiding the second stage in those cases.

Potential Applications:

  • Speech recognition systems
  • Voice-controlled devices
  • Audio transcription software

Problems Solved:

  • Reducing processing time for analyzing audio samples
  • Avoiding redundant second-stage verification when the first-stage keyword detection score is already high

Benefits:

  • Faster and more efficient audio sample processing
  • Lower latency between a spoken keyword and the system's response
  • Enhanced user experience in speech recognition applications

Commercial Applications: The technology can be applied in various industries such as telecommunications, smart home devices, and transcription services to improve the efficiency and accuracy of audio processing systems.

Questions about Audio Sample Processing:

  1. How does the system determine when to skip the second keyword detection model?
  2. What are the potential limitations of using keyword detection models in audio sample analysis?


Original Abstract Submitted

Systems and techniques are provided for processing one or more audio samples. For example, a process can include receiving one or more audio samples in a first audio frame and determining, using a first keyword detection model, a first keyword detection score for the first audio frame. One or more audio samples can be received in additional audio frames. Based on the first keyword detection score exceeding a first threshold, the first keyword detection model can be used to determine a keyword detection score for each audio frame of the additional audio frames. The respective keyword detection score for each audio frame of the additional audio frames can be compared to a second threshold that is greater than the first threshold. Based on the respective keyword detection score exceeding the second threshold, using a second keyword detection model to process the first audio frame and the additional audio frames can be skipped.
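
The control flow in the abstract can be sketched in Python. This is an illustrative sketch only, not the applicant's implementation. The names `detect_keyword`, `Decision`, `first_model`, `second_model` and `second_stage_threshold` are hypothetical. The abstract does not say whether every additional frame must exceed the second threshold, what the outcome is when the second stage is skipped, or how the second model's output is used. The sketch assumes that all additional frames must exceed the second threshold, that skipping means the keyword is accepted, and that the second model otherwise scores each frame and the best score is compared against its own threshold.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Frame = Sequence[float]            # one audio frame: a sequence of samples
Scorer = Callable[[Frame], float]  # hypothetical model interface: frame -> score


@dataclass
class Decision:
    keyword_detected: bool
    second_stage_ran: bool
    first_stage_scores: List[float]


def detect_keyword(
    frames: Sequence[Frame],
    first_model: Scorer,
    second_model: Scorer,
    first_threshold: float,
    second_threshold: float,
    second_stage_threshold: float = 0.5,  # assumption: not in the abstract
) -> Decision:
    if not frames:
        raise ValueError("at least one audio frame is required")
    if second_threshold <= first_threshold:
        raise ValueError("second threshold must be greater than the first")

    first_frame, additional = frames[0], frames[1:]

    # Stage 1: score the first frame with the first model.
    scores = [first_model(first_frame)]
    if scores[0] <= first_threshold:
        # The first threshold is not exceeded, so nothing more is processed.
        return Decision(False, False, scores)

    # The first threshold is exceeded: keep scoring the additional frames
    # with the same first model.
    additional_scores = [first_model(frame) for frame in additional]
    scores.extend(additional_scores)

    # Assumption: "exceeding the second threshold" is read as "every
    # additional frame exceeds it". In that case the second model is skipped.
    if additional_scores and all(s > second_threshold for s in additional_scores):
        return Decision(True, False, scores)

    # Assumption: otherwise the second model processes the first frame and
    # the additional frames, and its best score decides the outcome.
    best = max(second_model(frame) for frame in frames)
    return Decision(best > second_stage_threshold, True, scores)
```

The latency saving comes from the early return in the high-confidence branch: the second model, which is presumably the more expensive of the two, is never invoked there.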