18151540. End-to-End Streaming Keyword Spotting simplified abstract (GOOGLE LLC)
End-to-End Streaming Keyword Spotting
Organization Name
GOOGLE LLC
Inventor(s)
Raziel Alvarez Guevara of Menlo Park CA (US)
Hyun Jin Park of Palo Alto CA (US)
End-to-End Streaming Keyword Spotting - A simplified explanation of the abstract
This abstract first appeared for US patent application 18151540 titled 'End-to-End Streaming Keyword Spotting'.
Simplified Explanation
The abstract describes a method for detecting a specific word or phrase (a hotword) in streaming audio captured by a user device. The method uses a memorized neural network built from sequentially-stacked single value decomposition filter (SVDF) layers. Each SVDF layer contains at least one neuron, and each neuron has a memory component and two stages of filtering. The network generates a probability score for the presence of the hotword; the method then determines whether that score satisfies a hotword detection threshold and initiates a wake-up process on the user device so that it can process additional terms.
- The method uses a neural network with SVDF layers to detect a hotword in streaming audio.
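- Each neuron in the network has its own memory component and two stages of filtering: the first stage filters the audio features of each input frame individually and writes the result to memory, and the second stage filters all the filtered features held in that memory.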
- The method generates a probability score to indicate the presence of the hotword.
- A wake-up process is initiated on the user device if the probability score meets a detection threshold.
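The points above can be sketched in a few lines of Python. This is only an illustration of the idea, not the patented implementation: it models a single neuron rather than a stack of SVDF layers, treats both filtering stages as plain weighted sums, maps the neuron output to a score with a sigmoid, and reads "satisfies the threshold" as "greater than or equal to". All names (`SVDFNeuron`, `detect_hotword`, `feature_filter`, `time_filter`) and all weights are hypothetical.

```python
from collections import deque
import math


class SVDFNeuron:
    """One neuron with a memory component and two filtering stages (illustrative)."""

    def __init__(self, feature_filter, time_filter):
        # Stage 1 weights: one weight per audio feature in a single frame.
        self.feature_filter = list(feature_filter)
        # Stage 2 weights: one weight per slot in the memory component.
        self.time_filter = list(time_filter)
        # Memory component: holds the most recent stage-1 outputs, oldest first.
        size = len(self.time_filter)
        self.memory = deque([0.0] * size, maxlen=size)

    def step(self, frame):
        # Stage 1: filter the features of this frame only, then store the result.
        filtered = sum(w * x for w, x in zip(self.feature_filter, frame))
        self.memory.append(filtered)  # the oldest entry is dropped automatically
        # Stage 2: filter across everything currently held in memory.
        return sum(w * m for w, m in zip(self.time_filter, self.memory))


def sigmoid(x):
    # Assumed mapping from neuron output to a score between 0 and 1.
    return 1.0 / (1.0 + math.exp(-x))


def detect_hotword(frames, neuron, threshold):
    """Return the index of the first frame whose score meets the threshold, else None."""
    for i, frame in enumerate(frames):
        score = sigmoid(neuron.step(frame))
        if score >= threshold:
            return i  # a real device would initiate its wake-up process here
    return None
```

Because the memory persists between calls to `step`, the neuron can be fed one frame at a time as audio arrives, which is what makes the approach suitable for streaming input.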
Potential Applications
- Voice assistants: This technology can be used in voice-activated devices like smart speakers or smartphones to detect hotwords like "Hey Siri" or "Alexa."
- Speech recognition systems: The method can be applied in speech recognition software to trigger specific actions or commands based on hotword detection.
- Security systems: This technology can be used in security systems to detect specific hotwords that may indicate a threat or unauthorized access.
Problems Solved
- Accurate hotword detection: The method is aimed at detecting specific hotwords in streaming audio accurately, with fewer false positives and false negatives.
- Efficient processing: The use of SVDF layers and memory components allows for efficient processing of audio features, enabling real-time hotword detection.
- User device wake-up: The method initiates a wake-up process on the user device only when the probability score satisfies the detection threshold, conserving device resources.
Benefits
- Improved user experience: Accurate hotword detection ensures that voice commands or actions are triggered reliably, enhancing the user experience with voice-activated devices.
- Enhanced security: The technology can contribute to improved security systems by detecting specific hotwords that may indicate a threat or unauthorized access.
- Efficient resource utilization: The method optimizes the processing of audio features, reducing computational load and conserving device resources.
Original Abstract Submitted
A method for detecting a hotword includes receiving a sequence of input frames that characterize streaming audio captured by a user device and generating a probability score indicating a presence of a hotword in the streaming audio using a memorized neural network. The network includes sequentially-stacked single value decomposition filter (SVDF) layers and each SVDF layer includes at least one neuron. Each neuron includes a respective memory component, a first stage configured to perform filtering on audio features of each input frame individually and output to the memory component, and a second stage configured to perform filtering on all the filtered audio features residing in the respective memory component. The method also includes determining whether the probability score satisfies a hotword detection threshold and initiating a wake-up process on the user device for processing additional terms.
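One possible way to write down the two stages and the threshold check described in the abstract is shown below. The notation is an assumption introduced here for illustration: the abstract does not define any symbols, does not state that the filters are linear, and does not say how the neuron outputs are turned into the probability score, so that step is left unspecified.

```latex
% Illustrative notation only; these symbols do not appear in the abstract.
% x_t      : feature vector of input frame t
% \alpha   : first-stage filter weights (one per feature)
% m_t      : value written to the memory component at frame t
% \beta_k  : second-stage weights over the T entries held in memory
% p_t      : probability score; \theta : hotword detection threshold
\begin{align*}
  m_t &= \sum_{i} \alpha_i \, x_{t,i}
      && \text{(first stage: one frame at a time)} \\
  y_t &= \sum_{k=0}^{T-1} \beta_k \, m_{t-k}
      && \text{(second stage: across the memory)} \\
  \text{wake up} &\iff p_t \ge \theta
      && \text{(threshold check, assuming ``satisfies'' means $\ge$)}
\end{align*}
```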