GOOGLE LLC (20240347051). Small Footprint Multi-Channel Keyword Spotting simplified abstract
Small Footprint Multi-Channel Keyword Spotting
Organization Name
GOOGLE LLC
Inventor(s)
Jilong Wu of Mountain View CA (US)
Yiteng Huang of Mountain View CA (US)
Small Footprint Multi-Channel Keyword Spotting - A simplified explanation of the abstract
This abstract first appeared for US patent application 20240347051, titled 'Small Footprint Multi-Channel Keyword Spotting'.
The abstract describes a method for detecting a hotword in a spoken utterance by processing multi-channel audio using a neural network and generating a probability score for the presence of the hotword.
- Utilizes a three-dimensional single value decomposition filter input layer to process audio features from multiple channels in parallel.
- Generates a multi-channel audio feature representation by concatenating the respective audio features.
- Uses sequentially-stacked single value decomposition filter layers to generate a probability score for the hotword.
- Initiates a wake-up process on a user device when the probability score satisfies a threshold.
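The per-frame flow in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the patented network: the function names, the weight shapes, the tanh and sigmoid nonlinearities, the single linear read-out standing in for the stacked SVDF layers, and the reading of "satisfies a threshold" as "greater than or equal to" are all assumptions made for the example.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def hotword_score(frame, channel_weights, output_weights):
    """Score one input frame.

    frame:           (num_channels, num_features) audio features, one row per microphone
    channel_weights: (num_channels, num_features, hidden) hypothetical per-channel weights
    output_weights:  (num_channels * hidden,) hypothetical read-out weights
    """
    # Process every channel with its own weights in one batched operation.
    per_channel = np.tanh(np.einsum("cf,cfh->ch", frame, channel_weights))
    # Concatenate the per-channel outputs into one multi-channel representation.
    multi_channel = per_channel.reshape(-1)
    # Stand-in for the sequentially-stacked SVDF layers: a single linear read-out
    # squashed into a probability-like score in (0, 1).
    return float(sigmoid(multi_channel @ output_weights))


def should_wake(score, threshold=0.5):
    # Assumption: the threshold is "satisfied" when the score reaches or exceeds it.
    return score >= threshold


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.normal(size=(2, 40))  # 2 microphones, 40 features per frame
    channel_weights = rng.normal(size=(2, 40, 8)) * 0.1
    output_weights = rng.normal(size=16)
    score = hotword_score(frame, channel_weights, output_weights)
    print("wake up" if should_wake(score) else "stay asleep")
```

In a streaming setting, `hotword_score` would be called once per input frame, and the wake-up process would start on the first frame for which `should_wake` returns true.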
Potential Applications
- Voice-activated devices
- Speech recognition systems
- Hands-free control interfaces
Problems Solved
- Efficient detection of specific keywords in spoken audio
- Improved accuracy in identifying hotwords in noisy environments
Benefits
- Enhanced user experience with voice-controlled devices
- Increased reliability in speech recognition technology
Commercial Applications
Title: "Advanced Hotword Detection Technology for Voice-Activated Devices"
This technology can be used in smart speakers, virtual assistants, and other voice-controlled devices to improve the accuracy and efficiency of hotword detection, enhancing user interaction and overall performance.
Questions about Hotword Detection Technology
1. How does this method improve the accuracy of hotword detection in comparison to traditional methods?
2. What are the potential challenges in implementing this technology in real-world applications?
Original Abstract Submitted
A method to detect a hotword in a spoken utterance includes receiving a sequence of input frames characterizing streaming multi-channel audio. Each channel of the streaming multi-channel audio includes respective audio features captured by a separate dedicated microphone. For each input frame, the method includes processing, using a three-dimensional (3D) single value decomposition filter (SVDF) input layer of a memorized neural network, the respective audio features of each channel in parallel and generating a corresponding multi-channel audio feature representation based on a concatenation of the respective audio features. The method also includes generating, using sequentially-stacked SVDF layers, a probability score indicating a presence of a hotword in the audio. The method also includes determining whether the probability score satisfies a threshold and, when satisfied, initiating a wake-up process on a user device.
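The abstract does not spell out what happens inside an SVDF layer. As a rough sketch only, the code below follows the two-stage formulation commonly described in the keyword-spotting literature: each node applies a 1-D filter across the features of the current frame, stores the result in a short memory, and then applies a 1-D filter across that memory. The class name, shapes, zero-initialized memory, and ReLU activation are assumptions for illustration and are not taken from the patent application.

```python
import numpy as np


class SVDFLayer:
    """Hypothetical streaming SVDF-style layer (rank 1, one filter pair per node)."""

    def __init__(self, feature_filters, time_filters):
        # feature_filters: (num_nodes, num_features), applied to the current frame
        # time_filters:    (num_nodes, memory_size), applied across remembered frames
        self.feature_filters = np.asarray(feature_filters, dtype=float)
        self.time_filters = np.asarray(time_filters, dtype=float)
        # Memory of past stage-1 outputs, oldest first; starts empty (all zeros).
        self.memory = np.zeros_like(self.time_filters)

    def step(self, frame):
        """Consume one frame of shape (num_features,) and return (num_nodes,) activations."""
        # Stage 1: project the frame onto each node's feature filter.
        projected = self.feature_filters @ np.asarray(frame, dtype=float)
        # Slide the memory window by one frame and store the newest projection.
        self.memory = np.roll(self.memory, -1, axis=1)
        self.memory[:, -1] = projected
        # Stage 2: weight the remembered projections with each node's time filter.
        activation = np.sum(self.memory * self.time_filters, axis=1)
        # ReLU nonlinearity (an assumption for this sketch).
        return np.maximum(activation, 0.0)


if __name__ == "__main__":
    layer = SVDFLayer(np.ones((2, 3)), np.ones((2, 2)))
    for frame in ([1.0, 1.0, 1.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]):
        print(layer.step(frame))
```

Because each node keeps only `memory_size` past values and two small 1-D filters, a layer like this needs far fewer parameters than a dense layer over the same time window, which is the usual motivation for SVDF layers in small-footprint models.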