Google LLC (20240347051). Small Footprint Multi-Channel Keyword Spotting simplified abstract
Small Footprint Multi-Channel Keyword Spotting
Organization Name
Google LLC
Inventor(s)
Jilong Wu of Mountain View CA (US)
Yiteng Huang of Mountain View CA (US)
Small Footprint Multi-Channel Keyword Spotting - A simplified explanation of the abstract
This abstract first appeared for US patent application 20240347051, titled 'Small Footprint Multi-Channel Keyword Spotting'.
The abstract describes a method for detecting a hotword in a spoken utterance using a neural network and audio features from multiple channels.
- For each input frame, the audio features of each channel are processed in parallel by a three-dimensional (3D) SVDF input layer of a neural network.
- A multi-channel audio feature representation is generated for each frame based on a concatenation of the respective audio features.
- A probability score indicating the presence of a hotword in the audio is generated using sequentially stacked SVDF layers.
- A wake-up process is initiated on a user device when the probability score satisfies a threshold.
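The steps above can be sketched in a few lines of Python. This is only an illustrative outline, not the method claimed in the application: `process_channel`, the per-channel `gains`, the logistic `hotword_score`, and the 0.8 threshold are all hypothetical stand-ins for the trained network, and the channels are handled in a simple loop rather than truly in parallel.

```python
import math
from typing import List, Sequence

def process_channel(features: Sequence[float], gain: float) -> List[float]:
    """Hypothetical per-channel processing: a simple scaling (assumption)."""
    return [gain * x for x in features]

def build_representation(frame: Sequence[Sequence[float]],
                         gains: Sequence[float]) -> List[float]:
    """Process each channel, then concatenate the results into one vector."""
    representation: List[float] = []
    for channel_features, gain in zip(frame, gains):
        representation.extend(process_channel(channel_features, gain))
    return representation

def hotword_score(representation: Sequence[float],
                  weights: Sequence[float], bias: float) -> float:
    """Hypothetical stand-in for the stacked layers: a logistic score in (0, 1)."""
    z = sum(w * x for w, x in zip(weights, representation)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def should_wake(frames, gains, weights, bias, threshold: float = 0.8) -> bool:
    """Return True as soon as one frame's score satisfies the threshold."""
    for frame in frames:
        score = hotword_score(build_representation(frame, gains), weights, bias)
        if score >= threshold:
            return True
    return False

# Two microphones (channels), two audio features per channel, two frames.
frames = [
    [[0.0, 0.1], [0.1, 0.0]],   # quiet frame, low score
    [[1.0, 2.0], [1.5, 2.5]],   # strong activation, high score
]
gains = [1.0, 0.5]
weights = [1.0, 1.0, 1.0, 1.0]
bias = -3.0

wake = should_wake(frames, gains, weights, bias)
```

With these made-up values, the first frame scores well below the threshold and the second frame exceeds it, so `wake` is `True`. On a real device the return value would gate the wake-up process rather than be stored in a variable.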
Potential Applications:
- Voice-activated devices
- Speech recognition systems
- Personal assistants
Problems Solved:
- Efficient detection of hotwords in multi-channel audio
- Improved wake word detection accuracy
Benefits:
- Enhanced user experience with voice-controlled devices
- Faster response times to user commands
Commercial Applications: "Advanced Hotword Detection Technology for Voice-Activated Devices". This technology can be used in smart speakers, virtual assistants, and other voice-controlled devices to improve accuracy and responsiveness.
Prior Art: Prior research in neural networks for audio processing and speech recognition could be relevant to this technology.
Frequently Updated Research: Stay updated on advancements in neural network architectures for audio processing and speech recognition to enhance the performance of this hotword detection method.
Questions about Hotword Detection:
1. How does this method compare to traditional hotword detection algorithms?
- This method leverages neural networks and multi-channel audio features for more accurate and efficient hotword detection.
2. What are the potential challenges in implementing this technology in real-world applications?
- Challenges may include optimizing the neural network architecture for different devices and environments.
Original Abstract Submitted
A method to detect a hotword in a spoken utterance includes receiving a sequence of input frames characterizing streaming multi-channel audio. Each channel of the streaming multi-channel audio includes respective audio features captured by a separate dedicated microphone. For each input frame, the method includes processing, using a three-dimensional (3D) single value decomposition filter (SVDF) input layer of a memorized neural network, the respective audio features of each channel in parallel and generating a corresponding multi-channel audio feature representation based on a concatenation of the respective audio features. The method also includes generating, using sequentially-stacked SVDF layers, a probability score indicating a presence of a hotword in the audio. The method also includes determining whether the probability score satisfies a threshold and, when satisfied, initiating a wake-up process on a user device.
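The abstract does not spell out the internals of an SVDF layer. As a rough illustration only, the sketch below assumes the commonly described rank-1 form, in which each node applies a feature filter to the current frame, stores the result in a fixed-length memory, and then applies a time filter across that memory. The class names `SvdfNode` and `MultiChannelSvdfInput`, the filter values, and the memory length are all hypothetical; bias terms, activations, and trained weights are omitted, and the channels are looped over rather than run in parallel.

```python
from collections import deque
from typing import Deque, List, Sequence

class SvdfNode:
    """Assumed rank-1 SVDF node: feature filter, then time filter over a memory."""

    def __init__(self, feature_filter: Sequence[float],
                 time_filter: Sequence[float]) -> None:
        self.feature_filter = list(feature_filter)
        self.time_filter = list(time_filter)
        size = len(self.time_filter)
        # FIFO memory of past projections, oldest first, initially zeros.
        self.memory: Deque[float] = deque([0.0] * size, maxlen=size)

    def step(self, features: Sequence[float]) -> float:
        # Stage 1: project the current frame's features to a single scalar.
        projection = sum(a * x for a, x in zip(self.feature_filter, features))
        # Appending to a full deque drops the oldest entry automatically.
        self.memory.append(projection)
        # Stage 2: weight the remembered projections over time.
        return sum(b * m for b, m in zip(self.time_filter, self.memory))

class MultiChannelSvdfInput:
    """Hypothetical input layer: separate nodes per channel, outputs concatenated."""

    def __init__(self, nodes_per_channel: Sequence[Sequence[SvdfNode]]) -> None:
        self.nodes_per_channel = nodes_per_channel

    def step(self, frame: Sequence[Sequence[float]]) -> List[float]:
        outputs: List[float] = []
        for channel_features, nodes in zip(frame, self.nodes_per_channel):
            outputs.extend(node.step(channel_features) for node in nodes)
        return outputs

# Two channels, one node per channel, memory of two frames (made-up filters).
layer = MultiChannelSvdfInput([
    [SvdfNode([1.0, 1.0], [0.5, 0.5])],
    [SvdfNode([1.0, 1.0], [0.5, 0.5])],
])
first = layer.step([[1.0, 1.0], [2.0, 2.0]])    # -> [1.0, 2.0]
second = layer.step([[3.0, 1.0], [0.0, 0.0]])   # -> [3.0, 2.0]
```

Because each node keeps its own memory, the second output for a channel depends on both the current and the previous frame. That retained state is what lets a layer of this kind operate on streaming audio one frame at a time.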