Google LLC (20240347051). Small Footprint Multi-Channel Keyword Spotting simplified abstract

From WikiPatents

Small Footprint Multi-Channel Keyword Spotting

Organization Name

Google LLC

Inventor(s)

Jilong Wu of Mountain View CA (US)

Yiteng Huang of Mountain View CA (US)

Small Footprint Multi-Channel Keyword Spotting - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240347051 titled 'Small Footprint Multi-Channel Keyword Spotting'.

The abstract describes a method for detecting a hotword in a spoken utterance using a neural network and audio features from multiple channels.

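  • The audio features from each channel, each captured by a separate dedicated microphone, are processed in parallel by a three-dimensional (3D) single value decomposition filter (SVDF) input layer of a neural network.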
  • A multi-channel audio feature representation is generated based on a concatenation of the respective audio features.
  • A probability score indicating the presence of a hotword in the audio is generated using sequentially-stacked SVDF layers.
  • A wake-up process is initiated on a user device when the probability score satisfies a threshold.
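The four steps listed above can be sketched as a per-frame streaming loop. This is a toy illustration, not the patented model: it assumes each channel is processed independently by a small bank of linear filters (a hypothetical stand-in for the 3D SVDF input layer), that scoring is a logistic function of a weighted sum (a stand-in for the stacked SVDF layers), that "satisfies a threshold" means "is greater than or equal to", and that all weights, the bias, and the 0.5 threshold are hypothetical values.

```python
import math
from typing import List, Sequence

# One feature vector per microphone channel for a single input frame.
Frame = Sequence[Sequence[float]]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def multichannel_representation(frame: Frame, filters: Sequence[Sequence[float]]) -> List[float]:
    """Filter each channel independently, then concatenate the per-channel outputs.

    The channels do not interact here, so they could be processed in parallel;
    the loop is sequential only for simplicity.
    """
    per_channel = [[dot(channel, f) for f in filters] for channel in frame]
    return [value for outputs in per_channel for value in outputs]


def probability_score(representation: Sequence[float], scorer: Sequence[float], bias: float) -> float:
    """Hypothetical scorer: logistic of a weighted sum, mapped into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(bias + dot(representation, scorer))))


def detect_hotword(frames: Sequence[Frame], filters: Sequence[Sequence[float]],
                   scorer: Sequence[float], bias: float, threshold: float = 0.5) -> int:
    """Return the index of the first frame whose score satisfies the threshold, or -1."""
    for index, frame in enumerate(frames):
        score = probability_score(multichannel_representation(frame, filters), scorer, bias)
        if score >= threshold:
            return index  # a real device would initiate its wake-up process here
    return -1


if __name__ == "__main__":
    filters = [[1.0, 0.0], [0.0, 1.0]]  # identity filters pass features through
    scorer = [1.0, 1.0, 1.0, 1.0]       # one weight per concatenated value (2 channels x 2 filters)
    frames = [
        [[0.0, 0.0], [0.0, 0.0]],  # silence on both microphones
        [[1.0, 0.5], [0.9, 0.4]],  # energy on both microphones
    ]
    print(detect_hotword(frames, filters, scorer, bias=-2.0))  # → 1
```

With these toy values the silent first frame scores about 0.12 and the second frame about 0.69, so detection fires on frame 1. Keeping the per-channel step free of cross-channel interaction is what makes the "in parallel" processing possible before the concatenation.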

Potential Applications:
  • Voice-activated devices
  • Speech recognition systems
  • Personal assistants

Problems Solved:
  • Efficient detection of hotwords in multi-channel audio
  • Improved wake word detection accuracy

Benefits:
  • Enhanced user experience with voice-controlled devices
  • Faster response times to user commands

Commercial Applications: This technology could be used in smart speakers, virtual assistants, and other voice-controlled devices to improve the accuracy and responsiveness of wake-word detection.

Prior Art: Prior research on neural-network keyword spotting, including earlier SVDF-based models, and on multi-microphone speech processing may be relevant to this technology.

Frequently Updated Research: Stay updated on advancements in neural network architectures for audio processing and speech recognition to enhance the performance of this hotword detection method.

Questions about Hotword Detection:

1. How does this method compare to traditional hotword detection algorithms?
   - Rather than running a detector on a single microphone signal, it processes the features of every microphone channel in parallel and combines them inside one small-footprint neural network built from SVDF layers, which is intended to improve detection accuracy while keeping the model compact.
2. What are the potential challenges in implementing this technology in real-world applications?
   - Challenges may include optimizing the neural network architecture for different devices, microphone configurations, and acoustic environments.


Original Abstract Submitted

A method to detect a hotword in a spoken utterance includes receiving a sequence of input frames characterizing streaming multi-channel audio. Each channel of the streaming multi-channel audio includes respective audio features captured by a separate dedicated microphone. For each input frame, the method includes processing, using a three-dimensional (3D) single value decomposition filter (SVDF) input layer of a memorized neural network, the respective audio features of each channel in parallel and generating a corresponding multi-channel audio feature representation based on a concatenation of the respective audio features. The method also includes generating, using sequentially-stacked SVDF layers, a probability score indicating a presence of a hotword in the audio. The method also includes determining whether the probability score satisfies a threshold and, when satisfied, initiating a wake-up process on a user device.
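The SVDF layers in the abstract are stateful: each node filters the current frame and then filters over a short memory of its past outputs, which is how a small model can capture temporal context. The sketch below assumes the commonly published rank-1 SVDF formulation (a feature filter followed by a time filter over a fixed-length memory) and, for the multi-channel input layer, assumes independent nodes per channel whose outputs are concatenated. The class names, filter values, and per-channel layout are hypothetical; the abstract does not specify the 3D layer's internals or whether weights are shared across channels.

```python
from collections import deque
from typing import Deque, List, Sequence


class SVDFNode:
    """Rank-1 SVDF node: a feature filter followed by a time filter over a memory."""

    def __init__(self, feature_filter: Sequence[float], time_filter: Sequence[float]):
        self.feature_filter = list(feature_filter)
        self.time_filter = list(time_filter)
        # Memory holds the last len(time_filter) feature-filter outputs, oldest first;
        # time_filter[0] therefore weights the oldest remembered value.
        self.memory: Deque[float] = deque([0.0] * len(time_filter), maxlen=len(time_filter))

    def step(self, frame: Sequence[float]) -> float:
        # 1) Feature filtering: project the current frame to a single scalar.
        projected = sum(x * w for x, w in zip(frame, self.feature_filter))
        # 2) Push into memory; a deque with maxlen drops the oldest value automatically.
        self.memory.append(projected)
        # 3) Time filtering: weighted sum over the remembered projections.
        return sum(m * w for m, w in zip(self.memory, self.time_filter))


class MultiChannelSVDFInput:
    """Hypothetical multi-channel input layer: independent SVDF nodes per channel,
    with their outputs concatenated into one representation per frame."""

    def __init__(self, nodes_per_channel: List[List[SVDFNode]]):
        self.nodes_per_channel = nodes_per_channel

    def step(self, frame: Sequence[Sequence[float]]) -> List[float]:
        outputs: List[float] = []
        for channel_features, nodes in zip(frame, self.nodes_per_channel):
            outputs.extend(node.step(channel_features) for node in nodes)
        return outputs


if __name__ == "__main__":
    node = SVDFNode([1.0, 1.0], [0.5, 0.5])  # averages the last two projections
    print([node.step(f) for f in ([1.0, 1.0], [2.0, 2.0], [0.0, 0.0])])  # → [1.0, 3.0, 2.0]
```

Sequentially-stacked layers could be built the same way, treating the concatenated vector as the next layer's input frame, with a final logistic or softmax output producing the probability score that is compared against the threshold.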