GOOGLE LLC (20240347051). Small Footprint Multi-Channel Keyword Spotting simplified abstract

From WikiPatents

Small Footprint Multi-Channel Keyword Spotting

Organization Name

GOOGLE LLC

Inventor(s)

Jilong Wu of Mountain View CA (US)

Yiteng Huang of Mountain View CA (US)

Small Footprint Multi-Channel Keyword Spotting - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240347051 titled 'Small Footprint Multi-Channel Keyword Spotting'.

The abstract describes a method for detecting a hotword in a spoken utterance. The method processes streaming multi-channel audio, with each channel captured by its own dedicated microphone, using a neural network, and it generates a probability score for the presence of the hotword.

  • Uses a three-dimensional (3D) single value decomposition filter (SVDF) input layer to process the audio features of all channels in parallel.
  • Generates a multi-channel audio feature representation by concatenating the respective audio features.
  • Uses sequentially stacked SVDF layers to generate a probability score for the hotword.
  • Initiates a wake-up process on a user device when the probability score satisfies a threshold.
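The input stage listed above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the patent's implementation: it assumes rank-1 SVDF nodes (one feature filter and one time filter per node), a ReLU activation, random placeholder weights, and a fixed-length memory of past frames standing in for the "memorized" state. Reading the three dimensions as channel × feature × time is an interpretation, and the class name `ThreeDSVDFInput` and all parameter names are hypothetical. Each channel has its own filters, all channels are processed in one vectorized step, and the per-channel outputs are concatenated into a single multi-channel representation.

```python
import numpy as np


class ThreeDSVDFInput:
    """Hypothetical 3D SVDF input layer: one rank-1 SVDF per channel and node."""

    def __init__(self, num_channels, num_features, num_nodes, memory_size, seed=0):
        rng = np.random.default_rng(seed)
        # Feature filters project each channel's frame onto each node: (C, N, F).
        self.feature_filters = rng.standard_normal((num_channels, num_nodes, num_features)) * 0.1
        # Time filters weight the last `memory_size` projections: (C, N, T).
        self.time_filters = rng.standard_normal((num_channels, num_nodes, memory_size)) * 0.1
        self.bias = np.zeros((num_channels, num_nodes))
        # Memory of past feature-filter outputs, oldest first: (C, N, T).
        self.memory = np.zeros((num_channels, num_nodes, memory_size))

    def step(self, frame):
        """Process one input frame of shape (C, F); return a (C * N,) vector."""
        # Feature filtering for every channel and node in parallel.
        projected = np.einsum("cnf,cf->cn", self.feature_filters, frame)
        # Drop the oldest projection and append the newest one at the end.
        self.memory = np.roll(self.memory, -1, axis=-1)
        self.memory[..., -1] = projected
        # Time filtering over the memorized projections, then bias and ReLU.
        out = np.einsum("cnt,cnt->cn", self.time_filters, self.memory) + self.bias
        out = np.maximum(out, 0.0)
        # Concatenate per-channel outputs (channel 0 first) into one vector.
        return out.reshape(-1)


# Usage: two microphones, 40 features per frame, 8 nodes per channel.
layer = ThreeDSVDFInput(num_channels=2, num_features=40, num_nodes=8, memory_size=10)
frame = np.random.default_rng(1).standard_normal((2, 40))
rep = layer.step(frame)
print(rep.shape)  # (16,)
```

Keeping separate filters per channel means no channel's weights mix with another's until the concatenated vector reaches the next layer, which matches the "in parallel, then concatenate" ordering described in the abstract.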

Potential Applications:
  • Voice-activated devices
  • Speech recognition systems
  • Hands-free control interfaces

Problems Solved:
  • Efficient detection of specific keywords in spoken audio
  • Improved accuracy in identifying hotwords in noisy environments

Benefits:
  • Enhanced user experience with voice-controlled devices
  • Increased reliability in speech recognition technology

Commercial Applications: "Advanced Hotword Detection Technology for Voice-Activated Devices"

This technology can be used in smart speakers, virtual assistants, and other voice-controlled devices to make hotword detection more accurate and efficient, improving user interaction and overall device performance.

Questions about Hotword Detection Technology:
1. How does this method improve the accuracy of hotword detection compared with traditional methods?
2. What are the potential challenges in implementing this technology in real-world applications?


Original Abstract Submitted

A method to detect a hotword in a spoken utterance includes receiving a sequence of input frames characterizing streaming multi-channel audio. Each channel of the streaming multi-channel audio includes respective audio features captured by a separate dedicated microphone. For each input frame, the method includes processing, using a three-dimensional (3D) single value decomposition filter (SVDF) input layer of a memorized neural network, the respective audio features of each channel in parallel and generating a corresponding multi-channel audio feature representation based on a concatenation of the respective audio features. The method also includes generating, using sequentially-stacked SVDF layers, a probability score indicating a presence of a hotword in the audio. The method also includes determining whether the probability score satisfies a threshold and, when satisfied, initiating a wake-up process on a user device.
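The later steps of the abstract (sequentially stacked SVDF layers, a probability score, and a threshold check that triggers wake-up) can be sketched per frame as follows. This is a hedged illustration, not the claimed implementation: the layer sizes, ReLU activations, the final logistic (sigmoid) output unit, the default threshold of 0.5, and the names `SVDFLayer`, `build_stack`, `score_frame`, `detect`, and `on_wake_up` are all assumptions. The random weights produce meaningless scores until the model is trained. Each input row is assumed to be the multi-channel feature representation that the 3D SVDF input layer produced for that frame.

```python
import numpy as np


class SVDFLayer:
    """Hypothetical rank-1 streaming SVDF layer (feature filter + time filter)."""

    def __init__(self, in_dim, num_nodes, memory_size, rng):
        self.feature_filter = rng.standard_normal((num_nodes, in_dim)) * 0.1
        self.time_filter = rng.standard_normal((num_nodes, memory_size)) * 0.1
        self.bias = np.zeros(num_nodes)
        self.memory = np.zeros((num_nodes, memory_size))  # oldest frame first

    def step(self, x):
        # Project the input, push the projection into memory, filter over time.
        self.memory = np.roll(self.memory, -1, axis=1)
        self.memory[:, -1] = self.feature_filter @ x
        return np.maximum((self.time_filter * self.memory).sum(axis=1) + self.bias, 0.0)


def build_stack(in_dim, sizes=(32, 32, 32), memory_size=8, seed=0):
    """Create sequentially stacked SVDF layers plus a linear output unit."""
    rng = np.random.default_rng(seed)
    layers, dim = [], in_dim
    for size in sizes:
        layers.append(SVDFLayer(dim, size, memory_size, rng))
        dim = size
    out_w = rng.standard_normal(dim) * 0.1
    return layers, out_w


def score_frame(layers, out_w, rep):
    """Run one frame's multi-channel representation through the stack."""
    h = rep
    for layer in layers:
        h = layer.step(h)
    logit = float(out_w @ h)
    return 1.0 / (1.0 + np.exp(-logit))  # probability that the hotword is present


def detect(frames, layers, out_w, threshold=0.5, on_wake_up=lambda i: None):
    """Return the index of the first frame whose score satisfies the threshold."""
    for i, rep in enumerate(frames):
        if score_frame(layers, out_w, rep) >= threshold:
            on_wake_up(i)  # hypothetical hook that starts the device wake-up process
            return i
    return None  # threshold never satisfied: the device stays asleep


# Usage: 50 frames of a 16-dimensional multi-channel representation.
layers, out_w = build_stack(in_dim=16)
frames = np.random.default_rng(2).standard_normal((50, 16))
print(detect(frames, layers, out_w, threshold=0.5))
```

Because every SVDF layer keeps its own memory of past projections, each new frame costs only one projection and one short time-filter dot product per node, which is one plausible reason SVDF stacks suit a small-footprint, always-on detector.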