18754462. Small Footprint Multi-Channel Keyword Spotting simplified abstract (GOOGLE LLC)

From WikiPatents

Small Footprint Multi-Channel Keyword Spotting

Organization Name

GOOGLE LLC

Inventor(s)

Jilong Wu of Mountain View CA (US)

Yiteng Huang of Mountain View CA (US)

Small Footprint Multi-Channel Keyword Spotting - A simplified explanation of the abstract

This abstract first appeared for US patent application 18754462, titled 'Small Footprint Multi-Channel Keyword Spotting'.

Simplified Explanation

The patent application describes a method for detecting a hotword (a specific trigger word) in a spoken utterance by analyzing streaming multi-channel audio with a neural network.

  • Utilizes a three-dimensional single value decomposition filter (SVDF) input layer to process the audio features of every channel in parallel, where each channel is captured by a separate dedicated microphone.
  • Generates a probability score indicating the presence of the specific word in the audio.
  • Initiates a wake-up process on a user device when the probability score meets a certain threshold.
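
The last two steps above (a score per frame, then a threshold check) can be sketched as a small loop. This is a minimal illustration, not the patent's implementation: the function name `run_hotword_detector`, the `on_wake` callback, the example scores, and the reading of "meets a certain threshold" as `>=` are all assumptions made for the example.

```python
from typing import Callable, Iterable, Optional


def run_hotword_detector(
    frame_scores: Iterable[float],
    threshold: float,
    on_wake: Callable[[int], None],
) -> Optional[int]:
    """Scan per-frame hotword scores and trigger a wake-up on the first hit.

    Returns the index of the first frame whose score meets the threshold,
    or None if no frame does.
    """
    for index, score in enumerate(frame_scores):
        # Assumption: "satisfies the threshold" means score >= threshold.
        if score >= threshold:
            on_wake(index)
            return index
    return None


# Hypothetical scores standing in for the neural network's output.
woken_at = []
first_hit = run_hotword_detector(
    [0.02, 0.10, 0.85, 0.91], threshold=0.8, on_wake=woken_at.append
)
```

In this sketch the detector stops at the first qualifying frame, so the callback fires once per detection rather than on every later frame that also exceeds the threshold.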

Key Features and Innovation

  • Utilizes a neural network with a three-dimensional single value decomposition filter for processing multi-channel audio data.
  • Sequentially-stacked single value decomposition filter layers generate a probability score for detecting the specific word.
  • Enables efficient detection of the specific word in real-time audio streams.
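
To illustrate what a single filter inside such a layer might compute, the sketch below follows the way SVDF nodes are usually described: a feature filter reduces each incoming frame to one number, that number is stored in a short memory, and a time filter combines the stored values. The source does not spell out this structure, so the class name `SvdfNode`, the weights, and the memory length are assumptions; activation functions and training are left out.

```python
from collections import deque
from typing import Sequence


class SvdfNode:
    """One rank-1 filter node: a feature filter followed by a time filter."""

    def __init__(self, feature_filter: Sequence[float], time_filter: Sequence[float]):
        self.feature_filter = list(feature_filter)
        self.time_filter = list(time_filter)
        # Memory of the most recent projections, oldest first, zero-initialized.
        size = len(self.time_filter)
        self.memory = deque([0.0] * size, maxlen=size)

    def step(self, frame: Sequence[float]) -> float:
        """Consume one input frame and return the node's output for it."""
        if len(frame) != len(self.feature_filter):
            raise ValueError("frame length must match the feature filter")
        # Stage 1: project the frame's features onto the feature filter.
        projection = sum(w * x for w, x in zip(self.feature_filter, frame))
        # The deque drops the oldest projection automatically.
        self.memory.append(projection)
        # Stage 2: weight the remembered projections with the time filter.
        return sum(b * m for b, m in zip(self.time_filter, self.memory))


# Hypothetical weights: 2 features per frame, memory of 3 frames.
node = SvdfNode(feature_filter=[1.0, 0.5], time_filter=[0.25, 0.25, 0.5])
first_output = node.step([2.0, 2.0])
second_output = node.step([0.0, 4.0])
```

Because each call to `step` only needs the current frame and the small memory, a node like this fits streaming audio: nothing has to be recomputed over the whole utterance.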

Potential Applications

  • Voice-controlled devices and virtual assistants.
  • Speech recognition systems for smart homes and IoT devices.
  • Security systems for detecting specific keywords or commands.

Problems Solved

  • Efficient detection of specific keywords in streaming audio data.
  • Real-time processing of multi-channel audio for word detection.
  • Enhancing user experience with voice-activated devices.

Benefits

  • Improved accuracy in detecting specific words in spoken audio.
  • Faster response times for voice commands.
  • Enhanced usability of voice-controlled devices.

Commercial Applications

  • The multi-channel keyword spotting method can be used in smart speakers, smart TVs, and other voice-controlled devices to enhance user experience and functionality.

Questions about Multi-Channel Audio Keyword Detection

How does the three-dimensional single value decomposition filter improve the detection of specific keywords in multi-channel audio data?

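The three-dimensional single value decomposition filter (SVDF) input layer processes the audio features of each microphone channel in parallel and concatenates the results into a single multi-channel feature representation for each input frame. The sequentially-stacked SVDF layers that follow then work from this combined representation rather than from a single channel.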
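
The per-channel processing and concatenation described in the abstract can be sketched as follows. This is a simplified illustration under stated assumptions: the function names, filter weights, and frame values are hypothetical, the time-filter memory of a full SVDF layer is omitted, and the channels are handled in a plain loop rather than truly in parallel.

```python
from typing import List, Sequence


def project(frame: Sequence[float], filters: Sequence[Sequence[float]]) -> List[float]:
    """Apply each feature filter to one channel's frame (one dot product per filter)."""
    return [sum(w * x for w, x in zip(f, frame)) for f in filters]


def multi_channel_representation(
    channel_frames: Sequence[Sequence[float]],
    channel_filters: Sequence[Sequence[Sequence[float]]],
) -> List[float]:
    """Filter every channel with its own filters, then concatenate the results."""
    if len(channel_frames) != len(channel_filters):
        raise ValueError("need one filter set per channel")
    representation: List[float] = []
    for frame, filters in zip(channel_frames, channel_filters):
        # Each channel is independent of the others, so this loop could run in parallel.
        representation.extend(project(frame, filters))
    return representation


# Two microphones, two audio features per frame (hypothetical values).
frames = [[1.0, 2.0], [4.0, 2.0]]
filters = [
    [[1.0, 0.0], [0.0, 1.0]],  # channel 0: two filters
    [[0.5, 0.5]],              # channel 1: one filter
]
combined = multi_channel_representation(frames, filters)
```

The output length is the total number of filters across all channels, which is what later layers would receive for each input frame in this sketch.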

What are the potential limitations of using a neural network for real-time keyword detection in streaming audio?

One potential limitation is the computational cost of processing a continuous audio stream in real time, which can affect both the speed and the accuracy of keyword detection.
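
As a rough illustration of why a decomposed filter helps with this cost, the arithmetic below compares the weight count of a full filter over a window of frames with that of a low-rank filter split into separate feature and time parts. The sizes (40 features, 32 frames of memory) are made-up example values, not figures from the patent application, and the function names are hypothetical.

```python
def full_filter_params(num_features: int, memory_size: int) -> int:
    """Weights in one unconstrained filter spanning features x time."""
    return num_features * memory_size


def svdf_params(num_features: int, memory_size: int, rank: int = 1) -> int:
    """Weights in a rank-constrained filter: one feature filter plus one
    time filter per rank."""
    return rank * (num_features + memory_size)


# Illustrative sizes only.
full = full_filter_params(num_features=40, memory_size=32)
low_rank = svdf_params(num_features=40, memory_size=32, rank=1)
print(f"full: {full} weights, rank-1: {low_rank} weights")
```

With these example sizes the rank-1 form stores 72 weights per filter instead of 1280, which shows how this kind of decomposition can reduce memory and computation per frame.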


Original Abstract Submitted

A method to detect a hotword in a spoken utterance includes receiving a sequence of input frames characterizing streaming multi-channel audio. Each channel of the streaming multi-channel audio includes respective audio features captured by a separate dedicated microphone. For each input frame, the method includes processing, using a three-dimensional (3D) single value decomposition filter (SVDF) input layer of a memorized neural network, the respective audio features of each channel in parallel and generating a corresponding multi-channel audio feature representation based on a concatenation of the respective audio features. The method also includes generating, using sequentially-stacked SVDF layers, a probability score indicating a presence of a hotword in the audio. The method also includes determining whether the probability score satisfies a threshold and, when satisfied, initiating a wake-up process on a user device.