18432282. END-TO-END STREAMING KEYWORD SPOTTING simplified abstract (GOOGLE LLC)

From WikiPatents

END-TO-END STREAMING KEYWORD SPOTTING

Organization Name

GOOGLE LLC

Inventor(s)

Raziel Alvarez Guevara of Menlo Park CA (US)

Hyun Jin Park of Palo Alto CA (US)

END-TO-END STREAMING KEYWORD SPOTTING - A simplified explanation of the abstract

This abstract first appeared for US patent application 18432282, titled 'END-TO-END STREAMING KEYWORD SPOTTING'.

Simplified Explanation

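The abstract describes a method for detecting a hotword in streaming audio using a neural network built from sequentially stacked SVDF layers. Each neuron filters the audio features of each input frame individually, stores the result in its memory component, and then filters across everything held in that memory. The network outputs a probability score for the presence of the hotword.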

  • The method uses a neural network with SVDF layers to detect a hotword in streaming audio.
  • Each neuron in the network includes a memory component and two filtering stages for audio features.
  • The probability score generated by the network is used to determine if a hotword is present in the audio stream.
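
The two filtering stages and the memory component listed above can be illustrated with a minimal sketch. It is not the patented implementation: the class name SVDFNeuron, the parameters feature_filter and time_filter, and the step method are hypothetical, and the weights would normally be learned. The sketch also leaves out details a real layer would likely have, such as biases, nonlinearities, and multiple neurons per layer.

```python
from collections import deque


class SVDFNeuron:
    """Hypothetical single neuron with a stage-1 filter, a memory, and a stage-2 filter."""

    def __init__(self, feature_filter, time_filter):
        # Stage-1 weights: one per audio feature in an input frame (assumed shape).
        self.feature_filter = list(feature_filter)
        # Stage-2 weights: one per memory slot, oldest slot first (assumed layout).
        self.time_filter = list(time_filter)
        # Fixed-size memory initialised to zeros; appending drops the oldest entry.
        size = len(self.time_filter)
        self.memory = deque([0.0] * size, maxlen=size)

    def step(self, frame):
        """Consume one input frame and return the neuron's output for it."""
        if len(frame) != len(self.feature_filter):
            raise ValueError("frame size must match the feature filter size")
        # Stage 1: filter the audio features of this frame individually.
        filtered = sum(w * x for w, x in zip(self.feature_filter, frame))
        # Store the stage-1 output in the memory component.
        self.memory.append(filtered)
        # Stage 2: filter across all filtered values residing in memory.
        return sum(w * m for w, m in zip(self.time_filter, self.memory))


# Illustrative weights and frames only; none of these values come from the patent.
neuron = SVDFNeuron(feature_filter=[1.0, 2.0], time_filter=[0.5, 0.25, 1.0])
outputs = [neuron.step(frame) for frame in ([1.0, 1.0], [2.0, 0.0], [0.0, 1.0])]
```

Because each call to step touches only the current frame and a short memory, the neuron can run on audio as it streams in rather than waiting for a complete utterance.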

Potential Applications

This technology can be applied in:

  • Voice-activated devices
  • Speech recognition systems
  • Virtual assistants

Problems Solved

This technology helps in:

  • Improving accuracy in detecting specific keywords in audio streams
  • Enhancing user experience with voice-controlled devices
  • Enabling hands-free operation of devices

Benefits

The benefits of this technology include:

  • Efficient detection of hotwords in streaming audio
  • Quick response time in initiating actions based on detected keywords
  • Enhanced user interaction with devices through voice commands

Potential Commercial Applications

This technology can commercially benefit:

  • Smart home devices
  • Automotive voice control systems
  • Customer service chatbots

Possible Prior Art

Possible prior art includes the use of deep learning models for speech recognition and keyword detection in audio streams.

Unanswered Questions

How does this technology compare to existing methods for hotword detection in streaming audio?

This technology uses a neural network with SVDF layers for hotword detection, which may offer improved accuracy and efficiency compared to traditional methods.

What are the limitations of this technology in real-world applications?

The limitations of this technology may include the need for significant computational resources for real-time processing and potential challenges in adapting to different accents or languages.


Original Abstract Submitted

A method for detecting a hotword includes receiving a sequence of input frames that characterize streaming audio captured by a user device and generating a probability score indicating a presence of a hotword in the streaming audio using a memorized neural network. The network includes sequentially-stacked single value decomposition filter (SVDF) layers and each SVDF layer includes at least one neuron. Each neuron includes a respective memory component, a first stage configured to perform filtering on audio features of each input frame individually and output to the memory component, and a second stage configured to perform filtering on all the filtered audio features residing in the respective memory component. The method also includes determining whether the probability score satisfies a hotword detection threshold and initiating a wake-up process on the user device for processing additional terms.
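
The final step of the abstract, comparing the probability score with a hotword detection threshold and initiating a wake-up process, might look like the sketch below. The function name run_hotword_loop, the wake_up callback, and the use of ">=" to mean "satisfies the threshold" are all assumptions for illustration. The abstract does not specify the comparison, the threshold value, or what the wake-up process does.

```python
from typing import Callable, Iterable, Optional


def run_hotword_loop(
    scores: Iterable[float],
    threshold: float,
    wake_up: Callable[[int], None],
) -> Optional[int]:
    """Scan per-frame probability scores and trigger wake_up on the first hit.

    Returns the index of the triggering frame, or None if no score
    satisfies the threshold. "Satisfies" is assumed to mean score >= threshold.
    """
    for index, score in enumerate(scores):
        if score >= threshold:
            # Hypothetical wake-up hook, e.g. start processing the terms that follow.
            wake_up(index)
            return index
    return None


# Illustrative scores and threshold only; these values do not come from the patent.
wake_events = []
triggered_at = run_hotword_loop([0.10, 0.40, 0.92, 0.95], 0.9, wake_events.append)
```

Returning at the first frame that satisfies the threshold is one possible design choice; a deployed system might instead smooth scores over several frames before waking the device.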