Google LLC (20240242711). End-to-End Streaming Keyword Spotting simplified abstract

End-to-End Streaming Keyword Spotting

Organization Name

Google LLC

Inventor(s)

Raziel Alvarez Guevara of Menlo Park CA (US)

Hyun Jin Park of Palo Alto CA (US)

Patrick Violette of Mountain View CA (US)

End-to-End Streaming Keyword Spotting - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240242711, titled 'End-to-End Streaming Keyword Spotting'.

Simplified Explanation

The patent application describes a method for training a hotword detector. The detector is a memorized neural network whose encoder and decoder are each built from sequentially-stacked single value decomposition filter (SVDF) layers.

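  • A training input audio sequence, made up of input frames that define a hotword, is fed into the encoder and the decoder of the neural network.
  • The encoder and the decoder each consist of sequentially-stacked single value decomposition filter (SVDF) layers.
  • The encoder and the decoder each generate a logit based on the training input audio sequence.
  • For each of the encoder and the decoder, the method smooths the logit, determines a max pooling loss from a probability distribution based on that logit, and optimizes both components using all of the max pooling losses for the sequence.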
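
The smoothing and loss steps listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the abstract does not say how logits are smoothed or how the max pooling loss is defined. The sketch assumes one scalar logit per frame, a causal moving average as the smoother, a sigmoid to turn logits into hotword probabilities, and a loss taken only at the frame with the highest probability. The function names and the window size are hypothetical.

```python
import math


def smooth(logits, window=3):
    """Causal moving average over the last `window` per-frame logits (assumed smoother)."""
    smoothed = []
    for t in range(len(logits)):
        chunk = logits[max(0, t - window + 1):t + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def max_pool_loss(logits, contains_hotword, window=3):
    """Cross-entropy evaluated only at the most confident frame (assumed formulation)."""
    probs = [sigmoid(z) for z in smooth(logits, window)]
    peak = max(probs)  # max pooling over time
    eps = 1e-7         # guards against log(0)
    if contains_hotword:
        return -math.log(max(peak, eps))
    return -math.log(max(1.0 - peak, eps))


def total_loss(encoder_logits, decoder_logits, contains_hotword):
    """Sum of the encoder and decoder losses; an unweighted sum is an assumption."""
    return (max_pool_loss(encoder_logits, contains_hotword)
            + max_pool_loss(decoder_logits, contains_hotword))


if __name__ == "__main__":
    enc = [-2.0, 4.0, 4.0, 4.0, -2.0]  # hypothetical encoder logits
    dec = [-3.0, 1.0, 5.0, 5.0, -1.0]  # hypothetical decoder logits
    print(total_loss(enc, dec, contains_hotword=True))
```

Pooling over time means a positive example is rewarded as soon as any frame fires confidently, so the training data needs no frame-exact label for where the hotword ends.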

Key Features and Innovation

  • Training hotword detection using a neural network with single value decomposition filter (SVDF) layers.
  • An encoder and a decoder, each made of sequentially-stacked SVDF layers.
  • Logit generation at both the encoder and the decoder from the training input audio sequence.
  • Logit smoothing and max pooling loss calculation for each of the encoder and the decoder.
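
The abstract names SVDF layers but does not describe their internals. As a rough illustration, the sketch below follows the common rank-1 description of an SVDF node: a feature filter projects each incoming frame to a scalar, a fixed-size memory keeps the most recent projections, and a time filter combines them. Everything here is an assumption for illustration, including the class names, the ReLU activation, and the zero-initialized memory.

```python
from collections import deque


class SVDFNode:
    """One rank-1 SVDF node: a feature filter followed by a time filter (assumed form)."""

    def __init__(self, feature_filter, time_filter):
        self.feature_filter = list(feature_filter)  # one weight per input feature
        self.time_filter = list(time_filter)        # one weight per remembered frame
        # Memory of past projections, oldest first, starting at zero.
        size = len(self.time_filter)
        self.memory = deque([0.0] * size, maxlen=size)

    def step(self, frame):
        """Consume one frame and return the node's activation."""
        projection = sum(w * x for w, x in zip(self.feature_filter, frame))
        self.memory.append(projection)  # the oldest projection is dropped
        activation = sum(w * m for w, m in zip(self.time_filter, self.memory))
        return max(0.0, activation)     # ReLU, an assumed nonlinearity


class SVDFLayer:
    """A layer is a list of nodes that all see the same input frame."""

    def __init__(self, nodes):
        self.nodes = nodes

    def step(self, frame):
        return [node.step(frame) for node in self.nodes]


def run_stack(layers, frames):
    """Stream frames through sequentially-stacked layers, one frame at a time."""
    outputs = []
    for frame in frames:
        for layer in layers:
            frame = layer.step(frame)
        outputs.append(frame)
    return outputs


if __name__ == "__main__":
    layer1 = SVDFLayer([SVDFNode([1.0], [1.0]), SVDFNode([2.0], [1.0])])
    layer2 = SVDFLayer([SVDFNode([1.0, 1.0], [1.0, 1.0])])
    print(run_stack([layer1, layer2], [[1.0], [2.0]]))
```

Because each node stores only a short history of scalar projections, a stack like this can process audio frame by frame, which fits the streaming use case in the title.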

Potential Applications

The technology can be used in devices for wake-up processes triggered by hotwords, such as smart speakers, virtual assistants, and voice-controlled devices.

Problems Solved

  • Efficient training of hotword detection systems.
  • Improved accuracy and performance in recognizing hotwords.
  • Joint optimization of the encoder and the decoder from the max pooling losses of each training audio sequence.

Benefits

  • Enhanced wake-up processes on devices.
  • Increased accuracy in detecting hotwords.
  • Improved user experience with voice-controlled devices.

Commercial Applications

  • Hotword detectors trained with this method can be used in smart speakers, virtual assistants, and other voice-controlled devices to improve recognition of wake-up commands and the resulting user interaction.

Questions about Neural Network Training for Hotword Detection

How does the method improve the accuracy of hotword detection?

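The method trains an encoder and a decoder, each built from stacked single value decomposition filter (SVDF) layers. It smooths the logit that each one produces and optimizes both on their max pooling losses. This training procedure is intended to improve the accuracy of hotword detection.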

What are the potential applications of this technology beyond wake-up processes on devices?

The technology could also be applied in security systems, automated transcription services, and other audio recognition applications, although the abstract itself describes only hotword-triggered wake-up.


Original Abstract Submitted

A method for training hotword detection includes receiving a training input audio sequence including a sequence of input frames that define a hotword that initiates a wake-up process on a device. The method also includes feeding the training input audio sequence into an encoder and a decoder of a memorized neural network. Each of the encoder and the decoder of the memorized neural network include sequentially-stacked single value decomposition filter (SVDF) layers. The method further includes generating a logit at each of the encoder and the decoder based on the training input audio sequence. For each of the encoder and the decoder, the method includes smoothing each respective logit generated from the training input audio sequence, determining a max pooling loss from a probability distribution based on each respective logit, and optimizing the encoder and the decoder based on all max pooling losses associated with the training input audio sequence.