GOOGLE LLC (20240242711). End-to-End Streaming Keyword Spotting simplified abstract

From WikiPatents

End-to-End Streaming Keyword Spotting

Organization Name

GOOGLE LLC

Inventor(s)

Raziel Alvarez Guevara of Menlo Park CA (US)

Hyun Jin Park of Palo Alto CA (US)

Patrick Violette of Mountain View CA (US)

End-to-End Streaming Keyword Spotting - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240242711, titled 'End-to-End Streaming Keyword Spotting'.

The abstract of the patent application describes a method for training hotword detection using a memorized neural network with encoder and decoder components.

  • The method involves receiving a training input audio sequence containing a hotword that triggers a wake-up process on a device.
  • The training input audio sequence is fed into an encoder and decoder of a memorized neural network, each consisting of sequentially-stacked single value decomposition filter (SVDF) layers.
  • The encoder and the decoder each generate a logit based on the training input audio sequence.
  • For each of the encoder and the decoder, the method smooths each logit, determines a max pooling loss from a probability distribution based on that logit, and then optimizes the encoder and the decoder based on all max pooling losses associated with the training input audio sequence.
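The loss computation in the steps above can be sketched roughly as follows. This is a minimal illustration and not the applicant's implementation: the abstract does not say how logits are smoothed or which probability distribution is used, so the causal moving average, the sigmoid, the window size, and the names `smooth`, `max_pooling_loss`, `total_loss`, and `hotword_span` are all assumptions made for the example.

```python
import math


def smooth(logits, window=3):
    """Causal moving average over the last `window` logits (assumed smoothing)."""
    out = []
    for t in range(len(logits)):
        chunk = logits[max(0, t - window + 1):t + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def max_pooling_loss(logits, hotword_span):
    """Max pooling loss for one logit sequence (hypothetical formulation).

    hotword_span is a non-empty (start, end) frame range labeled as hotword.
    """
    start, end = hotword_span
    # Turn each smoothed logit into a per-frame hotword probability.
    probs = [sigmoid(z) for z in smooth(logits)]
    eps = 1e-12
    # Positive term: only the most confident frame inside the span is penalized.
    loss = -math.log(max(probs[start:end]) + eps)
    # Negative term: every frame outside the span should score low.
    for t, p in enumerate(probs):
        if t < start or t >= end:
            loss += -math.log(1.0 - p + eps)
    return loss


def total_loss(encoder_logits, decoder_logits, hotword_span):
    """Combine the encoder and decoder losses so both are optimized together."""
    return (max_pooling_loss(encoder_logits, hotword_span)
            + max_pooling_loss(decoder_logits, hotword_span))
```

Pooling over the hotword span means the network is not forced to fire on every hotword frame, only on the frame where it is most confident. A training loop would minimize `total_loss` with a gradient-based optimizer, which is omitted here.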

Potential Applications:

  • Hotword detection in smart devices
  • Voice-controlled systems
  • Speech recognition technology

Problems Solved:

  • Improving accuracy and efficiency of hotword detection
  • Enhancing wake-up processes on devices
  • Optimizing neural network training for audio sequences

Benefits:

  • Faster and more accurate hotword detection
  • Improved user experience with voice-activated devices
  • Enhanced performance of speech recognition systems

Commercial Applications:

Title: Advanced Hotword Detection Technology for Smart Devices

This technology can be used in smart speakers, virtual assistants, and other voice-controlled devices to enhance user interaction and improve overall performance. The market implications include increased demand for smart devices with advanced voice recognition capabilities.

Questions about Hotword Detection Technology:

1. How does this method improve the accuracy of hotword detection compared with existing technologies?

The method uses a memorized neural network with encoder and decoder components, which allows for more efficient training and optimization of hotword detection algorithms.

2. What are the potential challenges in implementing this technology in real-world applications?

Some challenges may include fine-tuning the neural network parameters for different hotwords and optimizing the system for various environmental conditions.


Original Abstract Submitted

A method for training hotword detection includes receiving a training input audio sequence including a sequence of input frames that define a hotword that initiates a wake-up process on a device. The method also includes feeding the training input audio sequence into an encoder and a decoder of a memorized neural network. Each of the encoder and the decoder of the memorized neural network include sequentially-stacked single value decomposition filter (SVDF) layers. The method further includes generating a logit at each of the encoder and the decoder based on the training input audio sequence. For each of the encoder and the decoder, the method includes smoothing each respective logit generated from the training input audio sequence, determining a max pooling loss from a probability distribution based on each respective logit, and optimizing the encoder and the decoder based on all max pooling losses associated with the training input audio sequence.
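The abstract names SVDF layers but does not describe their internals. As a rough illustration only, the sketch below follows the way SVDF nodes are commonly described in published work: a feature filter reduces each incoming frame to a scalar, a fixed-size memory keeps the most recent scalars, and a time filter combines them. The class name `SVDFNode`, the rank-1 form, and the ReLU activation are assumptions for the example and are not taken from the application.

```python
from collections import deque


class SVDFNode:
    """Rank-1 SVDF node sketch: a per-frame feature filter, then a time filter."""

    def __init__(self, feature_filter, time_filter):
        self.feature_filter = feature_filter  # one weight per input feature
        self.time_filter = time_filter        # one weight per remembered frame
        # Fixed-size memory of past feature-filter outputs, oldest first.
        size = len(time_filter)
        self.memory = deque([0.0] * size, maxlen=size)

    def step(self, frame):
        """Consume one input frame and return the node's activation."""
        # Stage 1: project the current frame to a single scalar.
        projected = sum(w * x for w, x in zip(self.feature_filter, frame))
        # Appending to a full deque drops the oldest entry automatically.
        self.memory.append(projected)
        # Stage 2: weighted sum over the remembered scalars.
        activation = sum(w * m for w, m in zip(self.time_filter, self.memory))
        return max(0.0, activation)  # ReLU, assumed for this sketch


node = SVDFNode(feature_filter=[1.0, 2.0], time_filter=[0.5, 1.0])
outputs = [node.step(frame) for frame in ([1.0, 1.0], [2.0, 0.0], [-10.0, 0.0])]
```

Because each node keeps only a short memory and processes one frame per call, a stack of such nodes can run in a streaming fashion, with no need to buffer the whole utterance.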