18619156. End-to-End Streaming Keyword Spotting simplified abstract (GOOGLE LLC)

From WikiPatents

End-to-End Streaming Keyword Spotting

Organization Name

GOOGLE LLC

Inventor(s)

Raziel Alvarez Guevara of Menlo Park CA (US)

Hyun Jin Park of Palo Alto CA (US)

Patrick Violette of Mountain View CA (US)

End-to-End Streaming Keyword Spotting - A simplified explanation of the abstract

This abstract first appeared for US patent application 18619156, titled 'End-to-End Streaming Keyword Spotting'.

The abstract describes a method for training a hotword detector built on a memorized neural network whose encoder and decoder use single value decomposition filter (SVDF) layers.

  • The method receives a training input audio sequence whose frames define a hotword, then feeds it into the encoder and the decoder of the neural network, each of which generates a logit from the input.
  • For both the encoder and the decoder, each logit is smoothed, a max pooling loss is determined from a probability distribution based on that logit, and the encoder and decoder are optimized based on all max pooling losses for the sequence.
  • The use of single value decomposition filter layers in the encoder and decoder is a key feature of this innovation.
  • This method aims to improve the accuracy and efficiency of hotword detection systems on devices.
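The training steps in the bullets above can be sketched roughly as follows. The abstract does not specify the smoothing window, the form of the probability distribution, or how the loss is pooled, so this sketch makes three assumptions: smoothing is a trailing moving average, each smoothed logit becomes a probability through a sigmoid, and the loss keeps only the highest-scoring frame inside a labeled hotword span while applying cross-entropy to every frame outside it. The names `smooth`, `max_pooling_loss`, `window`, and `hotword_span` are hypothetical and do not come from the patent.

```python
import math


def smooth(logits, window=3):
    """Trailing moving average over time (assumed smoothing; `window` is hypothetical)."""
    smoothed = []
    for t in range(len(logits)):
        chunk = logits[max(0, t - window + 1):t + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def _safe_log(p):
    # Clamp to avoid log(0) when a probability saturates.
    return math.log(max(p, 1e-12))


def max_pooling_loss(logits, hotword_span, window=3):
    """Assumed max pooling loss for one sequence of per-frame logits.

    hotword_span is (start, end) with end exclusive, or None when the
    sequence contains no hotword.
    """
    probs = [sigmoid(z) for z in smooth(logits, window)]
    loss = 0.0
    if hotword_span is None:
        negatives = probs
    else:
        start, end = hotword_span
        # Max pooling: only the most confident frame in the span is penalized.
        loss -= _safe_log(max(probs[start:end]))
        negatives = probs[:start] + probs[end:]
    # Frames outside the hotword should score low.
    loss -= sum(_safe_log(1.0 - p) for p in negatives)
    return loss


# Logits that peak inside the hotword span give a lower loss than
# logits that peak outside it.
aligned = [-5.0, -5.0, 5.0, 5.0, 5.0, -5.0]
misaligned = [5.0, 5.0, -5.0, -5.0, -5.0, 5.0]
print(max_pooling_loss(aligned, (2, 5)) < max_pooling_loss(misaligned, (2, 5)))
```

In a full training loop under these assumptions, one such loss would be computed for the encoder logits and one for the decoder logits, and both would contribute to the optimization.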

Potential Applications:

  • Speech recognition systems
  • Virtual assistants
  • Smart home devices

Problems Solved:

  • Enhancing hotword detection accuracy
  • Improving wake-up processes on devices

Benefits:

  • Faster and more accurate hotword detection
  • Enhanced user experience with voice-controlled devices

Commercial Applications:

Title: Advanced Hotword Detection Training Method for Smart Devices

This technology can be utilized in smart speakers, smartphones, and other voice-controlled devices to enhance the performance of hotword detection systems, leading to improved user satisfaction and market competitiveness.

Questions about Hotword Detection Training Method:

1. How does this method improve the efficiency of hotword detection on devices?

  This method trains a memorized neural network whose encoder and decoder are built from single value decomposition filter (SVDF) layers and optimizes both on their max pooling losses, with the aim of faster and more accurate hotword detection.

2. What are the potential applications of this technology beyond smart devices?

  This technology can also be applied in security systems, automotive voice control, and industrial automation for improved voice recognition capabilities.


Original Abstract Submitted

A method for training hotword detection includes receiving a training input audio sequence including a sequence of input frames that define a hotword that initiates a wake-up process on a device. The method also includes feeding the training input audio sequence into an encoder and a decoder of a memorized neural network. Each of the encoder and the decoder of the memorized neural network include sequentially-stacked single value decomposition filter (SVDF) layers. The method further includes generating a logit at each of the encoder and the decoder based on the training input audio sequence. For each of the encoder and the decoder, the method includes smoothing each respective logit generated from the training input audio sequence, determining a max pooling loss from a probability distribution based on each respective logit, and optimizing the encoder and the decoder based on all max pooling losses associated with the training input audio sequence.
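The abstract names sequentially-stacked SVDF layers without describing their internals. As a rough illustration only, the sketch below follows the commonly described rank-1 form of such a layer: each node projects the incoming frame with a feature filter, stores the result in a short memory of past projections, and combines that memory with a time filter. The ReLU activation, the class names `SVDFNode` and `SVDFLayer`, the helper `run_stack`, and all weights are assumptions made for this example, not details taken from the patent.

```python
from collections import deque


class SVDFNode:
    """One rank-1 SVDF node (assumed form): feature filter, then time filter."""

    def __init__(self, feature_filter, time_filter, bias=0.0):
        self.feature_filter = list(feature_filter)
        self.time_filter = list(time_filter)  # last entry weights the newest frame
        self.bias = bias
        size = len(self.time_filter)
        # Memory of past projections, oldest first; this is the streaming state.
        self.memory = deque([0.0] * size, maxlen=size)

    def step(self, frame):
        # Stage 1: project the current frame onto a single scalar.
        projected = sum(w * x for w, x in zip(self.feature_filter, frame))
        self.memory.append(projected)  # the oldest entry is dropped automatically
        # Stage 2: filter the remembered projections over time.
        activation = sum(b * m for b, m in zip(self.time_filter, self.memory))
        return max(0.0, activation + self.bias)  # ReLU (assumed)


class SVDFLayer:
    """A layer is a group of nodes that all see the same input frame."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def step(self, frame):
        return [node.step(frame) for node in self.nodes]


def run_stack(layers, frames):
    """Feed frames one at a time through sequentially stacked layers."""
    outputs = []
    for frame in frames:
        for layer in layers:
            frame = layer.step(frame)
        outputs.append(frame)
    return outputs


layers = [
    SVDFLayer([SVDFNode([1.0, 1.0], [0.5, 1.0]), SVDFNode([1.0, -1.0], [1.0, 1.0])]),
    SVDFLayer([SVDFNode([1.0, 1.0], [1.0, 1.0])]),
]
print(run_stack(layers, [[1.0, 2.0], [0.0, 1.0]]))  # [[3.0], [5.5]]
```

Because each node keeps only a fixed-length memory, the stack can process audio frame by frame, which fits the streaming setting the title refers to.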