18151540. End-to-End Streaming Keyword Spotting simplified abstract (GOOGLE LLC)

From WikiPatents

End-to-End Streaming Keyword Spotting

Organization Name

GOOGLE LLC

Inventor(s)

Raziel Alvarez Guevara of Menlo Park CA (US)

Hyun Jin Park of Palo Alto CA (US)

End-to-End Streaming Keyword Spotting - A simplified explanation of the abstract

This abstract first appeared for US patent application 18151540, titled 'End-to-End Streaming Keyword Spotting'.

Simplified Explanation

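The abstract describes a method for detecting a specific word or phrase (a hotword) in streaming audio captured by a user device. The method feeds a sequence of input frames to a memorized neural network built from sequentially-stacked single value decomposition filter (SVDF) layers. Each SVDF layer has at least one neuron, and each neuron has a memory component and two filtering stages: the first stage filters the audio features of each input frame individually and writes the result to the memory component, and the second stage filters all the filtered features held in that memory. The network outputs a probability score for the presence of the hotword. When the score satisfies a detection threshold, the method initiates a wake-up process on the user device so that additional terms can be processed.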

  • The method uses a neural network with SVDF layers to detect a hotword in streaming audio.
  • Each neuron in the network has a memory component and two stages of filtering.
  • The method generates a probability score to indicate the presence of the hotword.
  • A wake-up process is initiated on the user device if the probability score meets a detection threshold.
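
The two-stage neuron summarized above can be illustrated with a minimal sketch. This is not the patent's implementation: the class name SVDFNeuron, the weight values, and the memory length are hypothetical, both stages are assumed to be plain weighted sums, and any activation function or training procedure is left out.

```python
from collections import deque
from typing import Deque, List, Sequence


class SVDFNeuron:
    """Hypothetical single neuron: per-frame filter -> memory -> filter over memory."""

    def __init__(self, feature_filter: Sequence[float], time_filter: Sequence[float]) -> None:
        self.feature_filter = list(feature_filter)  # stage 1 weights, one per audio feature
        self.time_filter = list(time_filter)        # stage 2 weights, one per memory slot
        size = len(self.time_filter)
        # Memory component: fixed-length buffer, oldest value first, zero-initialized.
        self.memory: Deque[float] = deque([0.0] * size, maxlen=size)

    def step(self, frame: Sequence[float]) -> float:
        """Consume one input frame and return the neuron's output for that frame."""
        if len(frame) != len(self.feature_filter):
            raise ValueError("frame size must match the feature filter size")
        # Stage 1: filter the audio features of this frame on its own.
        filtered = sum(w * x for w, x in zip(self.feature_filter, frame))
        # Write to memory; the deque's maxlen discards the oldest value.
        self.memory.append(filtered)
        # Stage 2: filter across all filtered values currently in memory.
        return sum(w * m for w, m in zip(self.time_filter, self.memory))


# Usage: stream three 2-feature frames through a neuron with a 3-slot memory.
neuron = SVDFNeuron(feature_filter=[0.5, 0.5], time_filter=[0.0, 1.0, 2.0])
frames: List[List[float]] = [[1.0, 1.0], [2.0, 2.0], [0.0, 4.0]]
outputs = [neuron.step(frame) for frame in frames]
print(outputs)  # [2.0, 5.0, 6.0]
```

Because the memory holds only the stage 1 outputs of the most recent frames, each new frame costs one dot product over the features and one over the memory, which suits frame-by-frame streaming.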

Potential Applications

  • Voice assistants: This technology can be used in voice-activated devices like smart speakers or smartphones to detect hotwords like "Hey Siri" or "Alexa."
  • Speech recognition systems: The method can be applied in speech recognition software to trigger specific actions or commands based on hotword detection.
  • Security systems: This technology can be used in security systems to detect specific hotwords that may indicate a threat or unauthorized access.

Problems Solved

  • Accurate hotword detection: The method aims to improve the accuracy of detecting specific hotwords in streaming audio, reducing both false positives and false negatives.
  • Efficient processing: The use of SVDF layers and memory components allows for efficient processing of audio features, enabling real-time hotword detection.
  • User device wake-up: The method initiates a wake-up process on the user device only when the probability score satisfies the detection threshold, conserving device resources.
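
The wake-up gating in the last item can be sketched as a simple threshold check over a stream of scores. The function name first_detection, the wake_up callback, and the example scores are hypothetical, and "satisfies" is assumed here to mean greater than or equal to the threshold; the abstract does not specify the comparison.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def first_detection(
    scores: Iterable[float],
    threshold: float,
    wake_up: Callable[[int, float], None],
) -> Optional[int]:
    """Return the index of the first score that satisfies the threshold, else None.

    wake_up is called once, only for that first satisfying score, so the
    rest of the device stays idle until a hotword is likely present.
    """
    for index, score in enumerate(scores):
        if score >= threshold:  # assumption: "satisfies" means >= threshold
            wake_up(index, score)
            return index
    return None


# Usage: record wake-up events instead of waking real hardware.
events: List[Tuple[int, float]] = []
hit = first_detection([0.10, 0.40, 0.85, 0.90], 0.80, lambda i, s: events.append((i, s)))
print(hit, events)  # 2 [(2, 0.85)]
```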

Benefits

  • Improved user experience: Accurate hotword detection ensures that voice commands or actions are triggered reliably, enhancing the user experience with voice-activated devices.
  • Enhanced security: The technology can contribute to improved security systems by detecting specific hotwords that may indicate a threat or unauthorized access.
  • Efficient resource utilization: The method optimizes the processing of audio features, reducing computational load and conserving device resources.


Original Abstract Submitted

A method for detecting a hotword includes receiving a sequence of input frames that characterize streaming audio captured by a user device and generating a probability score indicating a presence of a hotword in the streaming audio using a memorized neural network. The network includes sequentially-stacked single value decomposition filter (SVDF) layers and each SVDF layer includes at least one neuron. Each neuron includes a respective memory component, a first stage configured to perform filtering on audio features of each input frame individually and output to the memory component, and a second stage configured to perform filtering on all the filtered audio features residing in the respective memory component. The method also includes determining whether the probability score satisfies a hotword detection threshold and initiating a wake-up process on the user device for processing additional terms.