18383314. AUTOMATED ASSISTANT INTERACTION PREDICTION USING FUSION OF VISUAL AND AUDIO INPUT simplified abstract (GOOGLE LLC)

From WikiPatents

AUTOMATED ASSISTANT INTERACTION PREDICTION USING FUSION OF VISUAL AND AUDIO INPUT

Organization Name

GOOGLE LLC

Inventor(s)

Tuan Nguyen of San Jose CA (US)

Yuan Yuan of Redwood City CA (US)

AUTOMATED ASSISTANT INTERACTION PREDICTION USING FUSION OF VISUAL AND AUDIO INPUT - A simplified explanation of the abstract

This abstract first appeared for US patent application 18383314, titled 'AUTOMATED ASSISTANT INTERACTION PREDICTION USING FUSION OF VISUAL AND AUDIO INPUT'.

Simplified Explanation

The patent application describes techniques for detecting and enrolling new "hot commands" that can be used to trigger actions by an automated assistant without explicit invocation.

  • The automated assistant transitions from a limited listening state to a full speech recognition state in response to a trigger event.
  • Speech recognition processing is performed on a spoken command to generate a textual command.
  • The textual command is enrolled as a hot command if it satisfies a frequency threshold in a corpus of textual commands.
  • A subsequent utterance that is semantically consistent with the enrolled textual command can trigger a responsive action by the automated assistant without explicit invocation.
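The enrollment step above can be sketched roughly as follows. This is a minimal illustration, not the patent's actual implementation; the function name, corpus representation, and threshold value are all assumptions made for the example.

```python
from collections import Counter

# Assumed cutoff: a command must appear this many times in the corpus
# before it is enrolled as a hot command (the patent does not specify a value).
FREQUENCY_THRESHOLD = 3

def enroll_hot_commands(command_corpus, hot_commands=None):
    """Enroll textual commands that meet the frequency threshold in the corpus.

    command_corpus: iterable of textual commands produced by speech recognition.
    hot_commands: optional existing set of enrolled hot commands to extend.
    """
    if hot_commands is None:
        hot_commands = set()
    # Normalize commands so that trivial variations in case/whitespace
    # are counted together.
    counts = Counter(cmd.strip().lower() for cmd in command_corpus)
    for command, count in counts.items():
        if count >= FREQUENCY_THRESHOLD:
            hot_commands.add(command)
    return hot_commands
```

For example, a corpus containing "turn on the lights" three times would enroll that command, while a one-off command like "play music" would not.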
  • Potential Applications:
    • Virtual assistants
    • Smart home devices
    • Voice-controlled applications

  • Problems Solved:
    • Streamlining interaction with automated assistants
    • Reducing the need for explicit commands
    • Improving the user experience of voice-controlled systems

  • Benefits:
    • Enhanced usability of automated assistants
    • Faster and more intuitive interaction
    • Increased efficiency in performing tasks via voice commands


Original Abstract Submitted

Techniques are described herein for detecting and/or enrolling (or commissioning) new “hot commands” that are useable to cause an automated assistant to perform responsive action(s) without having to be first explicitly invoked. In various implementations, an automated assistant may be transitioned from a limited listening state into a full speech recognition state in response to a trigger event. While in the full speech recognition state, the automated assistant may receive and perform speech recognition processing on a spoken command from a user to generate a textual command. The textual command may be determined to satisfy a frequency threshold in a corpus of textual commands. Consequently, data indicative of the textual command may be enrolled as a hot command. Subsequent utterance of another textual command that is semantically consistent with the textual command may trigger performance of a responsive action by the automated assistant, without requiring explicit invocation.
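The abstract's final step, triggering a responsive action when a later utterance is "semantically consistent" with an enrolled hot command, could be approximated as below. The patent does not specify how semantic consistency is measured; this sketch uses simple token overlap (Jaccard similarity) as a stand-in, where a production system would more likely compare learned embeddings. All names and the similarity threshold are assumptions.

```python
def is_semantically_consistent(utterance, hot_command, threshold=0.5):
    """Crude token-overlap (Jaccard) proxy for semantic consistency."""
    a = set(utterance.lower().split())
    b = set(hot_command.lower().split())
    if not a or not b:
        return False
    return len(a & b) / len(a | b) >= threshold

def maybe_trigger(utterance, hot_commands):
    """Return the matching hot command, if any, without explicit invocation.

    In a real assistant, a match would dispatch the responsive action
    associated with the enrolled hot command.
    """
    for command in hot_commands:
        if is_semantically_consistent(utterance, command):
            return command
    return None
```

With "turn on the lights" enrolled, the utterance "please turn on the lights" overlaps on four of five unique tokens and would trigger the action, while an unrelated utterance such as "play some jazz" would not.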