20240046925. DYNAMICALLY DETERMINING WHETHER TO PERFORM CANDIDATE AUTOMATED ASSISTANT ACTION DETERMINED FROM SPOKEN UTTERANCE simplified abstract (Google LLC)

DYNAMICALLY DETERMINING WHETHER TO PERFORM CANDIDATE AUTOMATED ASSISTANT ACTION DETERMINED FROM SPOKEN UTTERANCE

Organization Name

Google LLC

Inventor(s)

Konrad Miller of Zurich (CH)

Ágoston Weisz of Zurich (CH)

Herbert Jordan of Zurich (CH)

DYNAMICALLY DETERMINING WHETHER TO PERFORM CANDIDATE AUTOMATED ASSISTANT ACTION DETERMINED FROM SPOKEN UTTERANCE - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240046925, titled 'DYNAMICALLY DETERMINING WHETHER TO PERFORM CANDIDATE AUTOMATED ASSISTANT ACTION DETERMINED FROM SPOKEN UTTERANCE'.

Simplified Explanation

The abstract of this patent application describes a system that, without requiring any explicit assistant invocation input (such as a wake word), converts speech captured by an assistant device's microphone into text. That text is processed to generate candidate actions the automated assistant could take in response to the spoken command, and the system then decides, for each candidate action, whether to perform it automatically or to suppress it, based on features of the action and the current state of the assistant device's environment.

  • The system performs automatic speech recognition (ASR) on audio data captured via the assistant device's microphone, independent of any explicit invocation input.
  • The resulting ASR text is a prediction of the spoken utterance captured in the audio data.
  • Candidate automated assistant actions corresponding to the command, if any, are generated from the processed ASR text.
  • For each candidate action, the system determines whether to automatically perform it or to suppress automatic performance.
  • That determination is based on both action features and environment features, each environment feature reflecting the current value of a dynamic state of the device's environment (a sketch of this decision flow follows this list).
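
A minimal sketch of this decision flow, assuming a Python implementation, is shown below. The function and feature names (decide_perform_or_suppress, ASR confidence, disruptiveness, people nearby, media playing) and the scoring threshold are illustrative assumptions; the application does not specify how action and environment features are combined.

from dataclasses import dataclass, field
from typing import Dict

@dataclass
class CandidateAction:
    """A candidate automated assistant action derived from ASR text."""
    name: str
    # Hypothetical action features, e.g. how confidently the command was
    # interpreted and how disruptive the action would be if mis-triggered.
    features: Dict[str, float] = field(default_factory=dict)

def decide_perform_or_suppress(action: CandidateAction,
                               environment: Dict[str, float],
                               threshold: float = 0.5) -> bool:
    """Return True to perform the action automatically, False to suppress it.

    Combines (i) action features and (ii) environment features reflecting the
    current dynamic state of the device's environment. The weighting below is
    an illustrative assumption, not the patented method.
    """
    score = action.features.get("asr_confidence", 0.0)
    # Penalize disruptive actions when, e.g., other people are detected nearby
    # or media is already playing (hypothetical environment features).
    score -= action.features.get("disruptiveness", 0.0) * environment.get("people_nearby", 0.0)
    score -= 0.2 * environment.get("media_playing", 0.0)
    return score >= threshold

# Example: a command captured without any explicit invocation input.
asr_text = "turn off the living room lights"  # output of a hypothetical ASR step
candidates = [CandidateAction("lights_off",
                              features={"asr_confidence": 0.9, "disruptiveness": 0.1})]
environment = {"people_nearby": 1.0, "media_playing": 0.0}

for action in candidates:
    if decide_perform_or_suppress(action, environment):
        print(f"performing: {action.name}")
    else:
        print(f"suppressing: {action.name}")

With the example values, the score (0.8) clears the threshold and the action is performed; a more disruptive action in the same environment would be suppressed.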

Potential applications of this technology:

  • Voice-controlled assistants: This technology can be used in voice-controlled assistant devices, such as smart speakers or virtual assistants on smartphones, to accurately convert spoken commands into text and perform corresponding actions.
  • Home automation: The system can be integrated with home automation systems, allowing users to control various devices and appliances using voice commands.
  • Automotive assistants: This technology can be implemented in automotive assistants, enabling drivers to control various functions of their vehicles through voice commands.

Problems solved by this technology:

  • Invocation-free command handling: The system performs speech recognition on audio detected by the assistant device's microphone without requiring an explicit invocation input such as a wake word.
  • Efficient automation: By automatically generating potential assistant actions based on spoken commands, the system streamlines the automation process and reduces the need for manual input.
  • Context-awareness: The system takes into account the current environment state to determine whether to perform or suppress automated assistant actions, enhancing the assistant's ability to respond appropriately to different situations (a hypothetical example of such environment features follows this list).
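
To make the notion of environment features more concrete, the snippet below sketches a hypothetical snapshot of dynamic environment state that such a decision could consult. The specific signals (presence, media playback, ambient noise, time of day) are assumptions for illustration, not features enumerated in the application.

import time
from typing import Dict

def snapshot_environment_features() -> Dict[str, float]:
    """Hypothetical snapshot of dynamic environment state at decision time.

    Each entry reflects a current value for some dynamic state of the
    assistant device's environment; the concrete signals are illustrative.
    """
    return {
        "people_nearby": 1.0,        # e.g. presence detected by on-device sensors
        "media_playing": 0.0,        # whether the device is already playing media
        "ambient_noise_level": 0.3,  # normalized microphone noise estimate
        "time_of_day_hours": time.localtime().tm_hour / 24.0,
    }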

Benefits of this technology:

  • Enhanced user experience: Users can interact with assistant devices more naturally and effortlessly through voice commands, improving the overall user experience.
  • Time-saving: By automating assistant actions based on spoken commands, users can perform tasks more quickly without the need for manual input.
  • Improved accuracy: The system's ASR capabilities and processing of action and environment features contribute to improved accuracy in understanding and responding to user commands.


Original Abstract Submitted

Implementations perform, independent of any explicit assistant invocation input(s), automatic speech recognition (ASR) on audio data, that is detected via microphone(s) of an assistant device, to generate ASR text that predicts a spoken utterance that is captured in the audio data. The ASR text is processed and candidate automated assistant action(s) that correspond to the command, if any, are generated. For each of any candidate automated assistant action(s), it is determined whether to (a) cause automatic performance of the automated assistant action responsive to the spoken utterance or, instead, (b) suppress any automatic performance of the automated assistant action responsive to the spoken utterance. Such determination can be made based on processing both (i) action feature(s) for the candidate automated assistant action; and (ii) environment feature(s) that each reflects a corresponding current value for a corresponding dynamic state of an environment of the assistant device.