US Patent Application 18132793. DETECTING VISUAL ATTENTION DURING USER SPEECH simplified abstract

DETECTING VISUAL ATTENTION DURING USER SPEECH

Organization Name

Apple Inc.

Inventor(s)

Maxwell C. Horton of Santa Monica CA (US)

Stephen A. Berardi of Seattle WA (US)

Yanzi Jin of Bothell WA (US)

Sophie Lebrecht of Seattle WA (US)

Richard P. Muffoletto of Bainbridge Island WA (US)

Daniel Tormoen of Seattle WA (US)

DETECTING VISUAL ATTENTION DURING USER SPEECH - A simplified explanation of the abstract

This abstract first appeared for US patent application 18132793, titled 'DETECTING VISUAL ATTENTION DURING USER SPEECH'.

Simplified Explanation

The patent application describes a process that receives an audio stream and a video stream concurrently and analyzes recent portions of both to determine whether the user's visual attention is directed at an electronic device while the user is speaking. If so, the process identifies the portion of the audio stream that contains speech intended for the device, a digital assistant on the device initiates a task based on that speech, and the device provides an output indicating which task was initiated. The steps are summarized below, followed by a short code sketch.

  • The process involves receiving audio and video streams concurrently.
  • It determines if the user is looking at an electronic device while speaking.
  • If the user's attention is on the device, it identifies the part of the audio stream that corresponds to their speech intended for the device.
  • A digital assistant on the device uses this information to initiate a task.
  • The output provided indicates the task that has been initiated.
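As a rough illustration of the flow above, the following is a minimal sketch, not the application's actual method. Everything in it is assumed for illustration: the Frame fields speech_detected and gaze_on_device stand in for unspecified audio and video detection models, ATTENTION_WINDOW_S is an arbitrary value for the "predetermined duration", and task initiation is reduced to returning a string.

from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Optional

ATTENTION_WINDOW_S = 2.0  # stand-in for the "predetermined duration"

@dataclass
class Frame:
    timestamp: float
    speech_detected: bool = False  # placeholder output of an audio speech detector
    gaze_on_device: bool = False   # placeholder output of a video gaze detector

@dataclass
class Streams:
    audio: Deque[Frame] = field(default_factory=deque)
    video: Deque[Frame] = field(default_factory=deque)

def window(frames: Deque[Frame], now: float) -> list:
    """Frames received within the predetermined duration before `now`."""
    return [f for f in frames if now - ATTENTION_WINDOW_S <= f.timestamp <= now]

def handle(streams: Streams, now: float) -> Optional[str]:
    audio_win = window(streams.audio, now)
    video_win = window(streams.video, now)

    # Was the user speaking while their visual attention was on the device?
    speaking = any(f.speech_detected for f in audio_win)
    attending = any(f.gaze_on_device for f in video_win)
    if not (speaking and attending):
        return None  # speech not treated as device-directed; do nothing

    # Identify the portion of the audio stream holding device-directed speech
    # (here, simply the speech frames in the window -- a stand-in heuristic).
    device_speech = [f for f in audio_win if f.speech_detected]

    # A real system would transcribe this portion and hand it to the digital
    # assistant; this sketch just returns an output naming what was initiated.
    return f"Initiated task from {len(device_speech)} device-directed speech frame(s)"

Filling both buffers with recent frames and calling handle(streams, now) returns None unless speech and device-directed gaze co-occur inside the window, mirroring the gating described above.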


Original Abstract Submitted

An example process includes: concurrently receiving an audio stream and a video stream; determining, based on a first portion of the audio stream received within a predetermined duration before a current time and a first portion of the video stream received within the predetermined duration before the current time, whether a visual attention of a user is directed to an electronic device while the user is speaking; and in accordance with a determination that the visual attention of the user is directed to the electronic device while the user is speaking: identifying a second portion of the audio stream to include user speech intended for the electronic device; initiating, by a digital assistant operating on the electronic device, a task based on the second portion of the audio stream; and providing an output indicative of the initiated task.
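The abstract keys the attention check to portions of each stream "received within a predetermined duration before a current time". One way to maintain such portions, offered purely as an assumption of this page rather than anything stated in the application, is a rolling buffer trimmed on every arrival; WINDOW_S and append_and_trim below are hypothetical names.

from collections import deque

WINDOW_S = 2.0  # illustrative value for the predetermined duration

def append_and_trim(buffer: deque, timestamp: float, payload) -> None:
    """Append a frame, then drop anything older than the window."""
    buffer.append((timestamp, payload))
    while buffer and timestamp - buffer[0][0] > WINDOW_S:
        buffer.popleft()

audio_buffer: deque = deque()
for t in (0.0, 0.5, 1.0, 2.6):     # toy timestamps in seconds
    append_and_trim(audio_buffer, t, "audio frame")
print(len(audio_buffer))           # prints 2: only the 1.0 s and 2.6 s frames remain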