18243360. INFERRING INTENT FROM POSE AND SPEECH INPUT simplified abstract (Snap Inc.)

From WikiPatents

INFERRING INTENT FROM POSE AND SPEECH INPUT

Organization Name

Snap Inc.

Inventor(s)

Matan Zohar of Rishon LeZion (IL)

Yanli Zhao of London (GB)

Brian Fulkerson of London (GB)

Itamar Berger of Hod Hasharon (IL)

INFERRING INTENT FROM POSE AND SPEECH INPUT - A simplified explanation of the abstract

This abstract first appeared for US patent application 18243360 titled 'INFERRING INTENT FROM POSE AND SPEECH INPUT'.

Simplified Explanation

The patent application describes a system and method for performing augmented reality (AR) operations based on the pose of a person depicted in an image and their speech input. Here are the key points:

  • The system includes a computer-readable storage medium and a program for processing the operations.
  • The method starts by receiving an image that shows a person.
  • The system then identifies the skeletal joints of the person in the image.
  • Based on the positioning of these skeletal joints, the system determines the pose of the person.
  • Next, the system receives speech input from the person containing a request to perform an AR operation whose intent is ambiguous.
  • The system uses the person's pose as context to discern the ambiguous intent of the speech input.
  • Finally, the system performs the AR operation based on the discerned intent and the pose of the person.
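The steps above can be sketched in code. This is a minimal illustrative sketch, not the patent's implementation: the joint names, pose labels, example utterance, and every function here are assumptions chosen to show how a pose could resolve an ambiguous spoken request.

```python
from dataclasses import dataclass

@dataclass
class Joint:
    """A 2D skeletal joint detected in the image (hypothetical representation)."""
    name: str
    x: float
    y: float  # image coordinates: y grows downward

def classify_pose(joints):
    """Classify a coarse pose from skeletal-joint positions (toy heuristic)."""
    by_name = {j.name: j for j in joints}
    wrist = by_name["right_wrist"]
    shoulder = by_name["right_shoulder"]
    # Wrist above shoulder (smaller y) suggests a raised arm.
    if wrist.y < shoulder.y:
        return "arm_raised"
    return "neutral"

def discern_intent(speech, pose):
    """Resolve an ambiguous utterance using the detected pose as context."""
    if "put it there" in speech:
        # "there" is ambiguous on its own; the pose supplies the referent.
        if pose == "arm_raised":
            return "place_ar_object_at_raised_hand"
        return "place_ar_object_at_feet"
    return "unknown"

def perform_ar_operation(joints, speech):
    """End-to-end flow: joints -> pose -> disambiguated intent -> AR operation."""
    pose = classify_pose(joints)
    intent = discern_intent(speech, pose)
    return {"pose": pose, "intent": intent}
```

For example, the ambiguous request "put it there" spoken while the right arm is raised would resolve to placing the AR object at the raised hand, whereas the same words in a neutral pose would resolve to a placement at the person's feet.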

Potential applications of this technology:

  • AR gaming: The system can interpret the pose and speech input of a player to perform specific actions or trigger events in an AR game.
  • Virtual shopping: By understanding the intent of a person's speech input and their pose, the system can provide relevant AR product information or virtual try-on experiences.
  • Fitness and health: The system can analyze a person's pose and speech input to guide them through exercise routines or provide personalized health advice.

Problems solved by this technology:

  • Ambiguous speech input: By using the person's pose as context, the system can better understand the intended meaning of ambiguous speech input, improving the accuracy of AR operations.
  • Enhanced user experience: The system combines pose recognition and speech input to create a more intuitive and interactive AR experience, reducing the need for complex user interfaces.

Benefits of this technology:

  • Improved accuracy: By considering the pose of the person, the system can better discern the intended meaning of their speech input, leading to more accurate AR operations.
  • Enhanced user interaction: The combination of pose recognition and speech input allows for more natural and seamless interactions with AR systems.
  • Personalized experiences: The system can tailor AR operations based on the individual's pose and speech input, providing personalized and context-aware experiences.


Original Abstract Submitted

Aspects of the present disclosure involve a system comprising a computer-readable storage medium storing at least one program, and a method for performing operations comprising receiving an image that depicts a person, identifying a set of skeletal joints of the person and identifying a pose of the person depicted in the image based on positioning of the set of skeletal joints. The operations also include receiving speech input comprising a request to perform an AR operation and an ambiguous intent, discerning the ambiguous intent of the speech input based on the pose of the person depicted in the image and in response to receiving the speech input, performing the AR operation based on discerning the ambiguous intent of the speech input based on the pose of the person depicted in the image.