Meta Platforms, Inc (20240331058). MULTIMODAL ENTITY AND COREFERENCE RESOLUTION FOR ASSISTANT SYSTEMS simplified abstract

MULTIMODAL ENTITY AND COREFERENCE RESOLUTION FOR ASSISTANT SYSTEMS

Organization Name

Meta Platforms, Inc

Inventor(s)

Shivani Poddar of Mountain View CA (US)

Seungwhan Moon of Seattle WA (US)

Paul Anthony Crook of Newcastle WA (US)

Rajen Subba of San Carlos CA (US)

MULTIMODAL ENTITY AND COREFERENCE RESOLUTION FOR ASSISTANT SYSTEMS - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240331058, titled 'MULTIMODAL ENTITY AND COREFERENCE RESOLUTION FOR ASSISTANT SYSTEMS'.

The patent application describes a method where a client system receives an audio input containing a coreference to a target object, accesses visual data from cameras associated with the client system, resolves the coreference to the target object, resolves the target object to a specific entity, and provides a response to the audio input with information about the specific entity.

  • Receiving audio input with coreference to a target object
  • Accessing visual data from cameras associated with the client system
  • Resolving coreference to the target object
  • Resolving the target object to a specific entity
  • Providing a response to the audio input with information about the specific entity
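
The steps above can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not the method claimed in the application: the abstract does not say which models or heuristics are used. Here the audio input is assumed to be already transcribed, the camera images are assumed to be already reduced to a list of detected objects, and every name (`DetectedObject`, `ENTITY_KB`, `resolve_coreference`, `resolve_entity`, `respond`) is hypothetical. A simple keyword and position heuristic stands in for coreference resolution, and a dictionary lookup stands in for entity resolution.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DetectedObject:
    """One object found in the visual data (hypothetical detector output)."""
    label: str        # coarse category, e.g. "poster"
    center_x: float   # horizontal position in the frame: 0.0 = left, 1.0 = right
    signature: str    # assumed visual fingerprint used for entity lookup


# Toy knowledge base mapping visual signatures to specific entities (made up).
ENTITY_KB = {
    "sig-001": "Sunflowers poster",
    "sig-002": "Harbor landscape print",
}


def resolve_coreference(transcript: str,
                        objects: list[DetectedObject]) -> Optional[DetectedObject]:
    """Pick the object that a phrase like 'that poster on the left' refers to."""
    words = re.findall(r"[a-z]+", transcript.lower())
    # Prefer objects whose label is spoken; otherwise consider every object.
    candidates = [o for o in objects if o.label in words] or list(objects)
    if not candidates:
        return None
    if "left" in words:
        return min(candidates, key=lambda o: o.center_x)
    if "right" in words:
        return max(candidates, key=lambda o: o.center_x)
    # No spatial cue: fall back to the object closest to the frame center.
    return min(candidates, key=lambda o: abs(o.center_x - 0.5))


def resolve_entity(obj: DetectedObject) -> Optional[str]:
    """Map the target object to a specific entity, if the toy KB knows it."""
    return ENTITY_KB.get(obj.signature)


def respond(transcript: str, objects: list[DetectedObject]) -> str:
    """Run both resolution steps and build a response about the entity."""
    target = resolve_coreference(transcript, objects)
    if target is None:
        return "I could not tell which object you mean."
    entity = resolve_entity(target)
    if entity is None:
        return f"I see a {target.label}, but could not resolve it to a specific entity."
    return f"The target object appears to be: {entity}."


if __name__ == "__main__":
    scene = [
        DetectedObject("poster", 0.2, "sig-001"),
        DetectedObject("poster", 0.8, "sig-002"),
    ]
    print(respond("What is that poster on the left?", scene))
```

The two resolution steps are kept as separate functions to mirror the abstract, which distinguishes resolving the coreference to a target object from resolving that object to a specific entity.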

Potential Applications:

  • Augmented reality applications
  • Virtual assistants
  • Image recognition systems

Problems Solved:

  • Efficiently identifying and providing information about specific objects in audio-visual data
  • Enhancing user interaction with technology through natural language processing

Benefits:

  • Improved user experience
  • Enhanced accessibility to information
  • Streamlined communication with technology

Commercial Applications:

Title: "Enhanced Audio-Visual Interaction Technology for Augmented Reality Applications"

This technology can be used in augmented reality gaming, virtual shopping experiences, and interactive educational tools. It has implications for companies developing virtual assistants and image recognition software.

Prior Art: Prior art related to this technology may include research on natural language processing, audio-visual integration, and object recognition systems.

Frequently Updated Research: Researchers may be exploring advancements in audio-visual processing algorithms, user interface design for augmented reality applications, and the integration of artificial intelligence in interactive technologies.

Questions about the technology:

  1. How does this technology improve user interaction with audio-visual data?
  2. What are the potential limitations of resolving coreferences in complex audio inputs?


Original Abstract Submitted

In one embodiment, a method includes receiving, at a client system, an audio input, where the audio input comprises a coreference to a target object, accessing visual data from one or more cameras associated with the client system, where the visual data comprises images portraying one or more objects, resolving the coreference to the target object from among the one or more objects, resolving the target object to a specific entity, and providing, at the client system, a response to the audio input, where the response comprises information about the specific entity.