Samsung Electronics Co., Ltd. (20240160842). CONFIDENCE-BASED INTERACTABLE NEURAL-SYMBOLIC VISUAL QUESTION ANSWERING simplified abstract

From WikiPatents
Revision as of 02:52, 23 May 2024 by Wikipatents (talk | contribs) (Creating a new page)

CONFIDENCE-BASED INTERACTABLE NEURAL-SYMBOLIC VISUAL QUESTION ANSWERING

Organization Name

Samsung Electronics Co., Ltd.

Inventor(s)

Yajie Bao of Sandy Springs GA (US)

Tianwei Xing of Santa Clara CA (US)

Xun Chen of Fremont CA (US)

CONFIDENCE-BASED INTERACTABLE NEURAL-SYMBOLIC VISUAL QUESTION ANSWERING - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240160842, titled 'CONFIDENCE-BASED INTERACTABLE NEURAL-SYMBOLIC VISUAL QUESTION ANSWERING'.

Simplified Explanation

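The patent application describes a method for visual question answering (VQA). One AI model analyzes the image, a second AI model parses the question into candidate symbolic programs, and the highest-confidence program is executed on the image analysis to produce an answer. The method comprises: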

  • Obtaining an image and a corresponding question
  • Generating feature predictions about objects in the image using an AI scene perception model
  • Generating symbolic programs and program confidence scores based on the question using an AI question parsing model
  • Selecting a symbolic program with the highest confidence score
  • Executing the selected program by performing logic operations on the feature predictions
  • Determining a natural language answer based on the result of the logic operations
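
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation described in the application: both AI models are replaced by hard-coded stubs, and every name here (`ObjectPrediction`, `perceive_scene`, `parse_question`, the `filter_color` / `filter_shape` / `count` / `exist` operations) is hypothetical and chosen only to make the data flow concrete.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# Hypothetical stand-in for one feature prediction about a detected object.
@dataclass(frozen=True)
class ObjectPrediction:
    shape: str
    color: str

# A symbolic program is modeled here as a sequence of (operation, argument) steps.
Program = List[Tuple[str, str]]


def perceive_scene(image: object) -> List[ObjectPrediction]:
    """Stub for the AI scene perception model (output is hard-coded)."""
    return [
        ObjectPrediction("cube", "red"),
        ObjectPrediction("sphere", "red"),
        ObjectPrediction("cube", "blue"),
    ]


def parse_question(question: str) -> List[Tuple[Program, float]]:
    """Stub for the AI question parsing model: candidate programs with scores."""
    return [
        ([("filter_color", "red"), ("count", "")], 0.85),
        ([("filter_shape", "cube"), ("count", "")], 0.10),
        ([("filter_color", "red"), ("exist", "")], 0.05),
    ]


def select_program(candidates: List[Tuple[Program, float]]) -> Program:
    """Pick the candidate program whose confidence score is highest."""
    program, _score = max(candidates, key=lambda pair: pair[1])
    return program


def execute_program(program: Program, objects: List[ObjectPrediction]) -> Union[int, bool]:
    """Run the program's logic operations over the feature predictions."""
    state = list(objects)
    for op, arg in program:
        if op == "filter_color":
            state = [o for o in state if o.color == arg]
        elif op == "filter_shape":
            state = [o for o in state if o.shape == arg]
        elif op == "count":
            return len(state)
        elif op == "exist":
            return len(state) > 0
        else:
            raise ValueError(f"unknown operation: {op}")
    raise ValueError("program ended without producing a result")


def to_natural_language(result: Union[int, bool]) -> str:
    """Turn the execution result into a short answer string."""
    if isinstance(result, bool):  # check bool first: bool is a subclass of int
        return "yes" if result else "no"
    return str(result)


def answer_question(image: object, question: str) -> str:
    objects = perceive_scene(image)
    candidates = parse_question(question)
    program = select_program(candidates)
    result = execute_program(program, objects)
    return to_natural_language(result)
```

With these stubs, `answer_question(None, "How many red objects are there?")` selects the program scored 0.85, filters the three predicted objects down to the two red ones, and returns `"2"`. In a real system the stubs would be replaced by trained models, and the operation set would be whatever the symbolic program language defines.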

Potential Applications

This technology could be applied in various fields such as virtual assistants, image recognition systems, and educational platforms.

Problems Solved

This technology addresses the challenge of answering natural language questions about visual content, which requires both recognizing the objects in an image and working out what the question asks about them.

Benefits

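Because the answer comes from executing an explicit symbolic program on the perception model's predictions, each reasoning step can in principle be inspected, and the program confidence scores indicate how certain the question parser is about its interpretation of the question.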

Potential Commercial Applications

The technology could be utilized in smart home devices, e-commerce platforms, and online customer service systems to provide automated responses to visual queries.

Possible Prior Art

Prior art in the field of computer vision and natural language processing may include similar methods for image analysis and question answering, but the specific combination of AI models and logic operations as described in this patent application may be novel.

Unanswered Questions

How does this technology compare to existing VQA methods in terms of accuracy and efficiency?

The abstract does not report accuracy or efficiency figures. The method combines an AI scene perception model with an AI question parsing model, but further research and testing are needed to compare its performance with existing VQA methods.

What are the potential limitations or challenges of implementing this method in real-world applications?

The complexity of executing symbolic programs and logic operations on feature predictions may pose challenges in real-time processing and scalability, requiring optimization and resource management strategies for practical use.


Original Abstract Submitted

A method of performing visual question answering (VQA), including: obtaining an image and a question corresponding to the image; generating a plurality of feature predictions about at least one object included in the image by providing the image to an artificial intelligence (AI) scene perception model; generating a plurality of symbolic programs and a plurality of program confidence scores by providing the question to an AI question parsing model; selecting a symbolic program associated with a program confidence score which is highest among the plurality of program confidence scores; executing the selected symbolic program by performing a set of logic operations included in the selected symbolic program on the plurality of feature predictions; and determining a natural language answer to the question based on a result of the set of logic operations.