HIERARCHICAL AUDIO-VISUAL FEATURE FUSING METHOD FOR AUDIO-VISUAL QUESTION ANSWERING AND PRODUCT

Organization Name

Tsinghua University

Inventor(s)

HIERARCHICAL AUDIO-VISUAL FEATURE FUSING METHOD FOR AUDIO-VISUAL QUESTION ANSWERING AND PRODUCT - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240046628, titled 'HIERARCHICAL AUDIO-VISUAL FEATURE FUSING METHOD FOR AUDIO-VISUAL QUESTION ANSWERING AND PRODUCT'.

Simplified Explanation

The abstract describes a method for audio-visual question answering. The audio embedding of an input video clip is fused with a baseline model, together with the video embedding and the question embedding, at three different stages of a hierarchical feature fusing process. This yields three answer probability distributions, which are added using preset weights and then averaged (hierarchical integration) to produce the final answer.

The audio embedding of the input video clip is fused with a baseline model, along with the video embedding and the question embedding, at different stages of a hierarchical feature fusing process.
The hierarchical feature fusing process includes an early stage, a middle stage, and a late stage.
The method generates three answer probability distributions, one per fusion stage: a first, a second, and a third answer probability distribution.
The three answer probability distributions are added using preset weights, and the result is then averaged for hierarchical integration.
The final answer is generated through the hierarchical integration of the answer probability distributions.
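
The integration step described above can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not state the weight values, what the sum is averaged over, or how the final answer is selected. The sketch assumes the weighted sum is divided by the number of distributions and that the highest-scoring candidate is chosen. The function names integrate_answers and pick_answer, the example weights, and the candidate answers are all hypothetical.

```python
from typing import List, Sequence


def integrate_answers(
    distributions: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Add the per-stage answer distributions using preset weights, then average.

    Assumption: "averaged" means dividing the weighted sum by the number of
    distributions (three in the abstract: early, middle, late stage).
    """
    if not distributions:
        raise ValueError("at least one distribution is required")
    if len(distributions) != len(weights):
        raise ValueError("one weight is required per distribution")
    n_answers = len(distributions[0])
    if any(len(d) != n_answers for d in distributions):
        raise ValueError("all distributions must cover the same answer set")

    # Weighted element-wise sum over the stages.
    combined = [0.0] * n_answers
    for dist, weight in zip(distributions, weights):
        for i, prob in enumerate(dist):
            combined[i] += weight * prob

    # Average over the number of stages.
    return [value / len(distributions) for value in combined]


def pick_answer(scores: Sequence[float], candidates: Sequence[str]) -> str:
    """Return the candidate with the highest integrated score (assumed rule)."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return candidates[best]


if __name__ == "__main__":
    # Hypothetical outputs of the early-, middle-, and late-stage fusion.
    early = [0.6, 0.3, 0.1]
    middle = [0.2, 0.5, 0.3]
    late = [0.1, 0.7, 0.2]
    preset_weights = [1.0, 1.0, 1.0]  # placeholder values, not from the patent

    scores = integrate_answers([early, middle, late], preset_weights)
    print(pick_answer(scores, ["one", "two", "three"]))  # prints "two"
```

With equal weights the step reduces to a plain mean of the three distributions. Unequal preset weights would let one fusion stage count for more than the others.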

Potential applications of this technology:

Audio-visual question answering systems
Video analysis and understanding
Natural language processing and understanding
Human-computer interaction

Problems solved by this technology:

Improving the accuracy and performance of audio-visual question answering systems
Enhancing the integration of audio and visual information in video analysis
Addressing the challenges of understanding and processing natural language queries in multimedia applications

Benefits of this technology:

Improved accuracy and reliability in answering audio-visual questions
Enhanced understanding and analysis of audio and visual information in videos
More efficient and effective human-computer interaction in multimedia applications

Original Abstract Submitted

A hierarchical audio-visual feature fusing method for audio-visual question answering and a product relate to the field of audio-visual question answering. By fusing audio embedding in an input video clip with a baseline model as well as video embedding and question embedding respectively at an early stage, a middle stage and a late stage in a hierarchical feature fusing process, a first answer probability distribution, a second answer probability distribution and a third answer probability distribution are obtained, and the answer probability distributions are added based on preset weights, and then averaged for hierarchical integration to generate a final answer.

20240046628. HIERARCHICAL AUDIO-VISUAL FEATURE FUSING METHOD FOR AUDIO-VISUAL QUESTION ANSWERING AND PRODUCT simplified abstract (TSINGHUA UNIVERSITY)
