Lemon Inc. (20250104701). IMPROVEMENT OF AUDIO-VISUAL QUESTION ANSWERING
Organization Name
Lemon Inc.
Inventor(s)
Peng Zhang of Los Angeles CA US
Xiulong Liu of Culver City CA US
Zhikang Dong of Culver City CA US
This abstract first appeared in US patent application 20250104701, titled 'IMPROVEMENT OF AUDIO-VISUAL QUESTION ANSWERING'.
Original Abstract Submitted
The present disclosure describes techniques for improving audio-visual question answering. A machine learning model is configured for audio-visual question answering (AVQA). The machine learning model comprises a first sub-model configured to capture semantic audio information and output an audio spatial feature map x. The machine learning model comprises a second sub-model configured to extract visual features x and audio features x, and further configured to obtain a question vector x. The machine learning model comprises a third sub-model configured to capture audio-visual correspondence at a granular level. A balanced AVQA dataset is created; it has a balanced answer distribution in each question category. The machine learning model is trained to answer questions about visual objects, sounds, and their associations in videos using at least a subset of the balanced AVQA dataset.
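The abstract does not say how the balanced AVQA dataset is built. As a minimal sketch, assuming balance is reached by downsampling, the Python below groups samples by question category and trims every answer within a category to the count of that category's rarest answer. The `QASample` type, the helper names, and the category labels used in the usage note are hypothetical and do not come from the application.

```python
import random
from collections import defaultdict
from typing import NamedTuple


class QASample(NamedTuple):
    """One hypothetical AVQA sample: question category, question text, answer."""
    category: str
    question: str
    answer: str


def balance_by_category(samples, seed=0):
    """Downsample so that, within each question category, every answer occurs
    as often as that category's rarest answer.

    Assumption: balancing by downsampling. The abstract only states that the
    answer distribution is balanced per category, not how this is achieved.
    """
    rng = random.Random(seed)  # seeded for a reproducible subset
    # category -> answer -> list of samples with that answer
    buckets = defaultdict(lambda: defaultdict(list))
    for s in samples:
        buckets[s.category][s.answer].append(s)

    balanced = []
    for category in sorted(buckets):
        by_answer = buckets[category]
        # Every answer in this category is trimmed to the rarest answer's count.
        target = min(len(group) for group in by_answer.values())
        for answer in sorted(by_answer):
            balanced.extend(rng.sample(by_answer[answer], target))
    return balanced


def answer_counts(samples):
    """Count answers per category, returning {category: {answer: count}}."""
    counts = defaultdict(lambda: defaultdict(int))
    for s in samples:
        counts[s.category][s.answer] += 1
    return {c: dict(a) for c, a in counts.items()}
```

For example, a hypothetical "Existential" category holding 8 "yes" and 3 "no" answers comes out with 3 of each, and a "Counting" category with 5 answers of "2" and 2 of "3" comes out with 2 of each. Downsampling discards data; reweighting samples or collecting extra questions for rare answers are other ways to even out a distribution, and nothing in the abstract indicates which approach, if any of these, the application uses.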