KNOWLEDGE-DRIVEN SCENE PRIORS FOR SEMANTIC AUDIO-VISUAL EMBODIED NAVIGATION

Organization Name

Inventor(s)

KNOWLEDGE-DRIVEN SCENE PRIORS FOR SEMANTIC AUDIO-VISUAL EMBODIED NAVIGATION

This abstract first appeared for US patent application 20250022296 titled 'KNOWLEDGE-DRIVEN SCENE PRIORS FOR SEMANTIC AUDIO-VISUAL EMBODIED NAVIGATION

Original Abstract Submitted

a method of controlling navigation of a device in an environment using machine learning (ml) models includes receiving visual and audio observation data of the environment as sensed by the device, determining classification scores for objects and regions in the environment based on the visual and audio observation data, encoding visual information based on the classification scores, determining audio-semantic feature embeddings based at least in part on the classification scores, the audio-semantic feature embeddings indicating spatial relationships between objects in the environment, between regions in the environment, and between objects and regions in the environment, and determining and outputting, based on the encoded visual information and the audio-semantic feature embeddings, a state representation corresponding to a state of the device within the environment.