LG Electronics Inc. (20240347065). ARTIFICIAL INTELLIGENCE DEVICE FOR ROBUST MULTIMODAL ENCODER FOR PERSON REPRESENTATIONS AND CONTROL METHOD THEREOF simplified abstract
Organization Name: LG Electronics Inc.
Inventor(s): Anith Selvakumarasingam of Oshawa (CA)
ARTIFICIAL INTELLIGENCE DEVICE FOR ROBUST MULTIMODAL ENCODER FOR PERSON REPRESENTATIONS AND CONTROL METHOD THEREOF - A simplified explanation of the abstract
This abstract first appeared for US patent application 20240347065, titled 'ARTIFICIAL INTELLIGENCE DEVICE FOR ROBUST MULTIMODAL ENCODER FOR PERSON REPRESENTATIONS AND CONTROL METHOD THEREOF'.
**Simplified Explanation:**
The patent application describes a method for controlling an artificial intelligence (AI) device: the device generates an audio-visual embedding from a video sample and an audio sample of a user and uses that embedding to verify the user.
**Key Features and Innovation:**
- Obtaining a video sample and an audio sample of a user
- Generating a visual embedding and an audio embedding (multi-dimensional vectors) using a neural network
- Creating an audio-visual embedding from a combination of the visual and audio embeddings
- Verifying the user by finding the pre-enrolled audio-visual embedding closest to the generated embedding in a joint audio-visual subspace
- Training the neural network with a loss function computed over multiple audio-visual embeddings, each having an audio component and a visual component
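The verification flow in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the neural network is stubbed out (embeddings are plain lists of floats), the audio and visual embeddings are combined by L2-normalizing and concatenating them, "closest" means Euclidean distance, and a match is accepted only within a `threshold`. The abstract does not specify the fusion method, the distance metric, or any threshold; the names `fuse` and `verify` are hypothetical.

```python
import math
from typing import Dict, List, Optional

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (returns a copy unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse(visual: Vector, audio: Vector) -> Vector:
    """Hypothetical fusion: normalize each modality, concatenate, then renormalize."""
    return l2_normalize(l2_normalize(visual) + l2_normalize(audio))


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def verify(probe: Vector, enrolled: Dict[str, Vector], threshold: float) -> Optional[str]:
    """Return the closest enrolled user ID if it lies within `threshold` (an assumption), else None."""
    best_id, best_dist = None, math.inf
    for user_id, ref in enrolled.items():
        d = euclidean(probe, ref)
        if d < best_dist:
            best_id, best_dist = user_id, d
    return best_id if best_dist <= threshold else None
```

For example, with `enrolled = {"alice": fuse([1.0, 0.0], [0.0, 1.0]), "bob": fuse([0.0, 1.0], [1.0, 0.0])}`, a probe built as `fuse([0.9, 0.1], [0.1, 0.9])` lies about 0.11 from Alice's embedding, so `verify(probe, enrolled, threshold=0.5)` returns `"alice"`, while a stricter `threshold=0.05` returns `None`. Normalizing each modality before concatenation keeps one modality from dominating the distance simply because its vectors have a larger magnitude.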
**Potential Applications:**
This technology can be used for secure user authentication, personalized user experiences, and enhanced human-computer interactions.
**Problems Solved:**
The technology addresses the need for reliable user verification in AI devices, as well as the desire for more personalized and interactive AI experiences.
**Benefits:**
The benefits include improved security, enhanced user experiences, and more efficient human-AI interactions.
**Commercial Applications:**
"AI User Verification and Personalization Technology for Enhanced User Experiences"
**Questions about AI:**
1. How does this technology improve user verification in AI devices?
2. What are the potential applications of audio-visual embeddings in AI technology?
**Frequently Updated Research:**
Stay updated on advancements in neural network training methods for audio-visual embeddings and user verification in AI devices.
Original Abstract Submitted
A method for controlling an artificial intelligence (AI) device can include obtaining a video sample of a user and an audio sample of the user; generating, via a neural network, a visual embedding based on the video sample and an audio embedding based on the audio sample, the visual embedding and the audio embedding being multi-dimensional vectors; and generating, via the neural network, an audio-visual embedding based on a combination of the visual and audio embeddings. The method can further include determining a specific pre-enrolled audio-visual embedding from among pre-enrolled audio-visual embeddings corresponding to pre-enrolled users, based on a distance away from the audio-visual embedding within a joint audio-visual subspace, and verifying the user as the specific pre-enrolled user. Also, the neural network can be trained based on a loss function that uses a plurality of audio-visual embeddings, each including an audio component and a visual component.
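The abstract leaves the training loss unspecified. As one hedged possibility, a triplet margin loss (a common metric-learning objective, swapped in here as an assumption) can be computed over audio-visual embeddings that each carry an audio component and a visual component: it pulls an anchor toward a positive sample of the same person and pushes it away from a negative sample of a different person. The concatenation in `av_embedding`, the default margin of 0.2, and all function names are hypothetical; the network and its gradients are omitted.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def av_embedding(audio: Vector, visual: Vector) -> Vector:
    """Hypothetical audio-visual embedding: audio component followed by visual component."""
    return audio + visual


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector, margin: float = 0.2) -> float:
    """Hinge-style triplet loss: zero once the negative is `margin` farther away than the positive."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def batch_loss(triplets: List[Tuple[Vector, Vector, Vector]], margin: float = 0.2) -> float:
    """Mean triplet loss over a batch of (anchor, positive, negative) audio-visual embeddings."""
    if not triplets:
        return 0.0
    return sum(triplet_loss(a, p, n, margin) for a, p, n in triplets) / len(triplets)
```

In an actual training loop this value would be minimized with respect to the network parameters, typically through automatic differentiation in a deep-learning framework; the pure-Python version only shows how the loss is computed from the audio-visual embeddings.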