Adobe Inc. (20240257496). DETERMINING AUDIO AND VIDEO REPRESENTATIONS USING SELF-SUPERVISED LEARNING
This abstract first appeared for US patent application 20240257496, titled 'DETERMINING AUDIO AND VIDEO REPRESENTATIONS USING SELF-SUPERVISED LEARNING'.
Original Abstract Submitted
Embodiments are disclosed for training a system to generate audio and video representations using self-supervised learning. The method may include receiving a video signal including an audio component and a video component. A first machine learning model is trained to determine a representation of the audio component using a contrastive learning task and a temporal learning task. A second machine learning model is trained to determine a representation of the video component using the contrastive learning task and the temporal learning task. By training the machine learning models using both contrastive learning tasks and temporal learning tasks, the machine learning models learn short term features, long term features, and semantic features of input data.
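The abstract does not say how the contrastive and temporal tasks are defined or combined, so the following is only a minimal sketch under stated assumptions, not the method claimed in the application. It assumes the contrastive task is a cross-modal InfoNCE loss (the audio and video embeddings of the same clip should match), the temporal task is a binary "are these two clips in the correct order" classification, and the two losses are summed with a hypothetical `weight`. All function and parameter names are illustrative.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss: anchors[i] should match positives[i] and no other row.

    Assumption: anchors are audio embeddings and positives are video
    embeddings taken from the same clips (index-aligned).
    """
    anchors = [normalize(a) for a in anchors]
    positives = [normalize(p) for p in positives]
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [dot(a, p) / temperature for p in positives]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum - logits[i]
    return total / len(anchors)


def temporal_order_loss(order_logits, in_order):
    """Mean binary cross-entropy on logits.

    Assumption: a positive logit predicts that a pair of clips is in the
    correct temporal order; in_order holds the 0/1 ground-truth labels.
    """
    total = 0.0
    for z, y in zip(order_logits, in_order):
        softplus = max(z, 0.0) + math.log1p(math.exp(-abs(z)))  # log(1 + e^z)
        total += softplus - y * z
    return total / len(order_logits)


def combined_loss(audio_emb, video_emb, order_logits, in_order, weight=1.0):
    """Hypothetical joint objective: contrastive term plus weighted temporal term."""
    contrastive = info_nce(audio_emb, video_emb)
    temporal = temporal_order_loss(order_logits, in_order)
    return contrastive + weight * temporal
```

In this sketch, index-aligned audio and video embeddings produce a lower contrastive loss than shuffled ones, and the temporal term penalizes a model that cannot tell ordered clip pairs from reversed ones. In a real training setup the embeddings and order logits would come from the two models described in the abstract, and the combined loss would be minimized by gradient descent.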