GOOGLE LLC (20250111671). MEDIA ITEM CHARACTERIZATION BASED ON MULTIMODAL EMBEDDINGS
MEDIA ITEM CHARACTERIZATION BASED ON MULTIMODAL EMBEDDINGS
Organization Name
GOOGLE LLC
Inventor(s)
Jingchen Feng of Los Altos CA US
Pooya Abolghasemi of Redwood Cty CA US
Gagan Bansal of Sunnyvale CA US
Yaping Zhang of Mountain View CA US
Shuchao Bi of Mountain View CA US
Claire Cui of Mountain View CA US
This abstract first appeared for US patent application 20250111671 titled 'MEDIA ITEM CHARACTERIZATION BASED ON MULTIMODAL EMBEDDINGS'.
Original Abstract Submitted
Methods and systems for media item characterization based on multimodal embeddings are provided herein. A media item including a sequence of video frames is identified. A set of video embeddings representing visual features of the sequence of video frames is obtained. A set of audio embeddings representing audio features of the sequence of video frames is obtained. A set of audiovisual embeddings is generated based on the set of video embeddings and the set of audio embeddings. Each of the set of audiovisual embeddings represents a visual feature and an audio feature of a respective video frame of the sequence of video frames. One or more media characteristics associated with the media item are determined based on the set of audiovisual embeddings.
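The flow described in the abstract can be illustrated with a minimal Python sketch. This is not the method claimed in the application: the abstract does not say how the embeddings are fused or how characteristics are derived from them. The sketch assumes per-frame concatenation as the fusion step, mean pooling over frames, and a simple linear scorer per characteristic. All function names, the label names, and the weights are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def fuse_frame_embeddings(
    video_embeddings: Sequence[Sequence[float]],
    audio_embeddings: Sequence[Sequence[float]],
) -> List[Vector]:
    """Build one audiovisual embedding per frame.

    Assumption: fusion is plain concatenation of the frame's video and
    audio embeddings. The abstract does not specify the fusion method.
    """
    if len(video_embeddings) != len(audio_embeddings):
        raise ValueError("expected one audio embedding per video frame")
    return [list(v) + list(a) for v, a in zip(video_embeddings, audio_embeddings)]


def mean_pool(embeddings: Sequence[Sequence[float]]) -> Vector:
    """Average the per-frame embeddings into one media-level vector."""
    if not embeddings:
        raise ValueError("at least one embedding is required")
    count = len(embeddings)
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / count for i in range(dim)]


def characterize(
    audiovisual_embeddings: Sequence[Sequence[float]],
    label_weights: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Score each hypothetical characteristic with a dot product.

    Assumption: a linear scorer over the pooled vector stands in for
    whatever model actually determines the media characteristics.
    """
    pooled = mean_pool(audiovisual_embeddings)
    return {
        label: sum(w * x for w, x in zip(weights, pooled))
        for label, weights in label_weights.items()
    }


# Toy input: two frames, 2-d video embeddings and 1-d audio embeddings.
video = [[1.0, 0.0], [0.0, 1.0]]
audio = [[0.5], [1.5]]

fused = fuse_frame_embeddings(video, audio)
scores = characterize(
    fused,
    {"music": [0.0, 0.0, 1.0], "sports": [1.0, 0.0, 0.0]},  # made-up weights
)
print(fused)
print(scores)
```

Concatenation keeps each audiovisual embedding tied to a single frame, which matches the abstract's statement that each one represents a visual feature and an audio feature of a respective video frame. A learned fusion layer or a sequence model over the frames would be equally consistent with the text.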
- GOOGLE LLC
- Tao Zhu of Los Altos CA US
- Jiahui Yu of Bellevue WA US
- Jingchen Feng of Los Altos CA US
- Kai Chen of Brisbane CA US
- Pooya Abolghasemi of Redwood Cty CA US
- Gagan Bansal of Sunnyvale CA US
- Jieren Xu of San Jose CA US
- Hui Miao of Palo Alto CA US
- Yaping Zhang of Mountain View CA US
- Shuchao Bi of Mountain View CA US
- Yonghui Wu of Palo Alto CA US
- Claire Cui of Mountain View CA US
- Rohan Anil of Lafayette CA US
- G06V20/40
- G06F40/284
- G10L25/57
- CPC G06V20/41