GOOGLE LLC (20250111671). MEDIA ITEM CHARACTERIZATION BASED ON MULTIMODAL EMBEDDINGS

From WikiPatents

Organization Name

GOOGLE LLC

Inventor(s)

Tao Zhu of Los Altos CA US

Jiahui Yu of Bellevue WA US

Jingchen Feng of Los Altos CA US

Kai Chen of Brisbane CA US

Pooya Abolghasemi of Redwood City CA US

Gagan Bansal of Sunnyvale CA US

Jieren Xu of San Jose CA US

Hui Miao of Palo Alto CA US

Yaping Zhang of Mountain View CA US

Shuchao Bi of Mountain View CA US

Yonghui Wu of Palo Alto CA US

Claire Cui of Mountain View CA US

Rohan Anil of Lafayette CA US

This abstract first appeared for US patent application 20250111671, titled 'MEDIA ITEM CHARACTERIZATION BASED ON MULTIMODAL EMBEDDINGS'.

Original Abstract Submitted

Methods and systems for media item characterization based on multimodal embeddings are provided herein. A media item including a sequence of video frames is identified. A set of video embeddings representing visual features of the sequence of video frames is obtained. A set of audio embeddings representing audio features of the sequence of video frames is obtained. A set of audiovisual embeddings is generated based on the set of video embeddings and the set of audio embeddings. Each of the set of audiovisual embeddings represents a visual feature and an audio feature of a respective video frame of the sequence of video frames. One or more media characteristics associated with the media item are determined based on the set of audiovisual embeddings.
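
Illustrative Sketch

The steps in the abstract can be sketched roughly as follows. This is a minimal illustration and not the method claimed in the application: the abstract does not say how the audiovisual embeddings are generated or how the media characteristics are determined. The sketch assumes per-frame concatenation for fusion, mean pooling across frames, and dot-product scoring against label vectors. All function names, the toy embedding values, and the "music" and "sports" labels are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def fuse_embeddings(video: Sequence[Vector], audio: Sequence[Vector]) -> List[Vector]:
    """Build one audiovisual embedding per frame.

    Assumption: fusion is plain concatenation of the frame's video and
    audio embeddings. The abstract does not specify the fusion method.
    """
    if len(video) != len(audio):
        raise ValueError("expected one video and one audio embedding per frame")
    return [list(v) + list(a) for v, a in zip(video, audio)]


def mean_pool(embeddings: Sequence[Vector]) -> Vector:
    """Average the per-frame embeddings into a single vector (assumed pooling)."""
    if not embeddings:
        raise ValueError("at least one embedding is required")
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]


def characterize(audiovisual: Sequence[Vector], prototypes: Dict[str, Vector]) -> str:
    """Pick the label whose hypothetical prototype vector best matches the item.

    Assumption: a media characteristic is chosen by dot-product similarity
    between the pooled audiovisual embedding and each label prototype.
    """
    pooled = mean_pool(audiovisual)
    scores = {
        label: sum(p * x for p, x in zip(proto, pooled))
        for label, proto in prototypes.items()
    }
    return max(scores, key=scores.get)


# Toy data: two frames, 2-dimensional video and audio embeddings (made up).
video_embeddings = [[1.0, 0.0], [0.5, 0.5]]
audio_embeddings = [[0.0, 1.0], [0.0, 1.0]]

audiovisual = fuse_embeddings(video_embeddings, audio_embeddings)

# Hypothetical label prototypes in the fused 4-dimensional space.
prototypes = {
    "music": [0.0, 0.0, 0.0, 1.0],
    "sports": [1.0, 0.0, 0.0, 0.0],
}
label = characterize(audiovisual, prototypes)
print(label)
```

In a real system the embeddings would come from trained video and audio encoders, and both the fusion and the characterization steps would likely be learned models rather than the fixed operations assumed here.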