VIDEO GENERATION USING FRAME-WISE TOKEN EMBEDDINGS

Organization Name

ADOBE INC.

Inventor(s)

Seoung Wug Oh of San Jose CA US

Mingi Kwon of San Jose CA US

Joon-Young Lee of San Jose CA US

Yang Zhou of Mountain View CA US

Difan Liu of San Jose CA US

Haoran Cai of Mercer Island WA US

Baqiao Liu of Champaign IL US

Feng Liu of Beaverton OR US


This abstract first appeared for US patent application 18894443 titled 'VIDEO GENERATION USING FRAME-WISE TOKEN EMBEDDINGS'.

Original Abstract Submitted

A method, apparatus, non-transitory computer-readable medium, and system for generating synthetic videos includes obtaining an input prompt describing a video scene. Based on the input prompt, the embodiments generate a plurality of frame-wise token embeddings, each corresponding to a respective frame in a sequence of video frames. The embodiments then generate, using a video generation model, a synthesized video depicting the video scene. The synthesized video includes a plurality of images corresponding to the sequence of video frames.
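The abstract does not say how the frame-wise token embeddings are computed or what the video generation model is. The sketch below is therefore only an illustration of the data flow (prompt, then one set of token embeddings per frame, then one image per frame) under stated assumptions. It assumes that each frame's tokens are the prompt's token embeddings plus a sinusoidal frame-index embedding. It also replaces the real video generation model with a trivial placeholder. Every function name here (token_embedding, frame_position_embedding, frame_wise_token_embeddings, toy_video_model) is hypothetical and does not come from the filing.

```python
import hashlib
import math
from typing import List

Vector = List[float]


def token_embedding(token: str, dim: int) -> Vector:
    """Hypothetical deterministic token embedding derived from a SHA-256 hash."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    # Map each byte to [-1, 1]; cycle through the 32-byte digest if dim > 32.
    return [(digest[i % len(digest)] / 127.5) - 1.0 for i in range(dim)]


def frame_position_embedding(frame_index: int, dim: int) -> Vector:
    """Sinusoidal embedding of the frame index (an assumption, not from the filing)."""
    out = []
    for i in range(dim):
        freq = 1.0 / (10000 ** ((2 * (i // 2)) / dim))
        angle = frame_index * freq
        out.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return out


def frame_wise_token_embeddings(prompt: str, num_frames: int, dim: int) -> List[List[Vector]]:
    """Return embeddings[f][t]: token t of the prompt, specialized to frame f."""
    tokens = prompt.lower().split()
    if not tokens:
        raise ValueError("prompt must contain at least one token")
    base = [token_embedding(tok, dim) for tok in tokens]
    result = []
    for f in range(num_frames):
        pos = frame_position_embedding(f, dim)
        # Same prompt tokens for every frame, shifted by that frame's position embedding.
        result.append([[b + p for b, p in zip(vec, pos)] for vec in base])
    return result


def toy_video_model(frame_tokens: List[List[Vector]], height: int, width: int) -> List[List[List[float]]]:
    """Placeholder for a video generation model: one height x width image per frame."""
    video = []
    for tokens in frame_tokens:
        # Mean-pool this frame's token embeddings into a single conditioning vector.
        pooled = [sum(col) / len(tokens) for col in zip(*tokens)]
        # Tile the pooled vector into a grayscale image (stand-in for real synthesis).
        image = [[pooled[(r * width + c) % len(pooled)] for c in range(width)] for r in range(height)]
        video.append(image)
    return video


if __name__ == "__main__":
    emb = frame_wise_token_embeddings("a red car driving at sunset", num_frames=8, dim=16)
    video = toy_video_model(emb, height=4, width=4)
    print(len(emb), len(emb[0]), len(emb[0][0]))  # frames, tokens per frame, embedding size
    print(len(video), len(video[0]), len(video[0][0]))  # frames, image height, image width
```

Because each frame receives its own token embeddings, frames can in principle be conditioned differently while still sharing the prompt's content. In a real system the placeholder would be a learned generator, such as a diffusion model, that attends to each frame's tokens.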