18527668. TWO-STAGE FRAMEWORK FOR ZERO-SHOT IDENTITY-AGNOSTIC TALKING-HEAD GENERATION (Salesforce, Inc.)
Organization Name
Salesforce, Inc.
Inventor(s)
Zhichao Wang of Cambridge MA (US)
Keld Lundgaard of Cambridge MA (US)
Mengyu Dai of Cambridge MA (US)
This abstract first appeared for US patent application 18527668 titled 'TWO-STAGE FRAMEWORK FOR ZERO-SHOT IDENTITY-AGNOSTIC TALKING-HEAD GENERATION'.
Original Abstract Submitted
Methods, systems, apparatuses, devices, and computer program products are described. A system may input a first audio stream (e.g., an audio recording) and a corresponding text string into a machine learning model. The first audio stream and the text string may correspond to a first identity (e.g., a person). Based on an output of the machine learning model, the system may generate a second audio stream that is associated with a second identity and mimics the first audio stream. For example, the second audio stream may be a generated recording of the second identity speaking the text string. In addition, the system may generate a video depicting the second identity speaking the text string (e.g., the second audio stream) by combining the second audio stream with an image or previous video of the second identity. For example, the system may generate the video based on generating a head motion sequence.
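The data flow described in the abstract can be sketched roughly as follows. This is an illustrative outline only, not the claimed implementation: every name here (AudioStream, Video, synthesize_speech, generate_head_motion, generate_talking_head) is hypothetical, and both stages are placeholders that stand in for the machine learning models the abstract refers to. The frame rate of 25 and the three-value head pose are likewise assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple

# All types and functions below are hypothetical stand-ins for illustration.
HeadPose = Tuple[float, float, float]  # assumed (yaw, pitch, roll) per frame


@dataclass
class AudioStream:
    identity: str
    samples: List[float]
    sample_rate: int


@dataclass
class Video:
    identity: str
    reference_image: str
    head_motion: List[HeadPose]
    audio: AudioStream
    fps: int


def synthesize_speech(source: AudioStream, text: str, target_identity: str) -> AudioStream:
    """Stage 1 placeholder: a real model would generate the target identity's
    voice speaking `text` while mimicking `source`. Here only the timing of
    the source audio is preserved by copying its samples."""
    return AudioStream(target_identity, list(source.samples), source.sample_rate)


def generate_head_motion(audio: AudioStream, fps: int) -> List[HeadPose]:
    """Stage 2 placeholder: emits one neutral head pose per video frame.
    A real model would predict audio-driven motion instead."""
    n_frames = len(audio.samples) * fps // audio.sample_rate
    return [(0.0, 0.0, 0.0) for _ in range(n_frames)]


def generate_talking_head(source: AudioStream, text: str, target_identity: str,
                          reference_image: str, fps: int = 25) -> Video:
    # Stage 1: second audio stream for the second identity.
    target_audio = synthesize_speech(source, text, target_identity)
    # Stage 2: head motion sequence, combined with an image of the second identity.
    motion = generate_head_motion(target_audio, fps)
    return Video(target_identity, reference_image, motion, target_audio, fps)


if __name__ == "__main__":
    first = AudioStream("speaker_a", [0.0] * 16000, 16000)  # one second of silence
    clip = generate_talking_head(first, "hello world", "speaker_b", "speaker_b.png")
    print(clip.identity, len(clip.head_motion))
```

Keeping the two stages as separate functions mirrors the split in the abstract: the second audio stream is produced first, and the video is then derived from that audio plus existing imagery of the second identity.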