TWO-STAGE FRAMEWORK FOR ZERO-SHOT IDENTITY-AGNOSTIC TALKING-HEAD GENERATION

Organization Name

Inventor(s)

TWO-STAGE FRAMEWORK FOR ZERO-SHOT IDENTITY-AGNOSTIC TALKING-HEAD GENERATION

This abstract first appeared for US patent application 18527668 titled 'TWO-STAGE FRAMEWORK FOR ZERO-SHOT IDENTITY-AGNOSTIC TALKING-HEAD GENERATION'.

Original Abstract Submitted

Methods, systems, apparatuses, devices, and computer program products are described. A system may input a first audio stream (e.g., an audio recording) and a corresponding text string into a machine learning model. The first audio stream and the text string may correspond to a first identity (e.g., a person). Based on an output of the machine learning model, the system may generate a second audio stream that is associated with a second identity and mimics the first audio stream. For example, the second audio stream may be a generated recording of the second identity speaking the text string. In addition, the system may generate a video depicting the second identity speaking the text string (e.g., the second audio stream) based on combining the second audio stream with an image or previous video of the second identity. For example, the system may generate the video based on generating a head motion sequence.
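The two stages described in the abstract can be sketched as a simple data flow. This is only an illustrative outline, not the method claimed in the application: every name below (AudioStream, Video, generate_talking_head, and the stub_* functions) is hypothetical, and the stubs stand in for the machine learning model, the head motion generator, and the renderer, none of which the abstract specifies.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AudioStream:
    identity: str         # who is (nominally) speaking
    samples: List[float]  # placeholder waveform
    text: str             # the text string being spoken


@dataclass
class Video:
    identity: str
    frames: List[str]     # placeholder frames
    audio: AudioStream


def generate_talking_head(
    first_audio: AudioStream,
    text: str,
    second_identity: str,
    reference_image: str,
    speech_model: Callable[[AudioStream, str, str], AudioStream],
    motion_model: Callable[[AudioStream], List[float]],
    renderer: Callable[[str, List[float]], List[str]],
) -> Video:
    # Stage 1: audio of the second identity that mimics the first audio stream.
    second_audio = speech_model(first_audio, text, second_identity)
    # Stage 2: derive a head motion sequence, then combine it with an image
    # of the second identity to produce the video frames.
    head_motion = motion_model(second_audio)
    frames = renderer(reference_image, head_motion)
    return Video(identity=second_identity, frames=frames, audio=second_audio)


# Hypothetical stand-ins so the sketch runs end to end; real models go here.
def stub_speech_model(audio: AudioStream, text: str, identity: str) -> AudioStream:
    return AudioStream(identity=identity, samples=list(audio.samples), text=text)


def stub_motion_model(audio: AudioStream) -> List[float]:
    # One placeholder head pose per audio sample.
    return [0.1 * i for i in range(len(audio.samples))]


def stub_renderer(image: str, motion: List[float]) -> List[str]:
    return [f"{image}@pose={pose:.1f}" for pose in motion]
```

Passing the models in as callables keeps the sketch identity-agnostic in the loosest sense: the second identity is supplied only as an input (a label and a reference image), so nothing in the pipeline itself is tied to a particular person.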

18527668. TWO-STAGE FRAMEWORK FOR ZERO-SHOT IDENTITY-AGNOSTIC TALKING-HEAD GENERATION (Salesforce, Inc.)

