18527668. TWO-STAGE FRAMEWORK FOR ZERO-SHOT IDENTITY-AGNOSTIC TALKING-HEAD GENERATION (Salesforce, Inc.)

From WikiPatents
Revision as of 07:46, 19 December 2024 by Wikipatents

TWO-STAGE FRAMEWORK FOR ZERO-SHOT IDENTITY-AGNOSTIC TALKING-HEAD GENERATION

Organization Name

Salesforce, Inc.

Inventor(s)

Zhichao Wang of Cambridge MA (US)

Keld Lundgaard of Cambridge MA (US)

Mengyu Dai of Cambridge MA (US)

This abstract first appeared for US patent application 18527668 titled 'TWO-STAGE FRAMEWORK FOR ZERO-SHOT IDENTITY-AGNOSTIC TALKING-HEAD GENERATION'.



Original Abstract Submitted

Methods, systems, apparatuses, devices, and computer program products are described. A system may input a first audio stream (e.g., an audio recording) and a corresponding text string into a machine learning model. The first audio stream and the text string may correspond to a first identity (e.g., a person). Based on an output of the machine learning model, the system may generate a second audio stream that is associated with a second identity and mimics the first audio stream. For example, the second audio stream may be a generated recording of the second identity speaking the first text string. In addition, the system may generate a video depicting the second identity speaking the first text string (e.g., the second audio stream) based on combining the second audio stream with an image or a previous video of the second identity. For example, the system may generate the video based on generating a head motion sequence.
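The abstract describes a two-stage data flow: a model maps the first identity's audio and text to a second audio stream for a second identity, and that audio is then combined with an image of the second identity through a head motion sequence to form a video. The Python sketch below illustrates only that flow under stated assumptions. The names `AudioStream`, `Frame`, `placeholder_speech_model`, `head_motion_sequence`, and `generate_talking_head` are hypothetical, the speech model is a stub that merely relabels the speaker, and the energy-driven nodding is a toy stand-in rather than the method claimed in the application.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple

Pose = Tuple[float, float, float]  # (yaw, pitch, roll) in degrees


@dataclass
class AudioStream:
    samples: List[float]  # mono waveform, assumed in [-1.0, 1.0]
    sample_rate: int      # samples per second
    identity: str         # speaker the audio is attributed to


@dataclass
class Frame:
    reference_image: str  # hypothetical handle to the second identity's image
    head_pose: Pose
    audio_start: int      # index of the first audio sample covered by this frame


# Hypothetical stage-1 interface: (first audio, text, target identity) -> second audio.
SpeechModel = Callable[[AudioStream, str, str], AudioStream]


def placeholder_speech_model(source: AudioStream, text: str, target: str) -> AudioStream:
    # Stand-in for the trained model: it ignores `text`, copies the source
    # samples, and relabels the speaker. A real model would synthesize the
    # text in the target identity's voice while mimicking the source audio.
    return AudioStream(list(source.samples), source.sample_rate, target)


def head_motion_sequence(audio: AudioStream, fps: int, max_pitch: float = 5.0) -> List[Pose]:
    # Toy placeholder: one pose per video frame, with the pitch (nod) angle
    # proportional to the RMS energy of the audio under that frame.
    hop = max(1, audio.sample_rate // fps)
    n_frames = math.ceil(len(audio.samples) / hop)
    poses: List[Pose] = []
    for i in range(n_frames):
        chunk = audio.samples[i * hop:(i + 1) * hop]
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        poses.append((0.0, max_pitch * min(rms, 1.0), 0.0))
    return poses


def generate_talking_head(first_audio: AudioStream, text: str, target: str,
                          reference_image: str, speech_model: SpeechModel,
                          fps: int = 25) -> Tuple[AudioStream, List[Frame]]:
    # Stage 1: produce the second audio stream for the second identity.
    second_audio = speech_model(first_audio, text, target)
    # Stage 2: derive a head motion sequence from the second audio and pair
    # each pose with the reference image of the second identity.
    hop = max(1, second_audio.sample_rate // fps)
    poses = head_motion_sequence(second_audio, fps)
    frames = [Frame(reference_image, pose, i * hop) for i, pose in enumerate(poses)]
    return second_audio, frames
```

For example, one second of 16 kHz audio at 25 fps yields 25 frames, each covering 640 samples. Passing the speech model as a callable keeps the two stages decoupled, so a different voice model could be swapped in without touching the video stage.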