NVIDIA Corporation (20250061634). AUDIO-DRIVEN FACIAL ANIMATION USING MACHINE LEARNING

From WikiPatents

AUDIO-DRIVEN FACIAL ANIMATION USING MACHINE LEARNING

Organization Name

NVIDIA Corporation

Inventor(s)

Zhengyu Huang of Shanghai (CN)

Rui Zhang of Beijing (CN)

Tao Li of Beijing (CN)

Yingying Zhong of Shanghai (CN)

Weihua Zhang of Beijing (CN)

Junjie Lai of Beijing (CN)

Yeongho Seol of Seoul (KR)

Dmitry Korobchenko of London (GB)

Simon Yuen of Playa Vista CA (US)


This abstract first appeared for US patent application 20250061634, titled 'AUDIO-DRIVEN FACIAL ANIMATION USING MACHINE LEARNING'.

Original Abstract Submitted

Systems and methods of the present disclosure include animating virtual avatars or agents according to input audio and one or more selected or determined emotions and/or styles. For example, a deep neural network can be trained to output motion or deformation information for a character that is representative of the character uttering speech contained in audio input. The character can have different facial components or regions (e.g., head, skin, eyes, tongue) modeled separately, such that the network can output motion or deformation information for each of these different facial components. During training, the network can use a transformer-based audio encoder with locked parameters to train an associated decoder using a weighted feature vector. The network output can be provided to a renderer to generate audio-driven facial animation that is emotion-accurate.
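
The abstract describes a frozen (locked-parameter) transformer audio encoder, a weighted combination of its features, and a decoder that predicts motion or deformation separately per facial component. Below is a minimal PyTorch sketch of that arrangement; the encoder interface, layer count, emotion/style conditioning, and per-component output sizes are illustrative assumptions, not details taken from the application.

```python
# Minimal sketch of the architecture outlined in the abstract, assuming a
# generic pretrained transformer audio encoder (e.g., a wav2vec-style model)
# that returns per-layer hidden states. Names and dimensions are hypothetical.
import torch
import torch.nn as nn


class AudioDrivenFaceAnimator(nn.Module):
    def __init__(self, audio_encoder: nn.Module, num_layers=13,
                 audio_dim=768, emotion_dim=16, hidden_dim=256,
                 component_dims=None):
        super().__init__()
        # Transformer-based audio encoder with locked (frozen) parameters.
        self.audio_encoder = audio_encoder
        for p in self.audio_encoder.parameters():
            p.requires_grad = False

        # Learned weights that combine the encoder's per-layer features
        # into a single weighted feature vector.
        self.layer_weights = nn.Parameter(torch.zeros(num_layers))

        # Decoder conditioned on the audio features and an emotion/style code.
        self.decoder = nn.GRU(audio_dim + emotion_dim, hidden_dim,
                              batch_first=True)

        # Separate output heads for each facial component (head, skin, eyes,
        # tongue), each predicting its own motion/deformation parameters
        # per audio frame. The sizes here are placeholders.
        component_dims = component_dims or {
            "head": 6, "skin": 52, "eyes": 4, "tongue": 10}
        self.heads = nn.ModuleDict({
            name: nn.Linear(hidden_dim, dim)
            for name, dim in component_dims.items()})

    def forward(self, audio, emotion):
        # audio: (batch, samples); emotion: (batch, emotion_dim)
        with torch.no_grad():
            # Assumed encoder API: returns a list of per-layer hidden
            # states, each of shape (batch, frames, audio_dim).
            layer_feats = self.audio_encoder(audio)
        feats = torch.stack(layer_feats, dim=0)              # (L, B, T, D)
        w = torch.softmax(self.layer_weights, dim=0)
        weighted = (w.view(-1, 1, 1, 1) * feats).sum(dim=0)  # (B, T, D)

        # Broadcast the emotion/style code across time and decode.
        emo = emotion.unsqueeze(1).expand(-1, weighted.size(1), -1)
        hidden, _ = self.decoder(torch.cat([weighted, emo], dim=-1))

        # Per-component motion/deformation predictions for a renderer.
        return {name: head(hidden) for name, head in self.heads.items()}
```

In such a setup, only the layer weights, decoder, and per-component heads would receive gradients, while the frozen encoder retains its pretrained speech representations, loosely mirroring the locked-parameter encoder and weighted feature vector described in the abstract.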