20240054711. Method for Audio-Driven Character Lip Sync, Model for Audio-Driven Character Lip Sync and Training Method Therefor simplified abstract (NANJING SILICON INTELLIGENCE TECHNOLOGY CO., LTD.)

Method for Audio-Driven Character Lip Sync, Model for Audio-Driven Character Lip Sync and Training Method Therefor

Organization Name

NANJING SILICON INTELLIGENCE TECHNOLOGY CO., LTD.

Inventor(s)

Huapeng Sima of Nanjing (CN)

Zheng Liao of Nanjing (CN)

Method for Audio-Driven Character Lip Sync, Model for Audio-Driven Character Lip Sync and Training Method Therefor - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240054711 titled 'Method for Audio-Driven Character Lip Sync, Model for Audio-Driven Character Lip Sync and Training Method Therefor'.

Simplified Explanation

The present disclosure relates to a method, a model, and a training method for audio-driven character lip sync. A target dynamic image is generated by processing a character image of the target character and speech into paired image-audio training data, which is then mixed with auxiliary data for training. When a large amount of sample data would otherwise be needed, a video of another character speaking is used as an auxiliary video from which the auxiliary data is obtained, and the auxiliary data and the remaining data are fed to the model in a preset ratio. The auxiliary data keeps parts unrelated to the synthetic lip sync action out of the training process, which also removes the need for a large amount of sample data. A minimal data-preparation sketch follows the key points below.

  • Method for audio-driven character lip sync
  • Model for audio-driven character lip sync
  • Training method for audio-driven character lip sync
  • Obtaining a target dynamic image by processing character image and speech data
  • Mixing image-audio data with auxiliary data for training
  • Using an auxiliary video as an additional source of data for training
  • Improving the training process for synthetic lip sync actions
  • Resolving the problem of requiring a large amount of sample data during training
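The patent text does not disclose implementation details, so the following is only a sketch of the data-preparation idea described above: character frames and speech features are paired into image-audio samples, then mixed with auxiliary samples from another speaker's video at a preset ratio. All names here (ImageAudioPair, build_pairs, mix_with_auxiliary, aux_ratio) are hypothetical illustrations, not the applicant's code.

```python
import random
from dataclasses import dataclass
from typing import List

import numpy as np


@dataclass
class ImageAudioPair:
    """One trainable sample: a character frame paired with an aligned audio feature window."""
    frame: np.ndarray       # character image, e.g. a cropped face region
    audio_feat: np.ndarray  # e.g. a mel-spectrogram window aligned to the frame


def build_pairs(frames: List[np.ndarray], audio_feats: List[np.ndarray]) -> List[ImageAudioPair]:
    """Process a character video and its speech into paired image-audio samples."""
    return [ImageAudioPair(f, a) for f, a in zip(frames, audio_feats)]


def mix_with_auxiliary(target_pairs: List[ImageAudioPair],
                       aux_pairs: List[ImageAudioPair],
                       aux_ratio: float = 0.5) -> List[ImageAudioPair]:
    """Mix target-character pairs with auxiliary pairs (taken from another
    speaker's video) at a preset ratio, then shuffle the result for training."""
    n_aux = min(int(len(target_pairs) * aux_ratio), len(aux_pairs))
    mixed = list(target_pairs) + random.sample(aux_pairs, n_aux)
    random.shuffle(mixed)
    return mixed
```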

Potential Applications

  • Animation and gaming industry for realistic character lip sync
  • Virtual reality and augmented reality applications for immersive experiences
  • Film and television industry for post-production editing and dubbing

Problems Solved

  • Requirement of a large amount of sample data during the training process
  • Inclusion of unrelated parts in the synthetic lip sync training process

Benefits

  • Improved accuracy and realism in character lip sync
  • Reduced time and resources required for training
  • Enhanced user experience in animation, gaming, and virtual reality applications


Original Abstract Submitted

Embodiments of the present disclosure provide a method for audio-driven character lip sync, a model for audio-driven character lip sync, and a training method therefor. A target dynamic image is obtained by acquiring a character image of a target character and speech for generating a target dynamic image, processing the character image and the speech as image-audio data that may be trained, respectively, and mixing the image-audio data with auxiliary data for training. When a large amount of sample data needs to be obtained for training in different scenarios, a video when another character speaks is used as an auxiliary video for processing, so as to obtain the auxiliary data. The auxiliary data, which replaces non-general sample data, and other data are input into a model in a preset ratio for training. The auxiliary data may improve a process of training a synthetic lip sync action of the model, so that there are no parts unrelated to the synthetic lip sync action during the training process. In this way, a problem that a large amount of sample data is required during the training process is resolved.
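The abstract specifies only that auxiliary data and other data are input to the model in a preset ratio. As a reading aid, a hypothetical batch-sampling loop under that assumption might look like the sketch below; the names sample_batch, aux_fraction, and model.training_step are assumptions for illustration and are not part of the filing.

```python
import random


def sample_batch(target_data, aux_data, batch_size=32, aux_fraction=0.25):
    """Draw one batch in which a preset fraction of the samples comes from the
    auxiliary (other-speaker) data and the rest from the target character."""
    n_aux = int(batch_size * aux_fraction)
    n_target = batch_size - n_aux
    batch = random.sample(target_data, n_target) + random.sample(aux_data, n_aux)
    random.shuffle(batch)
    return batch


def train(model, target_data, aux_data, steps=10_000, log_every=1_000):
    """Hypothetical training loop: each step mixes target and auxiliary
    samples at the preset ratio before updating the lip sync model."""
    for step in range(steps):
        batch = sample_batch(target_data, aux_data)
        loss = model.training_step(batch)  # assumed model API, not from the patent
        if step % log_every == 0:
            print(f"step {step}: loss={loss:.4f}")
```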