20240054711. Method for Audio-Driven Character Lip Sync, Model for Audio-Driven Character Lip Sync and Training Method Therefor simplified abstract (NANJING SILICON INTELLIGENCE TECHNOLOGY CO., LTD.)

Method for Audio-Driven Character Lip Sync, Model for Audio-Driven Character Lip Sync and Training Method Therefor

Organization Name

NANJING SILICON INTELLIGENCE TECHNOLOGY CO., LTD.

Inventor(s)

Huapeng Sima of Nanjing (CN)

Zheng Liao of Nanjing (CN)

Method for Audio-Driven Character Lip Sync, Model for Audio-Driven Character Lip Sync and Training Method Therefor - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240054711 titled 'Method for Audio-Driven Character Lip Sync, Model for Audio-Driven Character Lip Sync and Training Method Therefor'.

Simplified Explanation

The present disclosure relates to a method, a model, and a training method for audio-driven character lip sync. A target dynamic image is generated by processing a character image of the target character and speech into paired image-audio training data, which is then mixed with auxiliary data for training. When a large amount of sample data would otherwise be needed, a video of another character speaking is used as an auxiliary video from which the auxiliary data is obtained, and the auxiliary data and the remaining data are fed to the model in a preset ratio. The auxiliary data keeps parts unrelated to the synthetic lip sync action out of the training process, which also removes the need for a large amount of sample data. A minimal data-preparation sketch follows the key points below.

  • Method for audio-driven character lip sync
  • Model for audio-driven character lip sync
  • Training method for audio-driven character lip sync
  • Obtaining a target dynamic image by processing character image and speech data
  • Mixing image-audio data with auxiliary data for training
  • Using an auxiliary video as an additional source of data for training
  • Improving the training process for synthetic lip sync actions
  • Resolving the problem of requiring a large amount of sample data during training
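The patent text does not disclose implementation details, so the following is only a sketch of the data-preparation idea described above: character frames and speech features are paired into image-audio samples, then mixed with auxiliary samples from another speaker's video at a preset ratio. All names here (ImageAudioPair, build_pairs, mix_with_auxiliary, aux_ratio) are hypothetical illustrations, not the applicant's code.

```python
import random
from dataclasses import dataclass
from typing import List

import numpy as np


@dataclass
class ImageAudioPair:
    """One trainable sample: a character frame paired with an aligned audio feature window."""
    frame: np.ndarray       # character image, e.g. a cropped face region
    audio_feat: np.ndarray  # e.g. a mel-spectrogram window aligned to the frame


def build_pairs(frames: List[np.ndarray], audio_feats: List[np.ndarray]) -> List[ImageAudioPair]:
    """Process a character video and its speech into paired image-audio samples."""
    return [ImageAudioPair(f, a) for f, a in zip(frames, audio_feats)]


def mix_with_auxiliary(target_pairs: List[ImageAudioPair],
                       aux_pairs: List[ImageAudioPair],
                       aux_ratio: float = 0.5) -> List[ImageAudioPair]:
    """Mix target-character pairs with auxiliary pairs (taken from another
    speaker's video) at a preset ratio, then shuffle the result for training."""
    n_aux = min(int(len(target_pairs) * aux_ratio), len(aux_pairs))
    mixed = list(target_pairs) + random.sample(aux_pairs, n_aux)
    random.shuffle(mixed)
    return mixed
```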

Potential Applications

  • Animation and gaming industry for realistic character lip sync
  • Virtual reality and augmented reality applications for immersive experiences
  • Film and television industry for post-production editing and dubbing

Problems Solved

  • Requirement of a large amount of sample data during the training process
  • Inclusion of unrelated parts in the synthetic lip sync training process

Benefits

  • Improved accuracy and realism in character lip sync
  • Reduced time and resources required for training
  • Enhanced user experience in animation, gaming, and virtual reality applications


Original Abstract Submitted

Embodiments of the present disclosure provide a method for audio-driven character lip sync, a model for audio-driven character lip sync, and a training method therefor. A target dynamic image is obtained by acquiring a character image of a target character and speech for generating a target dynamic image, processing the character image and the speech as image-audio data that may be trained, respectively, and mixing the image-audio data with auxiliary data for training. When a large amount of sample data needs to be obtained for training in different scenarios, a video when another character speaks is used as an auxiliary video for processing, so as to obtain the auxiliary data. The auxiliary data, which replaces non-general sample data, and other data are input into a model in a preset ratio for training. The auxiliary data may improve a process of training a synthetic lip sync action of the model, so that there are no parts unrelated to the synthetic lip sync action during the training process. In this way, a problem that a large amount of sample data is required during the training process is resolved.
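The abstract specifies only that auxiliary data and other data are input to the model in a preset ratio. As a reading aid, a hypothetical batch-sampling loop under that assumption might look like the sketch below; the names sample_batch, aux_fraction, and model.training_step are assumptions for illustration and are not part of the filing.

```python
import random


def sample_batch(target_data, aux_data, batch_size=32, aux_fraction=0.25):
    """Draw one batch in which a preset fraction of the samples comes from the
    auxiliary (other-speaker) data and the rest from the target character."""
    n_aux = int(batch_size * aux_fraction)
    n_target = batch_size - n_aux
    batch = random.sample(target_data, n_target) + random.sample(aux_data, n_aux)
    random.shuffle(batch)
    return batch


def train(model, target_data, aux_data, steps=10_000, log_every=1_000):
    """Hypothetical training loop: each step mixes target and auxiliary
    samples at the preset ratio before updating the lip sync model."""
    for step in range(steps):
        batch = sample_batch(target_data, aux_data)
        loss = model.training_step(batch)  # assumed model API, not from the patent
        if step % log_every == 0:
            print(f"step {step}: loss={loss:.4f}")
```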