Hyundai Motor Company (20240346730). FACE IMAGE GENERATION METHOD AND DEVICE FOR GENERATING FULLY-CONTROLLABLE TALKING FACE simplified abstract
FACE IMAGE GENERATION METHOD AND DEVICE FOR GENERATING FULLY-CONTROLLABLE TALKING FACE
Organization Name
Inventor(s)
You Shin Lim of Yongin-si (KR)
FACE IMAGE GENERATION METHOD AND DEVICE FOR GENERATING FULLY-CONTROLLABLE TALKING FACE - A simplified explanation of the abstract
This abstract first appeared for US patent application 20240346730 titled 'FACE IMAGE GENERATION METHOD AND DEVICE FOR GENERATING FULLY-CONTROLLABLE TALKING FACE'.
Simplified Explanation: This patent application describes a method and device for generating a controllable talking face image. The method involves receiving a source image, driving images, and input audio, encoding them into various latent codes, and then generating a talking face image using a generative adversarial network (GAN).
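- Receiving a source image, a series of driving images sampled from the same video, and input audio.
- Encoding the source image and driving images into a source latent code and a driving latent code using a visual encoder, and encoding the input audio into an audio latent code using an audio encoder.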
- Mapping the source latent code to a canonical space and combining the driving latent code with the audio latent code to generate a motion code.
- Combining the canonical code with the motion code to create a multimodal fused latent code.
- Generating a talking face image by transferring the multimodal fused latent code to a generative adversarial network (GAN).
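The data flow in the steps above can be sketched in a few lines of Python. This is an illustrative sketch only, not the applicant's implementation: the abstract does not say how codes are "combined", so concatenation (driving plus audio) and element-wise addition (canonical plus motion) are assumptions, as are the per-frame audio features, the vector sizes, and every function name. Fixed random linear maps stand in for the learned encoders and the GAN generator.

```python
import random
from typing import Callable, List

Vector = List[float]

def make_linear(in_dim: int, out_dim: int, seed: int) -> Callable[[Vector], Vector]:
    """Return a fixed random linear map standing in for a learned network."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]

    def apply(x: Vector) -> Vector:
        if len(x) != in_dim:
            raise ValueError(f"expected {in_dim} values, got {len(x)}")
        return [sum(w * v for w, v in zip(row, x)) for row in weights]

    return apply

# Hypothetical sizes, chosen only to keep the example small.
IMAGE_DIM, AUDIO_DIM, LATENT_DIM = 16, 8, 4

visual_encoder = make_linear(IMAGE_DIM, LATENT_DIM, seed=0)
audio_encoder = make_linear(AUDIO_DIM, LATENT_DIM, seed=1)
canonical_encoder = make_linear(LATENT_DIM, LATENT_DIM, seed=2)
motion_encoder = make_linear(2 * LATENT_DIM, LATENT_DIM, seed=3)
generator = make_linear(LATENT_DIM, IMAGE_DIM, seed=4)  # stand-in for the GAN generator

def generate_talking_face(source_image: Vector,
                          driving_images: List[Vector],
                          audio_frames: List[Vector]) -> List[Vector]:
    """Produce one output frame per (driving image, audio frame) pair."""
    source_latent = visual_encoder(source_image)
    canonical_code = canonical_encoder(source_latent)
    frames = []
    for driving_image, audio in zip(driving_images, audio_frames):
        driving_latent = visual_encoder(driving_image)
        audio_latent = audio_encoder(audio)
        # Assumption: "combining" is list concatenation of the two latent codes.
        motion_code = motion_encoder(driving_latent + audio_latent)
        # Assumption: canonical and motion codes are fused by element-wise addition.
        fused_code = [c + m for c, m in zip(canonical_code, motion_code)]
        frames.append(generator(fused_code))
    return frames
```

Under these assumptions the canonical code is computed once from the source image, while the motion code changes with each driving frame and audio frame.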
Potential Applications:
1. Creating realistic and controllable talking face images for entertainment purposes.
2. Developing advanced video editing tools for content creators.
3. Enhancing virtual communication platforms with lifelike avatars.
Problems Solved:
1. Generating realistic talking face images with controllable features.
2. Integrating audio and visual information seamlessly in image generation.
3. Improving the quality and accuracy of facial animation in digital content.
Benefits:
1. Enhanced user engagement through lifelike avatars.
2. Streamlined content creation process for video editors.
3. Improved communication experiences in virtual environments.
Commercial Applications: The technology can be utilized in video editing software, virtual reality applications, and online communication platforms to enhance user experiences and create engaging content.
Questions about Face Image Generation:
1. How does the method ensure the synchronization of audio and visual elements in generating talking face images?
2. What are the potential limitations of this technology in terms of scalability and real-time applications?
Original Abstract Submitted
Provided are face image generation method and device for generating a controllable talking face image. The method includes: a face image generation method for generating a controllable talking face image, the method comprising: receiving a source image and a series of driving images, sampled from the same video, and input audio; acquiring a style latent code including a source latent code and a driving latent code by encoding the source image and the series of driving images into a visual space by a visual encoder; acquiring an audio feature including an audio latent code by encoding the input audio by an audio encoder; acquiring a canonical code by mapping the source latent code to a canonical space by a canonical encoder; acquiring a motion code by combining the driving latent code with the audio latent code, and mapping the combined code to a multimodal motion space by a multimodal motion encoder; acquiring a multimodal fused latent code by combining the canonical code with the motion code; and generating a talking face image by transferring the multimodal fused latent code to a generative adversarial network (GAN).
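The abstract states that the source image and the series of driving images are sampled from the same video, but not how. One possible reading is sketched below, assuming a uniformly random source frame and a run of consecutive driving frames. The function name, the seeding, and the consecutive-frame choice are all hypothetical and not taken from the application.

```python
import random
from typing import List, Tuple

def sample_frame_indices(num_frames: int, num_driving: int,
                         seed: int = 0) -> Tuple[int, List[int]]:
    """Pick one source frame index and a run of driving frame indices from one video."""
    if num_driving < 1 or num_frames < num_driving:
        raise ValueError("video must contain at least num_driving >= 1 frames")
    rng = random.Random(seed)
    # Assumption: the source frame is drawn uniformly from the whole video.
    source_index = rng.randrange(num_frames)
    # Assumption: the driving series is a block of consecutive frames.
    start = rng.randrange(num_frames - num_driving + 1)
    driving_indices = list(range(start, start + num_driving))
    return source_index, driving_indices
```

For example, `sample_frame_indices(100, 5)` returns a single index and a list of five consecutive indices, all within the same 100-frame video.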