Hyundai Motor Company (20240346730). FACE IMAGE GENERATION METHOD AND DEVICE FOR GENERATING FULLY-CONTROLLABLE TALKING FACE simplified abstract
FACE IMAGE GENERATION METHOD AND DEVICE FOR GENERATING FULLY-CONTROLLABLE TALKING FACE
Organization Name
Hyundai Motor Company
Inventor(s)
You Shin Lim of Yongin-si (KR)
FACE IMAGE GENERATION METHOD AND DEVICE FOR GENERATING FULLY-CONTROLLABLE TALKING FACE - A simplified explanation of the abstract
This abstract first appeared for US patent application 20240346730 titled 'FACE IMAGE GENERATION METHOD AND DEVICE FOR GENERATING FULLY-CONTROLLABLE TALKING FACE'.
Simplified Explanation
The patent application describes a method and device for generating a controllable talking face image. A source image and a series of driving images, sampled from the same video, are encoded together with input audio, and the resulting latent codes are passed to a generative adversarial network (GAN) that produces the talking face image.
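- Receiving a source image, a series of driving images sampled from the same video, and input audio
- Encoding the source and driving images into a visual space with a visual encoder to obtain a source latent code and a driving latent code
- Encoding the input audio with an audio encoder to obtain an audio latent code
- Mapping the source latent code to a canonical space with a canonical encoder to obtain a canonical code
- Combining the driving and audio latent codes and mapping the result to a multimodal motion space to obtain a motion code
- Combining the canonical and motion codes into a multimodal fused latent code and passing it to a GAN to generate the talking face image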
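The steps above can be illustrated with a toy, stdlib-only Python sketch of the data flow. It is not Hyundai's implementation: every encoder is replaced by a fixed random linear map, the GAN is reduced to a linear stand-in for its generator (no adversarial training), the latent sizes are invented, and pairing one audio window with each driving frame is an assumption. Following the abstract, a single visual encoder handles both the source and the driving images. The abstract does not say how codes are "combined", so this sketch assumes concatenation for the driving and audio codes and element-wise addition for the canonical and motion codes. All class, function, and constant names are hypothetical.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]

# Hypothetical sizes: the abstract gives no dimensions.
IMG_DIM, AUDIO_DIM, LATENT_DIM = 16, 8, 4


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Stand-in for trained weights: a fixed random linear map."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def linear(weights: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x len(x)) matrix by the vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


class TalkingFacePipeline:
    """Toy version of the claimed data flow (hypothetical names and shapes)."""

    def __init__(self, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.visual_encoder = random_matrix(LATENT_DIM, IMG_DIM, rng)
        self.audio_encoder = random_matrix(LATENT_DIM, AUDIO_DIM, rng)
        self.canonical_encoder = random_matrix(LATENT_DIM, LATENT_DIM, rng)
        # Consumes the concatenated [driving; audio] code.
        self.motion_encoder = random_matrix(LATENT_DIM, 2 * LATENT_DIM, rng)
        # Linear stand-in for the GAN generator.
        self.generator = random_matrix(IMG_DIM, LATENT_DIM, rng)

    def generate(self, source: Vector, driving: List[Vector],
                 audio: List[Vector]) -> List[Vector]:
        if len(driving) != len(audio):
            raise ValueError("expected one audio window per driving frame")
        # Source latent code -> canonical code, computed once per source image.
        source_latent = linear(self.visual_encoder, source)
        canonical = linear(self.canonical_encoder, source_latent)
        frames = []
        for drv, aud in zip(driving, audio):
            driving_latent = linear(self.visual_encoder, drv)
            audio_latent = linear(self.audio_encoder, aud)
            # Assumed "combining": list concatenation, then the motion encoder.
            motion = linear(self.motion_encoder, driving_latent + audio_latent)
            # Assumed fusion: element-wise addition of canonical and motion codes.
            fused = [c + m for c, m in zip(canonical, motion)]
            frames.append(linear(self.generator, fused))
        return frames


if __name__ == "__main__":
    rng = random.Random(1)
    source = [rng.random() for _ in range(IMG_DIM)]
    driving = [[rng.random() for _ in range(IMG_DIM)] for _ in range(3)]
    audio = [[rng.random() for _ in range(AUDIO_DIM)] for _ in range(3)]
    frames = TalkingFacePipeline().generate(source, driving, audio)
    print(len(frames), len(frames[0]))
```

Running the file prints `3 16`: three generated frames of 16 values each. Because the canonical code depends only on the source image, the sketch computes it once and reuses it for every frame, while the motion code is recomputed per driving frame and audio window.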
Key Features and Innovation
- Generation of controllable talking face images
- Encoding of source and driving images into a visual space
- Combination of the driving and audio latent codes, mapped into a multimodal motion space to create a motion code
- Fusion of the canonical and motion codes into a multimodal fused latent code for image generation
Potential Applications
This technology can be used in video conferencing, virtual reality, the entertainment industry, and online communication platforms.
Problems Solved
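This technology addresses the need for talking face images that are both realistic and controllable: appearance is taken from a single source image, while motion is derived jointly from the driving images and the input audio.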
Benefits
- Enhanced communication experiences
- Realistic and controllable face image generation
- Improved user engagement in virtual environments
Commercial Applications
- Virtual reality applications
- Video conferencing software
- Entertainment industry for special effects
Prior Art
Earlier deep-learning work on facial image generation, such as GAN-based face synthesis and audio-driven talking-head generation, may be relevant to this technology.
Frequently Updated Research
Research on improving the realism and controllability of generated face images using advanced neural network architectures is ongoing.
Questions about Face Image Generation
How does this technology improve virtual communication experiences?
This technology enhances virtual communication by providing realistic and controllable talking face images, improving user engagement and interaction.
What are the potential applications of this face image generation method?
The potential applications include video conferencing, virtual reality, the entertainment industry, and online communication platforms.
Original Abstract Submitted
Provided are face image generation method and device for generating a controllable talking face image. The method includes: a face image generation method for generating a controllable talking face image, the method comprising: receiving a source image and a series of driving images, sampled from the same video, and input audio; acquiring a style latent code including a source latent code and a driving latent code by encoding the source image and the series of driving images into a visual space by a visual encoder; acquiring an audio feature including an audio latent code by encoding the input audio by an audio encoder; acquiring a canonical code by mapping the source latent code to a canonical space by a canonical encoder; acquiring a motion code by combining the driving latent code with the audio latent code, and mapping the combined code to a multimodal motion space by a multimodal motion encoder; acquiring a multimodal fused latent code by combining the canonical code with the motion code; and generating a talking face image by transferring the multimodal fused latent code to a generative adversarial network (GAN).
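Read as a chain of functions, the abstract can be summarized in compact notation. The symbols are our own labels, not the applicant's: $x_s$ is the source image, $x_{d,t}$ the $t$-th of $T$ driving images, $s_t$ an audio segment assumed to be aligned with frame $t$, $E_v$, $E_a$, $E_c$, $E_m$ the visual, audio, canonical, and multimodal motion encoders, and $G$ the GAN generator. Since the abstract only says the codes are "combined", the concatenation $[\cdot\,;\cdot]$ and the operator $\oplus$ are placeholders for an unspecified combination.

```latex
\begin{aligned}
w_s     &= E_v(x_s)                              && \text{source latent code} \\
w_{d,t} &= E_v(x_{d,t}), \quad t = 1,\dots,T      && \text{driving latent codes} \\
a_t     &= E_a(s_t)                              && \text{audio latent code} \\
c       &= E_c(w_s)                              && \text{canonical code} \\
m_t     &= E_m\big([\,w_{d,t};\, a_t\,]\big)     && \text{motion code} \\
z_t     &= c \oplus m_t                          && \text{multimodal fused latent code} \\
\hat{x}_t &= G(z_t)                              && \text{generated talking face frame}
\end{aligned}
```

Under this reading, the style latent code of the abstract is the pair $(w_s, \{w_{d,t}\}_{t=1}^{T})$, and only $m_t$ varies from frame to frame.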