18296202. MULTIMODAL DISENTANGLEMENT FOR GENERATING VIRTUAL HUMAN AVATARS simplified abstract (Samsung Electronics Co., Ltd.)


MULTIMODAL DISENTANGLEMENT FOR GENERATING VIRTUAL HUMAN AVATARS

Organization Name

Samsung Electronics Co., Ltd.

Inventor(s)

Siddarth Ravichandran of Santa Clara CA (US)

Dimitar Petkov Dinev of Sunnyvale CA (US)

Ondrej Texler of San Jose CA (US)

Ankur Gupta of San Jose CA (US)

Janvi Chetan Palan of Santa Clara CA (US)

Hyun Jae Kang of Mountain View CA (US)

Anthony Sylvain Jean-Yves Liot of San Jose CA (US)

Sajid Sadi of San Jose CA (US)

MULTIMODAL DISENTANGLEMENT FOR GENERATING VIRTUAL HUMAN AVATARS - A simplified explanation of the abstract

This abstract first appeared for US patent application 18296202 titled 'MULTIMODAL DISENTANGLEMENT FOR GENERATING VIRTUAL HUMAN AVATARS'.

Simplified Explanation

The abstract of this patent application describes a method for multimodal disentanglement. Silhouette images of a human face are generated in a way that undoes the correlation between the upper and lower portions of the face, and a unimodal machine learning model is trained on them. The trained model can then produce synthetic images of the human face, which are used in turn to train a multimodal rendering network. The rendering network learns to generate a voice-animated digital human by minimizing the differences between the synthetic images and the images it generates.

  • The patent application proposes generating silhouette images of a human face in which the upper and lower portions of the face are decorrelated (see the sketch after this list).
  • A unimodal machine learning model is trained on the set of silhouette images.
  • Once trained, the unimodal model can generate synthetic images of the human face.
  • The synthetic images are then used to train a multimodal rendering network.
  • The rendering network is trained to generate a voice-animated digital human by minimizing the differences between the synthetic images and the images it generates (see the training sketch after the original abstract below).
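
The decorrelation step can be pictured with a short, self-contained Python sketch. Everything here is illustrative and not taken from the patent: toy_silhouette is an invented stand-in rasterizer, and the only point being made is that the upper-face and lower-face parameters are sampled independently, so the resulting dataset contains every combination of the two regions.

  import numpy as np

  rng = np.random.default_rng(0)

  def toy_silhouette(upper_open, jaw_open, size=64):
      # Rasterize a crude binary face silhouette: an ellipse whose top
      # half is scaled by upper_open (brow/eye region) and whose bottom
      # half is scaled by jaw_open (mouth/jaw region).
      ys, xs = np.mgrid[0:size, 0:size]
      cx = cy = size / 2
      rx = size * 0.35
      ry_top = size * 0.30 * (1.0 + 0.3 * upper_open)
      ry_bot = size * 0.30 * (1.0 + 0.3 * jaw_open)
      ry = np.where(ys < cy, ry_top, ry_bot)
      return ((xs - cx) / rx) ** 2 + ((ys - cy) / ry) ** 2 <= 1.0

  def sample_decorrelated(n):
      # Draw upper and lower parameters independently, undoing any
      # natural correlation between the two face regions.
      uppers = rng.uniform(-1.0, 1.0, size=n)
      lowers = rng.uniform(-1.0, 1.0, size=n)
      return np.stack([toy_silhouette(u, l) for u, l in zip(uppers, lowers)])

  silhouettes = sample_decorrelated(16)  # shape (16, 64, 64), dtype bool

In real footage the two regions co-vary (speech moves the jaw while expressions move the brows); independent sampling of this kind is one way to remove exactly that correlation from the training set.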

Potential applications of this technology:

  • Creation of voice-animated digital humans for virtual assistants, video games, or virtual reality experiences.
  • Facial animation in movies or animated films.
  • Virtual avatars for online communication platforms or social media.

Problems solved by this technology:

  • Generating realistic and synchronized voice-animated digital humans can be challenging.
  • Facial features and expressions that naturally co-vary (such as the upper and lower portions of the face) are difficult to disentangle during training.
  • Training a rendering network without a large dataset of real images can be problematic.

Benefits of this technology:

  • Enables the generation of realistic voice-animated digital humans.
  • Provides a method for disentangling facial features and expressions.
  • Reduces the need for a large dataset of real images for training a rendering network.


Original Abstract Submitted

Multimodal disentanglement can include generating a set of silhouette images corresponding to a human face, the generating undoing a correlation between an upper portion and a lower portion of the human face depicted by each silhouette image. A unimodal machine learning model can be trained with the set of silhouette images. As trained, the unimodal machine learning model can generate synthetic images of the human face. The synthetic images generated by the unimodal machine learning model once trained can be used to train a multimodal rendering network. The multimodal rendering network can be trained to generate a voice-animated digital human. Training the multimodal rendering network can be based on minimizing differences between the synthetic images and images generated by the multimodal rendering network.
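
The final training stage can likewise be sketched in a few lines of PyTorch. This is a minimal illustration under assumed shapes and module names, not the patent's actual architecture: a multimodal rendering network, conditioned here on an audio feature vector, is fit by minimizing a pixelwise L1 difference against synthetic target images of the kind the trained unimodal model would supply.

  import torch
  import torch.nn as nn

  class MultimodalRenderer(nn.Module):
      # Illustrative stand-in: maps an audio feature vector to a
      # 64x64 grayscale face image.
      def __init__(self, audio_dim=128, img_size=64):
          super().__init__()
          self.img_size = img_size
          self.net = nn.Sequential(
              nn.Linear(audio_dim, 512), nn.ReLU(),
              nn.Linear(512, img_size * img_size), nn.Sigmoid(),
          )

      def forward(self, audio_feat):
          out = self.net(audio_feat)
          return out.view(-1, 1, self.img_size, self.img_size)

  renderer = MultimodalRenderer()
  optimizer = torch.optim.Adam(renderer.parameters(), lr=1e-4)
  loss_fn = nn.L1Loss()  # pixelwise image difference to be minimized

  # Stand-ins for real data: audio features paired with the synthetic
  # target images that the trained unimodal model would have produced.
  audio_batch = torch.randn(8, 128)
  synthetic_targets = torch.rand(8, 1, 64, 64)

  for step in range(100):
      rendered = renderer(audio_batch)
      loss = loss_fn(rendered, synthetic_targets)
      optimizer.zero_grad()
      loss.backward()
      optimizer.step()

Supervising the renderer with synthetic images derived from the decorrelated silhouettes is, roughly, what would let the network animate the mouth from speech without dragging the upper face along with it.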