20240013464. MULTIMODAL DISENTANGLEMENT FOR GENERATING VIRTUAL HUMAN AVATARS simplified abstract (Samsung Electronics Co., Ltd.)


MULTIMODAL DISENTANGLEMENT FOR GENERATING VIRTUAL HUMAN AVATARS

Organization Name

Samsung Electronics Co., Ltd.

Inventor(s)

Siddarth Ravichandran of Santa Clara CA (US)

Dimitar Petkov Dinev of Sunnyvale CA (US)

Ondrej Texler of San Jose CA (US)

Ankur Gupta of San Jose CA (US)

Janvi Chetan Palan of Santa Clara CA (US)

Hyun Jae Kang of Mountain View CA (US)

Anthony Sylvain Jean-Yves Liot of San Jose CA (US)

Sajid Sadi of San Jose CA (US)

MULTIMODAL DISENTANGLEMENT FOR GENERATING VIRTUAL HUMAN AVATARS - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240013464, titled 'MULTIMODAL DISENTANGLEMENT FOR GENERATING VIRTUAL HUMAN AVATARS'.

Simplified Explanation

The abstract describes a method for multimodal disentanglement. A set of silhouette images of a human face is generated in a way that undoes the correlation between the upper and lower portions of the face, and this set is used to train a unimodal machine learning model. The trained model can then generate synthetic images of the human face, which in turn are used to train a multimodal rendering network. The rendering network learns to generate a voice-animated digital human by minimizing the differences between the synthetic images and the images it produces.

  • The patent application proposes a method for multimodal disentanglement.
  • The method involves generating silhouette images of a human face.
  • The correlation between the upper and lower portions of the face in each silhouette image is undone.
  • A unimodal machine learning model is trained using the set of silhouette images.
  • The trained model can generate synthetic images of the human face.
  • The synthetic images are then used to train a multimodal rendering network.
  • The rendering network is trained to generate a voice-animated digital human.
  • The training of the rendering network involves minimizing the differences between the synthetic images and the network's generated images (a minimal code sketch of this two-stage loop follows).
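
The steps above amount to a two-stage training loop: fit a unimodal generator to the decorrelated silhouettes, then use its outputs as targets for the multimodal renderer. The sketch below is a minimal illustration of that loop, not the patented implementation; the architectures, the latent and audio-feature dimensions, and the L1 reconstruction loss are all assumptions made for the example.

```python
import torch
import torch.nn as nn

# Stage 1: a unimodal model trained on the decorrelated silhouette images.
# Here it is a toy decoder mapping a latent code to a face image.
class UnimodalGenerator(nn.Module):
    def __init__(self, latent_dim=64, image_size=64):
        super().__init__()
        self.image_size = image_size
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, image_size * image_size), nn.Sigmoid(),
        )

    def forward(self, z):
        return self.net(z).view(-1, 1, self.image_size, self.image_size)

# Stage 2: a multimodal rendering network driven by an audio feature
# (e.g., speech for the lower face) plus a latent code for the rest.
class MultimodalRenderer(nn.Module):
    def __init__(self, audio_dim=32, latent_dim=64, image_size=64):
        super().__init__()
        self.image_size = image_size
        self.net = nn.Sequential(
            nn.Linear(audio_dim + latent_dim, 128), nn.ReLU(),
            nn.Linear(128, image_size * image_size), nn.Sigmoid(),
        )

    def forward(self, audio_feat, z):
        x = torch.cat([audio_feat, z], dim=-1)
        return self.net(x).view(-1, 1, self.image_size, self.image_size)

unimodal = UnimodalGenerator()          # assumed already trained on silhouettes
renderer = MultimodalRenderer()
optimizer = torch.optim.Adam(renderer.parameters(), lr=1e-3)
loss_fn = nn.L1Loss()                   # "minimize differences" as an L1 loss

for step in range(100):
    z = torch.randn(8, 64)              # latent codes for a batch of faces
    audio_feat = torch.randn(8, 32)     # stand-in for extracted speech features
    with torch.no_grad():
        synthetic = unimodal(z)         # synthetic images serve as targets
    rendered = renderer(audio_feat, z)
    loss = loss_fn(rendered, synthetic)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In the actual system the unimodal model would first be trained on the silhouette set; freezing it here (the no_grad block) is one plausible way to treat its outputs as fixed targets for the renderer.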

Potential Applications

  • Creation of voice-animated digital humans for virtual assistants, video games, or virtual reality applications.
  • Facial animation in movies, animations, or computer-generated imagery (CGI).

Problems Solved

  • Multimodal disentanglement allows different aspects of a human face, such as the upper and lower portions, to be separated and manipulated independently (one plausible decorrelation scheme is sketched below).
  • Generating synthetic images helps in training a rendering network to create realistic and expressive voice-animated digital humans.
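
The abstract does not say how the upper/lower correlation is undone, so the following mix-and-match scheme is purely an assumption: recombining upper and lower face halves drawn from different frames makes the two regions statistically independent across the set.

```python
import numpy as np

def decorrelate_halves(silhouettes: np.ndarray, rng=None) -> np.ndarray:
    """Recombine upper and lower face halves across a batch so the two
    regions no longer come from the same source frame.

    silhouettes: array of shape (N, H, W) of binary silhouette images.
    """
    rng = rng or np.random.default_rng()
    n, h, _ = silhouettes.shape
    mid = h // 2
    perm = rng.permutation(n)        # pair each upper half with a random lower half
    out = silhouettes.copy()
    out[:, mid:, :] = silhouettes[perm, mid:, :]
    return out

# Example: 16 random 64x64 "silhouettes" stand in for real data.
batch = (np.random.rand(16, 64, 64) > 0.5).astype(np.float32)
decorrelated = decorrelate_halves(batch)
```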

Benefits

  • Enables the creation of highly realistic and customizable voice-animated digital humans.
  • Provides a method for generating synthetic images that can be used for training rendering networks.
  • Allows for the disentanglement and manipulation of different facial features, enhancing the realism and expressiveness of digital humans.


Original Abstract Submitted

Multimodal disentanglement can include generating a set of silhouette images corresponding to a human face, the generating undoing a correlation between an upper portion and a lower portion of the human face depicted by each silhouette image. A unimodal machine learning model can be trained with the set of silhouette images. As trained, the unimodal machine learning model can generate synthetic images of the human face. The synthetic images generated by the unimodal machine learning model once trained can be used to train a multimodal rendering network. The multimodal rendering network can be trained to generate a voice-animated digital human. Training the multimodal rendering network can be based on minimizing differences between the synthetic images and images generated by the multimodal rendering network.
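
Read as a training objective, the final sentence of the abstract corresponds to a standard image-reconstruction loss. The formula below is one way to write it; the L1 distance and the inputs $a_i$ (audio) and $z_i$ (other conditioning) are assumptions, since the abstract names neither the distance measure nor the renderer's inputs:

$$\theta^{*} = \arg\min_{\theta} \sum_{i} \bigl\lVert I^{\mathrm{synth}}_{i} - R_{\theta}(a_{i}, z_{i}) \bigr\rVert_{1}$$

where $R_{\theta}$ is the multimodal rendering network and $I^{\mathrm{synth}}_{i}$ is a synthetic image produced by the trained unimodal model.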