20240013462. AUDIO-DRIVEN FACIAL ANIMATION WITH EMOTION SUPPORT USING MACHINE LEARNING simplified abstract (NVIDIA Corporation)

AUDIO-DRIVEN FACIAL ANIMATION WITH EMOTION SUPPORT USING MACHINE LEARNING

Organization Name

NVIDIA Corporation

Inventor(s)

Yeongho Seol of Seoul (KR)

Simon Yuen of Playa Vista CA (US)

Dmitry Aleksandrovich Korobchenko of Moscow (RU)

Mingquan Zhou of Millbrae CA (US)

Ronan Browne of Fairfax CA (US)

Wonmin Byeon of Santa Cruz CA (US)

AUDIO-DRIVEN FACIAL ANIMATION WITH EMOTION SUPPORT USING MACHINE LEARNING - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240013462, titled 'AUDIO-DRIVEN FACIAL ANIMATION WITH EMOTION SUPPORT USING MACHINE LEARNING'.

Simplified Explanation

The abstract describes a deep neural network that generates motion or deformation information for a character from audio input, so that the resulting animation accurately reflects the character's emotional state. The network models different facial components separately and generates motion information for each component. During training, the network is provided with emotion and style vectors that guide it toward realistic animation for the input speech: the emotions the character should exhibit, the relative weighting of those emotions, and any adjustments to how the character expresses that emotional state. The network output can then be passed to a renderer to produce emotion-accurate, audio-driven facial animation.

  • The patent application describes a deep neural network that generates motion or deformation information for a character based on audio input.
  • The network can model different facial components separately, allowing for motion information to be generated for each component.
  • During training, the network is provided with emotion and style vectors to generate realistic animation for input speech.
  • The emotion and style vectors indicate the character's emotions, relative weighting of those emotions, and any adjustments to how the character expresses the emotional state.
  • The network output can be used by a renderer to generate audio-driven facial animation that accurately represents the character's emotional state (a minimal illustrative sketch follows this list).
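
The patent abstract does not specify a concrete architecture. As a purely illustrative sketch (in PyTorch, with invented layer sizes, component names, and vector dimensions), audio features could be encoded, conditioned on emotion and style vectors, and decoded by a separate head for each facial component:

import torch
import torch.nn as nn

class AudioDrivenFaceNet(nn.Module):
    # Illustrative only: encodes audio frames, conditions them on emotion/style
    # vectors, and decodes per-frame motion parameters for each facial component.
    def __init__(self, n_audio_features=80, n_emotions=6, style_dim=8, hidden=256):
        super().__init__()
        self.audio_encoder = nn.GRU(n_audio_features, hidden, batch_first=True)
        self.condition = nn.Linear(n_emotions + style_dim, hidden)
        # One output head per facial component; output sizes are placeholders.
        self.heads = nn.ModuleDict({
            "head": nn.Linear(hidden, 6),     # head rotation + translation
            "skin": nn.Linear(hidden, 64),    # skin deformation / blendshape weights
            "eyes": nn.Linear(hidden, 4),     # gaze and eyelid parameters
            "tongue": nn.Linear(hidden, 10),  # tongue deformation parameters
        })

    def forward(self, audio, emotion, style):
        # audio: (batch, frames, n_audio_features), e.g. mel-spectrogram frames
        # emotion: (batch, n_emotions) relative weights of the emotions to exhibit
        # style: (batch, style_dim) adjustments to how the emotion is expressed
        features, _ = self.audio_encoder(audio)
        cond = self.condition(torch.cat([emotion, style], dim=-1))
        features = features + cond.unsqueeze(1)  # broadcast conditioning over frames
        return {name: head(features) for name, head in self.heads.items()}

# Example: 100 audio frames, a mostly "happy" emotion weighting, neutral style.
net = AudioDrivenFaceNet()
outputs = net(torch.randn(1, 100, 80),
              torch.tensor([[0.7, 0.2, 0.1, 0.0, 0.0, 0.0]]),
              torch.zeros(1, 8))
print({name: tensor.shape for name, tensor in outputs.items()})

In practice the per-component outputs would be passed to a renderer, as described above, to drive the final animation; the facial components listed here simply mirror the examples given in the original abstract (head, skin, eyes, tongue).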

Potential Applications

  • Animation and gaming industry: This technology can be used to create more realistic and emotionally expressive characters in animated movies, video games, and virtual reality experiences.
  • Virtual assistants and chatbots: The technology can be applied to virtual assistants and chatbots to enhance their ability to convey emotions and engage with users on a more human-like level.
  • Therapy and mental health: The technology can be utilized in therapeutic applications, such as virtual reality therapy, to create virtual characters that can accurately express and respond to emotions, aiding in emotional regulation and empathy development.

Problems Solved

  • Lack of emotional expressiveness in animated characters: This technology addresses the challenge of creating animated characters that can accurately convey emotions, enhancing the overall realism and engagement of animations.
  • Limited emotional range in virtual assistants and chatbots: By incorporating emotion-accurate facial animation, virtual assistants and chatbots can convey emotion more convincingly when responding to users, improving the user experience.
  • Difficulty in creating emotionally engaging virtual experiences: The technology solves the problem of creating virtual experiences that evoke emotional responses from users, making them more immersive and impactful.

Benefits

  • Enhanced realism and engagement: The technology allows for the creation of animated characters that can realistically express emotions, resulting in more engaging and immersive animations, games, and virtual experiences.
  • Improved user experience: Virtual assistants and chatbots equipped with emotion-accurate facial animation can respond to users with appropriate emotional expression, providing a more personalized and empathetic interaction.
  • Therapeutic applications: The technology can be utilized in therapeutic settings to create virtual characters that aid in emotional regulation, empathy development, and mental health treatment.


Original Abstract Submitted

A deep neural network can be trained to output motion or deformation information for a character that is representative of the character uttering speech contained in audio input, which is accurate for an emotional state of the character. The character can have different facial components or regions (e.g., head, skin, eyes, tongue) modeled separately, such that the network can output motion or deformation information for each of these different facial components. During training, the network can be provided with emotion and/or style vectors that indicate information to be used in generating realistic animation for input speech, as may relate to one or more emotions to be exhibited by the character, a relative weighting of those emotions, and any style or adjustments to be made to how the character expresses that emotional state. The network output can be provided to a renderer to generate audio-driven facial animation that is emotion-accurate.
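
The abstract does not fix an encoding for the emotion and style vectors. One simple, assumed encoding (the emotion vocabulary and dimensions below are invented for illustration) represents the emotional state as normalized relative weights over a fixed set of emotions, with a separate free-form style vector for adjustments to expression:

import torch

# Assumed emotion vocabulary; the patent abstract does not enumerate one.
EMOTIONS = ["neutral", "happy", "sad", "angry", "surprised", "fearful"]

def make_emotion_vector(weights):
    # Build a vector of relative emotion weights, normalized to sum to 1.
    vec = torch.zeros(len(EMOTIONS))
    for name, weight in weights.items():
        vec[EMOTIONS.index(name)] = weight
    return vec / vec.sum()

# A character that is mostly happy with a hint of surprise.
emotion = make_emotion_vector({"happy": 0.8, "surprised": 0.2})
# Hypothetical style adjustments, e.g. exaggeration or restraint of the expression.
style = torch.tensor([0.3, 0.0, -0.1, 0.0, 0.0, 0.0, 0.0, 0.0])
print(emotion, style)

Such vectors would be supplied alongside the audio during training so the network learns to produce animation consistent with the indicated emotional state and expression style.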