20240013802. INFERRING EMOTION FROM SPEECH IN AUDIO DATA USING DEEP LEARNING simplified abstract (NVIDIA Corporation)

INFERRING EMOTION FROM SPEECH IN AUDIO DATA USING DEEP LEARNING

Organization Name

NVIDIA Corporation

Inventor(s)

Ilia Federov of Moscow (RU)

Dmitry Aleksandrovich Korobchenko of Moscow (RU)

INFERRING EMOTION FROM SPEECH IN AUDIO DATA USING DEEP LEARNING - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240013802 titled 'INFERRING EMOTION FROM SPEECH IN AUDIO DATA USING DEEP LEARNING'.

Simplified Explanation

The abstract describes a transformer-based deep neural network that infers emotion data from input audio, producing probability values for a set of emotions or emotion classes. These probabilities can be adjusted with heuristics, for example to smooth emotion determinations over time, or through a user interface where a user can correct them as needed. A user may also supply prior emotion values to be blended with the network's determinations. The resulting emotion values can then drive emotion-based operations such as audio-driven speech animation.

  • A transformer-based deep neural network infers emotion data from input audio.
  • The network outputs probability values for a set of emotions or emotion classes.
  • Heuristics (for example, temporal smoothing) or a user interface can adjust the probability values; see the sketch after this list.
  • A user can blend prior emotion values with the network's determined values.
  • The determined emotion values can drive emotion-based operations, such as audio-driven speech animation.
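
The smoothing and blending steps lend themselves to a short illustration. Below is a minimal sketch, assuming an exponential moving average as the smoothing heuristic and a simple linear mix for blending user-supplied priors; the emotion set, function names, and weights are illustrative, not taken from the application.

```python
import numpy as np

EMOTIONS = ["neutral", "happy", "sad", "angry",
            "fearful", "disgusted", "surprised"]   # assumed emotion set

def smooth(probs_t, smoothed_prev, alpha=0.8):
    """One possible heuristic for smoothing determinations over time:
    an exponential moving average over successive probability vectors.
    (The abstract says 'heuristics'; EMA is an assumed example.)"""
    return alpha * smoothed_prev + (1.0 - alpha) * probs_t

def blend_with_prior(probs, prior, weight=0.3):
    """Linearly blend user-provided prior emotion values with the
    network's determined values, renormalizing to sum to 1."""
    mixed = (1.0 - weight) * probs + weight * prior
    return mixed / mixed.sum()

# Per-window network outputs (dummy values) smoothed over time
windows = [np.array([0.60, 0.10, 0.10, 0.10, 0.05, 0.025, 0.025]),
           np.array([0.10, 0.60, 0.10, 0.10, 0.05, 0.025, 0.025])]
state = windows[0]
for p in windows[1:]:
    state = smooth(p, state)

# Blend in a user prior that favors "sad"
prior = np.zeros(len(EMOTIONS))
prior[EMOTIONS.index("sad")] = 1.0
final = blend_with_prior(state, prior)
```

An averaging heuristic like this keeps per-window determinations from flickering between emotions while still tracking genuine changes in the speech, which matters when the values drive something visible such as speech animation.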

Potential Applications

  • Emotion recognition in voice assistants or chatbots to enhance user experience.
  • Emotion-based content recommendation systems for personalized entertainment.
  • Emotion analysis in market research to understand consumer preferences and sentiments.
  • Emotion-driven virtual reality experiences for immersive storytelling or gaming.

Problems Solved

  • Accurate and automated inference of emotion data from audio inputs.
  • Smoothing of emotion determinations over time for more consistent results.
  • User customization and modification of emotion determinations.
  • Integration of prior emotion values for blending and improved accuracy.

Benefits

  • Enhanced user experience through emotion-aware systems.
  • Improved personalization and customization in various applications.
  • Better understanding of user preferences and sentiments.
  • Creation of more engaging and immersive content and experiences.


Original Abstract Submitted

A deep neural network can be trained to infer emotion data from input audio. The network can be a transformer-based network that can infer probability values for a set of emotions or emotion classes. The emotion probability values can be modified using one or more heuristics, such as to provide for smoothing of emotion determinations over time, or via a user interface, where a user can modify emotion determinations as appropriate. A user may also provide prior emotion values to be blended with these emotion determination values. Determined emotion values can be provided as input to an emotion-based operation, such as to provide audio-driven speech animation.
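
The abstract specifies only that the network is transformer-based. As a rough orientation, the sketch below shows one plausible shape for such a model in PyTorch: audio feature frames pass through a transformer encoder and are pooled into per-emotion probabilities. All dimensions, layer counts, and the choice of mel-spectrogram input frames are assumptions for illustration, not details from the filing.

```python
import torch
import torch.nn as nn

class EmotionClassifier(nn.Module):
    """Illustrative transformer encoder mapping a sequence of audio
    features (e.g., mel-spectrogram frames) to emotion probabilities.
    Architecture details are assumptions, not taken from the patent."""

    def __init__(self, n_features=80, d_model=256, n_heads=4,
                 n_layers=4, n_emotions=7):
        super().__init__()
        self.input_proj = nn.Linear(n_features, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_emotions)

    def forward(self, x):
        # x: (batch, time, n_features) audio feature frames
        h = self.encoder(self.input_proj(x))
        # Mean-pool over time, then map to per-emotion probabilities
        return torch.softmax(self.head(h.mean(dim=1)), dim=-1)

model = EmotionClassifier()
frames = torch.randn(1, 200, 80)   # ~2 s of dummy mel frames
probs = model(frames)              # (1, n_emotions), rows sum to 1
```

The probability vector produced here is the kind of per-window output that the smoothing and prior-blending steps described above would operate on before being passed to an emotion-based operation such as audio-driven speech animation.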