18299248. MACHINE LEARNING-BASED APPROACH FOR AUDIO-DRIVEN AVATAR ANIMATION OR OTHER FUNCTIONS simplified abstract (SAMSUNG ELECTRONICS CO., LTD.)

From WikiPatents

MACHINE LEARNING-BASED APPROACH FOR AUDIO-DRIVEN AVATAR ANIMATION OR OTHER FUNCTIONS

Organization Name

SAMSUNG ELECTRONICS CO., LTD.

Inventor(s)

Liang Zhao of Saratoga CA (US)

Siva Penke of San Jose CA (US)

MACHINE LEARNING-BASED APPROACH FOR AUDIO-DRIVEN AVATAR ANIMATION OR OTHER FUNCTIONS - A simplified explanation of the abstract

This abstract first appeared for US patent application 18299248, titled 'MACHINE LEARNING-BASED APPROACH FOR AUDIO-DRIVEN AVATAR ANIMATION OR OTHER FUNCTIONS'.

Simplified Explanation

The method uses at least one processing device to analyze audio input from a speaker: it extracts audio features with a trained machine learning model and generates separate content and style parameter predictions from disentangled embeddings, which can drive functions such as avatar animation.

  • Obtaining audio input from a speaker
  • Extracting audio features using a feature extractor
  • Generating content and style predictions based on disentangled embeddings
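The steps above can be sketched as a minimal pipeline. All shapes, weights, and parameter meanings below are illustrative NumPy stand-ins for the trained feature extractor, encoders, and decoders, not details taken from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_audio_features(audio, n_features=64):
    # Hypothetical feature extractor standing in for the trained
    # model's feature extractor described in the abstract.
    frames = audio.reshape(-1, 160)              # 10 ms frames at 16 kHz
    return frames @ rng.standard_normal((160, n_features))

def encode(features, w):
    return np.tanh(features @ w)                 # toy encoder

def decode(embedding, w):
    return embedding @ w                         # toy decoder

# Random weights stand in for trained parameters (assumption: the
# layer sizes here are arbitrary choices for illustration).
w_content_enc = rng.standard_normal((64, 16))
w_style_enc   = rng.standard_normal((64, 16))
w_content_dec = rng.standard_normal((16, 32))   # e.g. mouth/viseme parameters
w_style_dec   = rng.standard_normal((16, 8))    # e.g. expression/style parameters

audio = rng.standard_normal(16000)               # 1 s of fake 16 kHz audio
feats = extract_audio_features(audio)

content_emb = encode(feats, w_content_enc)       # "what is said"
style_emb   = encode(feats, w_style_enc)         # "how it is said"

content_params = decode(content_emb, w_content_dec)
style_params   = decode(style_emb, w_style_dec)
print(content_params.shape, style_params.shape)  # (100, 32) (100, 8)
```

The key point the sketch shows is the branching: one shared feature extractor feeds two separate encoder/decoder paths, so content and style parameters are predicted independently.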

Key Features and Innovation

  • Utilizes a trained machine learning model to extract content and style embeddings from audio features
  • Predicts content and style parameters separately for enhanced accuracy
  • Disentangles content and style information for better prediction results

Potential Applications

  • Personalized audio content generation
  • Speech recognition and analysis
  • Music composition and remixing

Problems Solved

  • Improved accuracy in predicting content and style parameters
  • Enhanced customization of audio content
  • Efficient processing of audio data

Benefits

  • Enhanced user experience with personalized audio content
  • Increased accuracy in speech recognition applications
  • Streamlined music composition processes

Commercial Applications

Commercializing Personalized Audio Content Generation Technology

This technology can be applied in industries such as entertainment, telecommunications, and education to create customized audio content and audio-driven avatar animations, improving engagement and user satisfaction.

Prior Art

Research in the field of audio content generation and style prediction using machine learning models can provide insights into similar technologies and approaches.

Frequently Updated Research

Stay updated on advancements in machine learning models for audio analysis and content generation to leverage the latest techniques and algorithms for improved results.

Questions about Personalized Audio Content Generation

How does disentangling content and style embeddings improve prediction accuracy?

Disentangling content and style embeddings lets the model predict content parameters (roughly, what is said) and style parameters (how it is said) independently, so errors in one do not contaminate the other, yielding more accurate and more customizable results.
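One common way to encourage this separation during training is to penalize shared information between the two embedding spaces. The patent does not disclose its training losses, so the cross-covariance penalty below is an assumption used purely for illustration:

```python
import numpy as np

def decorrelation_penalty(content_emb, style_emb):
    # Toy disentanglement term: penalize the cross-covariance between
    # content and style embeddings so each space captures distinct
    # information. (Assumption: this is one standard disentanglement
    # objective, not necessarily the one used in the patent.)
    c = content_emb - content_emb.mean(axis=0)
    s = style_emb - style_emb.mean(axis=0)
    cross_cov = c.T @ s / len(c)
    return float(np.sum(cross_cov ** 2))

rng = np.random.default_rng(1)
shared = rng.standard_normal((200, 1))

# Entangled pair: both embeddings leak the same "shared" factor.
content = np.hstack([shared, rng.standard_normal((200, 3))])
style_entangled = np.hstack([shared, rng.standard_normal((200, 3))])
# Disentangled pair: style carries no information about content.
style_independent = rng.standard_normal((200, 4))

# Embedding pairs that share information are penalized more heavily.
print(decorrelation_penalty(content, style_entangled) >
      decorrelation_penalty(content, style_independent))  # True
```

Minimizing such a penalty alongside the content and style reconstruction losses pushes the two encoders toward the separate, non-overlapping representations that the abstract describes.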

What are the potential challenges in implementing this technology in real-world applications?

Implementing this technology may face challenges related to data privacy, model training, and integration with existing systems.


Original Abstract Submitted

A method includes obtaining, using at least one processing device of an electronic device, an audio input associated with a speaker. The method also includes extracting, using a feature extractor of a trained machine learning model, audio features from the audio input. The method further includes generating (i) one or more content parameter predictions using content embeddings extracted by a content encoder and decoded by a content decoder of the trained machine learning model and (ii) one or more style parameter predictions using style embeddings extracted by a style encoder and decoded by a style decoder of the trained machine learning model. The content embeddings and the style embeddings are based on the audio features of the audio input. The trained machine learning model is trained to generate the one or more content parameter predictions and the one or more style parameter predictions using disentangled content and style embeddings.