Samsung Electronics Co., Ltd. (20240203014). MACHINE LEARNING-BASED APPROACH FOR AUDIO-DRIVEN AVATAR ANIMATION OR OTHER FUNCTIONS simplified abstract

From WikiPatents

MACHINE LEARNING-BASED APPROACH FOR AUDIO-DRIVEN AVATAR ANIMATION OR OTHER FUNCTIONS

Organization Name

Samsung Electronics Co., Ltd.

Inventor(s)

Liang Zhao of Saratoga CA (US)

Siva Penke of San Jose CA (US)

MACHINE LEARNING-BASED APPROACH FOR AUDIO-DRIVEN AVATAR ANIMATION OR OTHER FUNCTIONS - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240203014 titled 'MACHINE LEARNING-BASED APPROACH FOR AUDIO-DRIVEN AVATAR ANIMATION OR OTHER FUNCTIONS'.

The method described in the abstract uses an electronic device to obtain audio input from a speaker, extract audio features with a trained machine learning model, and generate content and style parameter predictions from disentangled content and style embeddings.

  • The method involves obtaining audio input from a speaker and extracting audio features using a trained machine learning model.
  • Content parameter predictions are generated using content embeddings extracted by a content encoder and decoded by a content decoder, while style parameter predictions are generated using style embeddings extracted by a style encoder and decoded by a style decoder.
  • The content and style embeddings are both derived from the audio features of the input, and the machine learning model is trained to generate its predictions from disentangled versions of these embeddings.
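The abstract does not specify the model architecture, so the sketch below is only a toy illustration of the two-branch layout it describes: a shared set of audio features feeds a content encoder/decoder pair and a parallel style encoder/decoder pair. The layer types, dimensions, and parameter names (e.g. 40 audio features, 52 content parameters) are all hypothetical stand-ins, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, w, b):
    """A single dense layer: the simplest stand-in for an encoder or decoder."""
    return np.tanh(x @ w + b)

# Hypothetical dimensions: 40 audio features (e.g. mel bins),
# 16-dim content embedding, 8-dim style embedding,
# 52 content parameters (e.g. blendshape weights), 4 style parameters.
N_FEAT, D_CONTENT, D_STYLE, N_CPARAM, N_SPARAM = 40, 16, 8, 52, 4

# Randomly initialised weights stand in for a trained model.
Wc_enc, bc_enc = rng.standard_normal((N_FEAT, D_CONTENT)), np.zeros(D_CONTENT)
Wc_dec, bc_dec = rng.standard_normal((D_CONTENT, N_CPARAM)), np.zeros(N_CPARAM)
Ws_enc, bs_enc = rng.standard_normal((N_FEAT, D_STYLE)), np.zeros(D_STYLE)
Ws_dec, bs_dec = rng.standard_normal((D_STYLE, N_SPARAM)), np.zeros(N_SPARAM)

def predict(audio_features):
    # Content branch: encoder -> content embedding -> decoder -> content params.
    content_emb = layer(audio_features, Wc_enc, bc_enc)
    content_params = layer(content_emb, Wc_dec, bc_dec)
    # Style branch: a parallel encoder/decoder pair over the same features.
    style_emb = layer(audio_features, Ws_enc, bs_enc)
    style_params = layer(style_emb, Ws_dec, bs_dec)
    return content_params, style_params

features = rng.standard_normal(N_FEAT)  # stand-in for extracted audio features
c, s = predict(features)
print(c.shape, s.shape)  # (52,) (4,)
```

In a real system each branch would be a deep network and the feature extractor itself would be learned; the point here is only the data flow of the two parallel encoder/decoder pairs.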

Potential Applications: This technology could be used in speech recognition systems, voice-controlled devices, and audio analysis tools for various applications such as transcription, voice synthesis, and emotion detection.

Problems Solved: This technology addresses the challenge of accurately analyzing and interpreting audio input to generate meaningful content and style parameter predictions.

Benefits: The technology can improve the accuracy and efficiency of audio analysis tasks, enhance the performance of speech recognition systems, and enable more personalized voice-controlled interactions.

Commercial Applications: This technology could be valuable for companies developing voice-controlled devices, speech recognition software, and audio processing tools for industries such as telecommunications, entertainment, and healthcare.

Prior Art: Researchers and developers in the fields of machine learning, audio processing, and speech recognition may have explored similar methods for analyzing audio input and generating predictions based on content and style embeddings.

Frequently Updated Research: Researchers may be exploring advancements in machine learning models for audio analysis, improvements in feature extraction techniques, and applications of disentangled embeddings in various audio processing tasks.

Questions about the Technology:

  1. How does this technology compare to traditional methods of audio analysis and prediction?
  2. What are the potential limitations or challenges of using disentangled content and style embeddings in audio processing tasks?


Original Abstract Submitted

A method includes obtaining, using at least one processing device of an electronic device, an audio input associated with a speaker. The method also includes extracting, using a feature extractor of a trained machine learning model, audio features from the audio input. The method further includes generating (i) one or more content parameter predictions using content embeddings extracted by a content encoder and decoded by a content decoder of the trained machine learning model and (ii) one or more style parameter predictions using style embeddings extracted by a style encoder and decoded by a style decoder of the trained machine learning model. The content embeddings and the style embeddings are based on the audio features of the audio input. The trained machine learning model is trained to generate the one or more content parameter predictions and the one or more style parameter predictions using disentangled content and style embeddings.
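The practical payoff of disentangled embeddings is that content and style can be recombined independently, which is what makes style-controllable avatar animation possible. The abstract does not describe this use directly, so the toy sketch below only illustrates the idea: with separate encoders, the content embedding from one speaker's audio can be paired with the style embedding from another's. All dimensions and weights are hypothetical and untrained.

```python
import numpy as np

rng = np.random.default_rng(1)

def encode(x, w):
    return np.tanh(x @ w)

def decode(z, w):
    return np.tanh(z @ w)

# Hypothetical weight matrices for the content and style branches.
W_cenc = rng.standard_normal((40, 16))  # content encoder
W_cdec = rng.standard_normal((16, 52))  # content decoder
W_senc = rng.standard_normal((40, 8))   # style encoder
W_sdec = rng.standard_normal((8, 4))    # style decoder

feat_a = rng.standard_normal(40)  # audio features from speaker A
feat_b = rng.standard_normal(40)  # audio features from speaker B

# Because content and style live in separate embeddings, they can be
# mixed across inputs: what A said, rendered in B's speaking style.
content_params = decode(encode(feat_a, W_cenc), W_cdec)
style_params = decode(encode(feat_b, W_senc), W_sdec)
print(content_params.shape, style_params.shape)  # (52,) (4,)
```

With entangled embeddings this swap would not be meaningful, since each embedding would carry a mixture of both factors; enforcing disentanglement during training is what licenses the recombination.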