17984590. SYNTHETIC SPEECH GENERATION FOR CONVERSATIONAL AI SYSTEMS AND APPLICATIONS simplified abstract (NVIDIA Corporation)

From WikiPatents

SYNTHETIC SPEECH GENERATION FOR CONVERSATIONAL AI SYSTEMS AND APPLICATIONS

Organization Name

NVIDIA Corporation

Inventor(s)

Subhankar Ghosh of Santa Clara CA (US)

Boris Ginsburg of Sunnyvale CA (US)

SYNTHETIC SPEECH GENERATION FOR CONVERSATIONAL AI SYSTEMS AND APPLICATIONS - A simplified explanation of the abstract

This abstract first appeared for US patent application 17984590 titled 'SYNTHETIC SPEECH GENERATION FOR CONVERSATIONAL AI SYSTEMS AND APPLICATIONS'.

Simplified Explanation

The patent application describes techniques for generating artificial speech using machine learning. A synthetic embedding is obtained from learned embeddings associated with different speakers, and at least one of those learned embeddings is produced by training a model in multiple stages on training speech utterances of progressively increasing quality.

  • Obtaining a synthetic embedding from learned embeddings associated with different speakers
  • Generating at least one learned embedding through multi-stage training of a machine learning model on training speech utterances of progressively increasing quality
  • Using the machine learning model and the synthetic embedding to generate synthetic audio data
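The abstract does not say how the learned speaker embeddings are combined into a synthetic one. A minimal sketch, assuming the synthetic embedding is a weighted average (convex combination) of per-speaker embedding vectors; the function name `synthetic_embedding` and the weighting scheme are hypothetical illustrations, not taken from the application:

```python
from typing import List, Sequence


def synthetic_embedding(learned: Sequence[Sequence[float]],
                        weights: Sequence[float]) -> List[float]:
    """Mix learned per-speaker embeddings into one new embedding.

    Assumption: the combination is a convex combination (non-negative
    weights normalized to sum to 1). The patent abstract does not
    specify the combination rule.
    """
    if not learned or len(learned) != len(weights):
        raise ValueError("need one weight per learned embedding")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not all be zero")
    dim = len(learned[0])
    if any(len(e) != dim for e in learned):
        raise ValueError("all embeddings must share the same dimension")
    # Normalize so the weights sum to 1; the result then lies inside the
    # region spanned by the learned speaker embeddings.
    norm = [w / total for w in weights]
    return [sum(n * e[i] for n, e in zip(norm, learned)) for i in range(dim)]
```

Weighting two speakers equally, as in `synthetic_embedding([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [1, 1])`, returns `[0.5, 0.5, 0.0]`, a point halfway between them. Keeping the weights non-negative and normalized keeps the new voice within the space of voices the model has already learned, which is one plausible way to obtain new but stable speaker identities.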

Potential Applications

This technology could be applied in various fields such as:

  • Speech synthesis for virtual assistants
  • Dubbing and voice-over services for movies and TV shows
  • Accessibility tools for individuals with speech impairments

Problems Solved

The technology addresses the following issues:

  • Improving the quality and naturalness of artificial speech
  • Enhancing the diversity of voices available for speech synthesis
  • Streamlining the process of generating synthetic audio data

Benefits

The benefits of this technology include:

  • Providing more realistic and human-like artificial speech
  • Enabling customization of synthetic voices for different applications
  • Enhancing user experience in interacting with speech-enabled devices

Potential Commercial Applications

One potential commercial application of this technology is:

  • Developing advanced speech synthesis software for businesses in the entertainment industry

Possible Prior Art

One possible example of prior art in this field is the use of deep learning for speech synthesis, which has been explored in various research studies and commercial applications.

Unanswered Questions

How does this technology compare to existing speech synthesis methods?

This article does not provide a direct comparison with traditional speech synthesis techniques or other machine learning-based approaches in terms of performance, efficiency, or cost-effectiveness.

What are the limitations of this technology in terms of scalability and real-time applications?

The article does not address the potential challenges or constraints of implementing this technology on a large scale or in time-sensitive scenarios where real-time speech synthesis is required.


Original Abstract Submitted

Disclosed are apparatuses, systems, and techniques that may use machine learning for generating artificial speech. The techniques include obtaining a synthetic embedding using learned embeddings associated with different speakers. At least one learned embedding may be generated using a multi-stage training of a machine learning model (MLM) with progressively increasing quality of training speech utterances. The techniques may further include using the MLM and the synthetic embedding to generate synthetic audio data.
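The multi-stage training with "progressively increasing quality of training speech utterances" can be illustrated as a simple curriculum in which each stage trains on utterances at or above a rising quality threshold. This is a sketch under assumptions the abstract does not state: each utterance carries a scalar `quality` score, and `Utterance`, `multi_stage_train`, and the `train_stage` callback are hypothetical stand-ins for a real dataset and training loop.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Utterance:
    speaker_id: str
    audio_path: str
    quality: float  # hypothetical per-utterance quality score; higher is better


def multi_stage_train(utterances: Sequence[Utterance],
                      thresholds: Sequence[float],
                      train_stage: Callable[[int, List[Utterance]], None]) -> List[int]:
    """Run one training stage per threshold on progressively higher-quality data.

    Returns the number of utterances used in each stage.
    """
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be non-decreasing")
    sizes = []
    for stage, min_quality in enumerate(thresholds):
        # Keep only utterances that meet this stage's minimum quality.
        subset = [u for u in utterances if u.quality >= min_quality]
        if not subset:
            raise ValueError(f"stage {stage}: no utterances with quality >= {min_quality}")
        # train_stage is expected to keep updating the same model, so later
        # stages refine it on a smaller, higher-quality subset.
        train_stage(stage, subset)
        sizes.append(len(subset))
    return sizes
```

For example, with three utterances scored 2.0, 3.5, and 4.5 and thresholds `[0.0, 3.0, 4.0]`, the stages train on 3, 2, and 1 utterances respectively. Starting broad and narrowing to cleaner data is one common curriculum design; the application itself may define stages and quality differently.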