SYSTEMS AND METHODS FOR TEXT-TO-SPEECH SYNTHESIS

Organization Name

Datum Point Labs Inc.

Inventor(s)

SYSTEMS AND METHODS FOR TEXT-TO-SPEECH SYNTHESIS - A simplified explanation of the abstract

This abstract first appeared in US patent application 20240339104, titled 'SYSTEMS AND METHODS FOR TEXT-TO-SPEECH SYNTHESIS'.

The patent application describes a system and method for text-to-speech synthesis that receives input text, a reference spectrogram, and at least one of an emotion ID or a speaker ID.

The system generates separate vector representations of the input text and the reference spectrogram, using a first and a second encoder.
A variance adaptor then creates a modified vector representation from a combination of the text vector, the reference-spectrogram vector, and the emotion ID embedding, the speaker ID embedding, or both.
An audio waveform is generated by a decoder based on the modified vector representation.
The audio waveform can be played through a speaker.
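The summary does not say how the three inputs to the variance adaptor are combined. A minimal sketch, assuming element-wise addition (a common choice in multi-speaker TTS, but not confirmed by the source), adds the reference-spectrogram vector and whichever ID embeddings are supplied to every text-token vector. The functions `combine` and `add_vectors`, the toy embedding tables, and the 2-dimensional vectors are hypothetical illustrations.

```python
from typing import Optional, Sequence

Vector = list[float]


def add_vectors(a: Sequence[float], b: Sequence[float]) -> Vector:
    """Element-wise sum of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vector sizes differ")
    return [x + y for x, y in zip(a, b)]


def combine(
    text_vectors: Sequence[Sequence[float]],
    reference_vector: Sequence[float],
    emotion_table: Sequence[Sequence[float]],
    speaker_table: Sequence[Sequence[float]],
    emotion_id: Optional[int] = None,
    speaker_id: Optional[int] = None,
) -> list[Vector]:
    """Build a combined per-token representation (assumed: addition, not concatenation)."""
    if emotion_id is None and speaker_id is None:
        raise ValueError("at least one of emotion_id or speaker_id is required")
    # Utterance-level conditioning: reference vector plus any supplied ID embeddings.
    conditioning = list(reference_vector)
    if emotion_id is not None:
        # Embedding lookup: the ID selects one row of a (hypothetical) learned table.
        conditioning = add_vectors(conditioning, emotion_table[emotion_id])
    if speaker_id is not None:
        conditioning = add_vectors(conditioning, speaker_table[speaker_id])
    # Broadcast the same conditioning vector onto every text-token vector.
    return [add_vectors(token, conditioning) for token in text_vectors]


text = [[1.0, 0.0], [0.0, 1.0]]       # two text tokens
ref = [0.5, 0.5]                      # reference-spectrogram vector
emotions = [[0.0, 0.0], [1.0, -1.0]]  # hypothetical emotion embedding table
speakers = [[2.0, 2.0]]               # hypothetical speaker embedding table
print(combine(text, ref, emotions, speakers, emotion_id=1))
# [[2.5, -0.5], [1.5, 0.5]]
```

Addition keeps the vector size fixed regardless of which IDs are present; concatenation is the usual alternative but would change the input size whenever an ID is omitted unless a placeholder embedding is used.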

Key Features and Innovation:

- Uses two encoders to generate vector representations of the input text and the reference spectrogram.
- Incorporates emotion ID and/or speaker ID embeddings into the representation that the variance adaptor modifies.
- Generates an audio waveform from the modified vector representation.
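The reference encoder has to turn a spectrogram of arbitrary length into a fixed-size vector. Learned reference encoders often use convolutional and recurrent layers; the sketch below swaps in plain mean pooling over time as a stand-in, so that references of different lengths map to vectors of the same size. The name `encode_reference` is hypothetical, and spectrograms are assumed to be lists of frames, each a list of mel-bin values.

```python
from typing import Sequence


def encode_reference(spectrogram: Sequence[Sequence[float]]) -> list[float]:
    """Collapse a (frames x mel_bins) spectrogram into one mel_bins-sized vector.

    Stand-in for a learned encoder: averages each mel bin over time.
    """
    if not spectrogram:
        raise ValueError("reference spectrogram has no frames")
    n_bins = len(spectrogram[0])
    if any(len(frame) != n_bins for frame in spectrogram):
        raise ValueError("every frame must have the same number of mel bins")
    n_frames = len(spectrogram)
    return [sum(frame[b] for frame in spectrogram) / n_frames for b in range(n_bins)]


short_ref = [[1.0, 2.0], [3.0, 4.0]]             # 2 frames
long_ref = [[0.0, 0.0], [2.0, 4.0], [4.0, 8.0]]  # 3 frames
print(encode_reference(short_ref))  # [2.0, 3.0]
print(encode_reference(long_ref))   # [2.0, 4.0]
```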

Potential Applications:

- Text-to-speech applications across various industries.
- Personalized speech synthesis based on emotions or speakers.
- Assistive technologies for individuals with speech impairments.

Problems Solved:

- Enhances the quality and customization of text-to-speech synthesis.
- Enables more natural and expressive speech generation.
- Facilitates the integration of emotions and speaker characteristics into synthesized speech.

Benefits:

- Improved user experience with more natural-sounding speech.
- Enhanced personalization options for speech synthesis.
- Increased flexibility in generating expressive speech.

Commercial Applications:

Speech synthesis for customer service chatbots, personalized voice assistants, audiobook narration, and language learning applications.

Questions about Text-to-Speech Synthesis:

1. How does the system incorporate emotion and speaker ID embeddings into the speech synthesis process?
2. What are the potential implications of this technology for the entertainment industry?

Original Abstract Submitted

Embodiments described herein provide systems and methods for text to speech synthesis. A system receives, via a data interface, an input text, a reference spectrogram, and at least one of an emotion id or speaker id. The system generates, via a first encoder, a vector representation of the input text. The system generates, via a second encoder, a vector representation of the reference spectrogram. The system generates, via a variance adaptor, a modified vector representation based on a combined representation including a combination of the vector representation of the input text, the vector representation of the reference spectrogram, and at least one of an embedding of the emotion id or an embedding of the speaker id. The system generates, via a decoder, an audio waveform based on the modified vector representation. The generated audio waveform may be played via a speaker.
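The abstract names a variance adaptor but does not describe its internals. In FastSpeech 2, where the term originates, the variance adaptor predicts per-token duration, pitch and energy, and a length regulator repeats each token's hidden vector for its predicted number of spectrogram frames. Assuming the adaptor here works similarly (the abstract does not confirm this), the length-regulation step can be sketched as follows; `length_regulate` is a hypothetical name, and the durations are passed in directly rather than produced by a duration predictor.

```python
from typing import Sequence


def length_regulate(
    token_vectors: Sequence[Sequence[float]],
    durations: Sequence[int],
) -> list[list[float]]:
    """Repeat each token vector durations[i] times to get one vector per output frame."""
    if len(token_vectors) != len(durations):
        raise ValueError("need exactly one duration per token")
    frames: list[list[float]] = []
    for vector, duration in zip(token_vectors, durations):
        if duration < 0:
            raise ValueError("durations must be non-negative")
        # Copy each vector so later per-frame edits (e.g. adding pitch) do not alias.
        frames.extend(list(vector) for _ in range(duration))
    return frames


tokens = [[1.0, 2.0], [3.0, 4.0]]
print(length_regulate(tokens, [2, 1]))
# [[1.0, 2.0], [1.0, 2.0], [3.0, 4.0]]
```

In FastSpeech 2 the durations come from a trained duration predictor at inference time, and scaling them before expansion is one way to control speaking rate.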

Datum Point Labs Inc. (20240339104). SYSTEMS AND METHODS FOR TEXT-TO-SPEECH SYNTHESIS simplified abstract
