TEXT-TO-SPEECH (TTS) PROCESSING

Organization Name

Amazon Technologies, Inc.

Inventor(s)

Jaime Lorenzo Trueba of Cambridge (GB)

Thomas Renaud Drugman of Carnieres (BE)

Viacheslav Klimkov of Gdansk (PL)

Srikanth Ronanki of Cambridge (GB)

Thomas Edward Merritt of Cambridge (GB)

Andrew Paul Breen of Norwich (GB)

Roberto Barra-Chicote of Cambridge (GB)

TEXT-TO-SPEECH (TTS) PROCESSING - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240013770, titled 'TEXT-TO-SPEECH (TTS) PROCESSING'.

Simplified Explanation

During text-to-speech processing, a speech model generates output audio data, including speech, that corresponds to input text data. A spectrogram estimator estimates a frequency spectrogram of that speech, and the resulting spectrogram data is used to condition the speech model. Acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors. The spectrogram estimator uses these separate context vectors to create the frequency spectrogram.

The patent application describes a method for improving text-to-speech processing.
A speech model is used to generate audio data based on input text data.
A spectrogram estimator is employed to estimate the frequency spectrogram of the speech.
The frequency spectrogram data is used to condition the speech model.
Acoustic features corresponding to different segments of the input text data (phonemes, syllable-level features, word-level features) can be separately encoded into context vectors.
The spectrogram estimator uses these context vectors to create the frequency spectrogram.
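
The data flow summarized above can be illustrated with a minimal Python sketch. It is not the implementation described in the patent application: every function name, dimension, and feature value below is hypothetical, and fixed random linear maps stand in for whatever trained encoders, spectrogram estimator, and speech model a real system would use. The sketch only shows the shape of the pipeline: three separate context vectors, one spectrogram estimate built from them, and a speech model conditioned on that spectrogram.

```python
import math
import random


def make_encoder(in_dim, ctx_dim, seed):
    """Build a toy encoder (hypothetical): mean-pool the per-segment feature
    vectors, then apply a fixed random linear map followed by tanh."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(ctx_dim)]

    def encode(segments):
        n = len(segments)
        pooled = [sum(seg[i] for seg in segments) / n for i in range(in_dim)]
        return [math.tanh(sum(w * x for w, x in zip(row, pooled))) for row in weights]

    return encode


def estimate_spectrogram(context_vectors, n_frames, n_bins, seed=0):
    """Toy spectrogram estimator (assumption): concatenate the separate context
    vectors and project them to an n_frames x n_bins grid of magnitudes."""
    rng = random.Random(seed)
    joint = [v for ctx in context_vectors for v in ctx]
    spectrogram = []
    for _ in range(n_frames):
        frame = []
        for _ in range(n_bins):
            w = [rng.uniform(-1, 1) for _ in joint]
            frame.append(abs(sum(a * b for a, b in zip(w, joint))))
        spectrogram.append(frame)
    return spectrogram


def speech_model(spectrogram, samples_per_frame=4):
    """Toy speech model conditioned on the spectrogram (assumption): the mean
    magnitude of each frame scales a fixed sinusoid for that frame's samples."""
    audio = []
    for t, frame in enumerate(spectrogram):
        gain = sum(frame) / len(frame)
        for k in range(samples_per_frame):
            audio.append(gain * math.sin(0.5 * (t * samples_per_frame + k)))
    return audio


# Made-up features for three segment levels of one short utterance.
phoneme_feats = [[0.1, 0.9], [0.4, 0.2], [0.7, 0.3]]
syllable_feats = [[0.5, 0.5, 0.1]]
word_feats = [[0.2, 0.8, 0.6, 0.4]]

# Each level has its own encoder and yields its own context vector.
encode_phonemes = make_encoder(in_dim=2, ctx_dim=3, seed=1)
encode_syllables = make_encoder(in_dim=3, ctx_dim=3, seed=2)
encode_words = make_encoder(in_dim=4, ctx_dim=3, seed=3)

contexts = [
    encode_phonemes(phoneme_feats),
    encode_syllables(syllable_feats),
    encode_words(word_feats),
]

spectrogram = estimate_spectrogram(contexts, n_frames=5, n_bins=8)
audio = speech_model(spectrogram)
print(len(contexts), len(spectrogram), len(spectrogram[0]), len(audio))
```

Keeping one encoder per segment level mirrors the "separately encoded" wording of the abstract; how the context vectors are actually combined inside the spectrogram estimator is not stated there, so the concatenation used here is only one plausible choice.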

Potential Applications:

Text-to-speech systems and applications
Voice assistants and virtual agents
Audiobook narration and production
Accessibility tools for visually impaired individuals

Problems Solved:

Enhances the quality and naturalness of synthesized speech
Improves the accuracy of speech generation based on input text data
Enables better representation of different segments of the input text data

Benefits:

More realistic and natural-sounding synthesized speech
Improved intelligibility and clarity of synthesized speech
Enhanced user experience in text-to-speech applications
Greater flexibility in encoding and representing different aspects of input text data

Original Abstract Submitted

During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.

