Datum Point Labs Inc. (20240339103). SYSTEMS AND METHODS FOR TEXT-TO-SPEECH SYNTHESIS simplified abstract

SYSTEMS AND METHODS FOR TEXT-TO-SPEECH SYNTHESIS

Organization Name

Datum Point Labs Inc.

Inventor(s)

Jeongki Min of Seoul (KR)

Bonhwa Ku of Seoul (KR)

Hanseok Ko of Seoul (KR)

SYSTEMS AND METHODS FOR TEXT-TO-SPEECH SYNTHESIS - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240339103, titled 'SYSTEMS AND METHODS FOR TEXT-TO-SPEECH SYNTHESIS'.

Simplified Explanation: This patent application describes a system and method for text-to-speech synthesis in which two reference spectrograms are used to derive a style vector, and that style vector conditions the generation of an audio waveform from the input text.

Key Features and Innovation:

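  • The system receives an input text, a first reference spectrogram, and a second reference spectrogram via a data interface.
  • Encoders generate a vector representation of each input, and the two spectrogram representations are merged into a combined representation.
  • Cross attention between the combined representation and the text representation produces a style vector.
  • A decoder generates the audio waveform, with the style vector conditioning speech generation via conditional layer normalization.
  • The audio waveform may be played via a speaker or used in communication by a digital avatar interface.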
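
The cross-attention step can be illustrated with a minimal sketch. The abstract does not say how the two reference representations are combined, which side of the attention supplies the queries, or how the result is reduced to a single vector, so the choices below (concatenation along the time axis, text vectors as queries, mean pooling) are assumptions made only for illustration. The function names cross_attention and style_vector are hypothetical and do not come from the patent application.

```python
import math

def softmax(scores):
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: every query attends over all keys."""
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        # Weighted sum of the value vectors for this query.
        outputs.append([sum(w * v[i] for w, v in zip(weights, values))
                        for i in range(dim)])
    return outputs

def style_vector(text_vecs, ref_a_vecs, ref_b_vecs):
    # ASSUMPTION: the combined representation is a concatenation of the
    # two reference sequences along the time axis.
    combined = ref_a_vecs + ref_b_vecs
    # ASSUMPTION: text vectors act as queries; the combined reference
    # representation supplies both keys and values.
    attended = cross_attention(text_vecs, combined, combined)
    # ASSUMPTION: mean-pool over text positions to obtain one vector.
    n = len(attended)
    return [sum(row[i] for row in attended) / n
            for i in range(len(attended[0]))]

# Toy inputs: two text positions and one frame per reference spectrogram.
text = [[1.0, 0.0], [0.0, 1.0]]
ref_a = [[1.0, 0.0]]
ref_b = [[0.0, 1.0]]
style = style_vector(text, ref_a, ref_b)
print(style)
```

In a real system the vectors would come from trained encoders and the attention would use learned query, key, and value projections; the sketch keeps only the attention arithmetic.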

Potential Applications: The abstract names digital avatar interfaces as one use. The same approach could also serve communication applications, voice assistants, and accessibility tools for the visually impaired.

Problems Solved: This technology addresses the need for natural and expressive text-to-speech synthesis by letting reference audio control the style of the generated speech.

Benefits:

  • Enhanced naturalness and expressiveness in synthesized speech.
  • Customizable styles and tones for different applications.
  • Improved accessibility for visually impaired individuals.
  • Seamless integration into various communication platforms.

Commercial Applications: The technology can be applied in voice-enabled devices, virtual reality applications, customer service chatbots, and educational tools for language learning.

Prior Art: For prior art related to this technology, researchers can explore patents and publications in the fields of speech synthesis, natural language processing, and machine learning.

Frequently Updated Research: Researchers in speech synthesis and artificial intelligence continue to explore neural network architectures, style transfer techniques, and audio signal processing for text-to-speech applications.

Questions about Text-to-Speech Synthesis:

  1. How does this technology improve the customization of speech synthesis?
  2. What are the potential limitations of using style vectors in conditioning speech generation?


Original Abstract Submitted

Embodiments described herein provide systems and methods for text to speech synthesis. A system receives, via a data interface, an input text, a first reference spectrogram, and a second reference spectrogram. The system generates, via encoders, vector representations of each of the inputs. The system generates a combined representation based on the vector representation of the first reference spectrogram and the vector representation of the second reference spectrogram. The system performs cross attention between the combined representation and the vector representation of the input text to generate a style vector. The system may generate, via a decoder, an audio waveform based on the modified vector representation and conditioned by the style vector where the style vector conditions the speech generation via conditional layer normalization. The generated audio waveform may be played via a speaker. The generated audio may be used in communication by a digital avatar interface.
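
Conditional layer normalization, as the term is generally used, normalizes a hidden vector and then applies a scale and shift predicted from a conditioning vector rather than fixed learned parameters. The sketch below shows that arithmetic for a single hidden vector. The abstract does not describe how the scale and shift are derived from the style vector, so the linear projections w_gamma and w_beta, like the function name conditional_layer_norm, are hypothetical and assumed only for illustration.

```python
import math

def conditional_layer_norm(hidden, style, w_gamma, w_beta, eps=1e-5):
    """Normalize `hidden`, then scale and shift it using the style vector."""
    n = len(hidden)
    mean = sum(hidden) / n
    var = sum((h - mean) ** 2 for h in hidden) / n
    normed = [(h - mean) / math.sqrt(var + eps) for h in hidden]
    # ASSUMPTION: scale (gamma) and shift (beta) are plain linear
    # projections of the style vector, one row of weights per channel.
    gamma = [sum(w * s for w, s in zip(row, style)) for row in w_gamma]
    beta = [sum(w * s for w, s in zip(row, style)) for row in w_beta]
    return [g * x + b for g, x, b in zip(gamma, normed, beta)]

# Toy example: three hidden channels, two-dimensional style vector.
hidden = [1.0, 2.0, 3.0]
w_gamma = [[1.0, 0.0]] * 3   # gamma = first style component
w_beta = [[0.0, 1.0]] * 3    # beta  = second style component

neutral = conditional_layer_norm(hidden, [1.0, 0.0], w_gamma, w_beta)
styled = conditional_layer_norm(hidden, [2.0, 1.0], w_gamma, w_beta)
print(neutral)
print(styled)
```

The same hidden vector yields different outputs under the two style vectors, which is the sense in which the style vector conditions the decoder.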