17991443. METHOD, DEVICE, AND COMPUTER PROGRAM PRODUCT FOR TEXT TO SPEECH simplified abstract (Dell Products L.P.)

From WikiPatents

METHOD, DEVICE, AND COMPUTER PROGRAM PRODUCT FOR TEXT TO SPEECH

Organization Name

Dell Products L.P.

Inventor(s)

Wenbin Yang of Shanghai (CN)

Zijia Wang of WeiFang (CN)

Jiacheng Ni of Shanghai (CN)

Zhen Jia of Shanghai (CN)

METHOD, DEVICE, AND COMPUTER PROGRAM PRODUCT FOR TEXT TO SPEECH - A simplified explanation of the abstract

This abstract first appeared for US patent application 17991443, titled 'METHOD, DEVICE, AND COMPUTER PROGRAM PRODUCT FOR TEXT TO SPEECH'.

The present disclosure pertains to a method, device, and computer program product for text-to-speech synthesis. The method encodes a reference waveform of a first speaker to obtain an encoded style feature that is separated from a second speaker, transfers this style feature to a spectrogram obtained by encoding the input text, and converts the resulting style-transferred spectrogram into a time-domain speech waveform.

  • Comparative learning framework for flexible and effective speech synthesis with a target speaker's style
  • Lightweight speech style transfer capability
  • High-quality and recognizable speech synthesis features
  • Effective speaker feature learning
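
The three stages named in the abstract (style encoding, style transfer onto a text-derived spectrogram, and conversion to a waveform) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the toy bin count and frame size, the per-bin scaling used as the "transfer" step, and the additive-sinusoid "vocoder" are hypothetical stand-ins, not the models or the comparative learning framework described in the application.

```python
import math

N_BINS = 4             # toy number of frequency bins (assumption)
SAMPLES_PER_FRAME = 8  # toy number of samples per spectrogram frame (assumption)


def encode_style(reference_waveform):
    """Hypothetical style encoder: mean energy of N_BINS equal slices."""
    size = max(1, len(reference_waveform) // N_BINS)
    style = []
    for b in range(N_BINS):
        seg = reference_waveform[b * size:(b + 1) * size]
        style.append(sum(x * x for x in seg) / len(seg) if seg else 0.0)
    return style


def encode_text(text):
    """Hypothetical text encoder: one toy spectrogram frame per character."""
    return [[((ord(ch) + b) % 10) / 10.0 for b in range(N_BINS)] for ch in text]


def transfer_style(spectrogram, style):
    """Placeholder transfer: scale each bin by (1 + style value)."""
    return [[v * (1.0 + s) for v, s in zip(frame, style)] for frame in spectrogram]


def to_waveform(spectrogram):
    """Toy vocoder: additive synthesis, one sinusoid per frequency bin."""
    samples = []
    for frame in spectrogram:
        for n in range(SAMPLES_PER_FRAME):
            t = n / SAMPLES_PER_FRAME
            samples.append(
                sum(a * math.sin(2 * math.pi * (k + 1) * t) for k, a in enumerate(frame))
            )
    return samples


def text_to_speech(text, reference_waveform):
    """Chain the three stages: encode style, transfer it, convert to waveform."""
    style = encode_style(reference_waveform)
    spectrogram = encode_text(text)
    styled = transfer_style(spectrogram, style)
    return to_waveform(styled)


# Usage: a synthetic sine wave stands in for the first speaker's reference audio.
reference = [math.sin(0.1 * i) for i in range(64)]
waveform = text_to_speech("hi", reference)
```

In a real system each stage would presumably be a trained neural network; the sketch only shows how the data flows from reference waveform and input text to an output waveform.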

Potential Applications:

  • Speech synthesis for applications such as virtual assistants and audiobooks
  • Personalized speech synthesis for individuals with unique speech styles

Problems Solved:

  • Efficiently synthesizing speech with different styles
  • Enabling personalized speech synthesis for various applications

Benefits:

  • Improved speech synthesis quality
  • Enhanced speaker feature learning
  • Personalized and customizable speech synthesis capabilities

Commercial Applications: Advanced Text-to-Speech Synthesis Technology for Personalized Applications. This technology can be used in industries such as entertainment, education, customer service, and accessibility services to provide personalized, high-quality speech synthesis.

Prior Art: There have been advancements in speech synthesis technology, but the ability to transfer speech styles efficiently and effectively remains a challenge.

Frequently Updated Research: Researchers are continuously exploring new methods and techniques to enhance speech synthesis technology, including style transfer capabilities and speaker feature learning.

Questions about Text-to-Speech Synthesis Technology:

  1. How does this technology improve the efficiency of speech synthesis with different styles?
  2. What are the potential applications of personalized speech synthesis in various industries?


Original Abstract Submitted

Embodiments of the present disclosure relate to a method, a device, and a computer program product for text to speech. The method includes encoding a reference waveform of a first speaker to obtain an encoded style feature separated from a second speaker. The method further includes transferring the encoded style feature to a spectrogram obtained by encoding an input text, to obtain a style transferred spectrogram. The method further includes converting the style transferred spectrogram into a time-domain speech waveform. According to the method for text to speech in the present disclosure, a comparative learning framework can also flexibly and effectively synthesize speech with a style of a target speaker, thus realizing lightweight speech style transfer, making it possible to learn high-quality and recognizable features of speech synthesis, and realizing effective speaker feature learning. In addition, the model will be beneficial to other downstream tasks.