Lemon Inc. (20240274120). SPEECH SYNTHESIS METHOD AND APPARATUS, ELECTRONIC DEVICE, AND READABLE STORAGE MEDIUM simplified abstract

From WikiPatents

SPEECH SYNTHESIS METHOD AND APPARATUS, ELECTRONIC DEVICE, AND READABLE STORAGE MEDIUM

Organization Name

Lemon Inc.

Inventor(s)

Dongyang Dai of Beijing (CN)

Yuanzhe Chen of Beijing (CN)

Li Chen of Beijing (CN)

Yuping Wang of Beijing (CN)

Qiao Tian of Beijing (CN)

Ming Tu of Los Angeles CA (US)

Rui Xia of Los Angeles CA (US)

Yuxuan Wang of Los Angeles CA (US)

SPEECH SYNTHESIS METHOD AND APPARATUS, ELECTRONIC DEVICE, AND READABLE STORAGE MEDIUM - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240274120, titled 'SPEECH SYNTHESIS METHOD AND APPARATUS, ELECTRONIC DEVICE, AND READABLE STORAGE MEDIUM'.

Simplified Explanation: The patent application describes a method and apparatus for converting text into audio with a specific timbre using a pre-trained voice synthesis model.

Key Features and Innovation:

  • Uses a pre-trained voice synthesis model made up of two feature extraction sub-models to generate target audio with a target timbre.
  • The first sub-model takes the input text and outputs an acoustic feature that includes a bottleneck feature; the second sub-model takes that acoustic feature and outputs the mel spectrum feature for the text.
  • The target audio is then obtained from the mel spectrum feature and carries the target timbre.
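
The data flow described in the abstract (text, then an acoustic feature containing a bottleneck feature, then a mel spectrum feature, then audio) can be sketched roughly as follows. This is only an illustrative sketch, not the applicant's implementation: the function names, the feature sizes, and the toy arithmetic inside each stage are all hypothetical stand-ins for trained networks. The abstract does not say how the target timbre is encoded, so the sketch assumes it is a property of the second sub-model, and it assumes a separate vocoder-like step turns the mel spectrum feature into audio.

```python
import math
from dataclasses import dataclass
from typing import List

Frame = List[float]

# Hypothetical sizes, chosen only to keep the example small.
BOTTLENECK_DIM = 4
N_MELS = 8
HOP_LENGTH = 16  # audio samples produced per mel frame


@dataclass
class AcousticFeature:
    """Output of the first sub-model; includes a bottleneck feature."""
    bottleneck: List[Frame]  # one vector per frame


def first_submodel(text: str) -> AcousticFeature:
    """Stand-in for the first feature extraction sub-model (text -> acoustic feature).

    A real model would be a trained network; here each character is mapped
    to a toy vector built from the low bits of its code point.
    """
    frames = [
        [float((ord(ch) >> bit) & 1) for bit in range(BOTTLENECK_DIM)]
        for ch in text
    ]
    return AcousticFeature(bottleneck=frames)


def second_submodel(feature: AcousticFeature) -> List[Frame]:
    """Stand-in for the second sub-model (acoustic feature -> mel spectrum feature).

    Assumption: the target timbre is baked into this stage's parameters.
    """
    mel = []
    for vec in feature.bottleneck:
        energy = sum(vec)
        mel.append([energy / (band + 1) for band in range(N_MELS)])
    return mel


def mel_to_audio(mel: List[Frame]) -> List[float]:
    """Stand-in for a vocoder-like step (mel spectrum feature -> waveform)."""
    audio = []
    for frame in mel:
        amplitude = min(1.0, sum(frame) / N_MELS)
        for n in range(HOP_LENGTH):
            audio.append(amplitude * math.sin(2 * math.pi * n / HOP_LENGTH))
    return audio


def synthesize(text: str) -> List[float]:
    """Chain the stages in the order given in the abstract."""
    feature = first_submodel(text)   # text -> acoustic feature (with bottleneck)
    mel = second_submodel(feature)   # acoustic feature -> mel spectrum feature
    return mel_to_audio(mel)         # mel spectrum feature -> target audio
```

Calling `synthesize("hi")` runs the three stages in sequence and returns a list of float samples. The point of the sketch is the interface between stages: the second sub-model never sees the text, only the acoustic feature produced by the first.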

Potential Applications: This technology can be used in:

  • Text-to-speech applications for generating audio with specific timbres.
  • Voice assistants and chatbots to provide more natural and customizable voices.
  • Audio production for creating unique sound effects and music.

Problems Solved:

  • Enables the customization of audio timbre in text-to-speech systems.
  • Improves the naturalness and expressiveness of synthesized voices.
  • Streamlines the process of generating audio with specific timbres.

Benefits:

  • Enhances user experience in text-to-speech applications.
  • Allows for more personalized and engaging voice interactions.
  • Increases the versatility of audio synthesis for various purposes.

Commercial Applications: Voice Synthesis Technology for Customized Audio Timbres. This technology can be applied in industries such as:

  • Entertainment for creating unique character voices in video games and animations.
  • Education for developing interactive learning tools with engaging audio content.
  • Customer service for designing virtual assistants with distinct personalities.

Prior Art: Readers can explore prior research on voice synthesis models, text-to-speech technologies, and audio timbre customization in the field of artificial intelligence and machine learning.

Frequently Updated Research: Stay updated on advancements in voice synthesis models, audio processing techniques, and natural language processing algorithms relevant to this technology.

Questions about Voice Synthesis Technology for Customized Audio Timbres:

  1. How does this technology improve the user experience in text-to-speech applications?
  2. What are the potential challenges in implementing customized audio timbres in voice synthesis systems?


Original Abstract Submitted

Provided are an audio synthesis method and apparatus, an electronic device, and a readable storage medium. In the present solution, conversion from a text to an audio having a target timbre is achieved by means of a pre-trained voice synthesis model, the voice synthesis model comprising a first feature extraction sub-model and a second feature extraction sub-model, wherein the first feature extraction sub-model outputs, according to an inputted text to be processed, an acoustic feature comprising a bottleneck feature; the second feature extraction sub-model outputs, according to the inputted first acoustic features, a mel spectrum feature corresponding to the text to be processed; according to the mel spectrum feature corresponding to the text to be processed, the target audio corresponding to the text to be processed is obtained, and the target audio has the target timbre.