18562962. TEXT-BASED SPEECH GENERATION simplified abstract (MICROSOFT TECHNOLOGY LICENSING, LLC)

From WikiPatents

TEXT-BASED SPEECH GENERATION

Organization Name

MICROSOFT TECHNOLOGY LICENSING, LLC

Inventor(s)

Xu Tan of Redmond WA (US)

Tao Qin of Beijing (CN)

Sheng Zhao of Redmond WA (US)

Tie-Yan Liu of Beijing (CN)

TEXT-BASED SPEECH GENERATION - A simplified explanation of the abstract

This abstract first appeared for US patent application 18562962 titled 'TEXT-BASED SPEECH GENERATION'.

Simplified Explanation:

This patent application proposes a text-to-speech method that inserts additional phonemes related to characteristics of spontaneous speech into the phoneme sequence derived from the text, determines each phoneme's duration with an expert model corresponding to that phoneme, and generates spontaneous-style speech with more varied rhythm.

Key Features and Innovation:
  • Generation of an initial phoneme sequence from the text
  • Insertion of an additional phoneme related to a characteristic of spontaneous speech
  • Determination of each phoneme's duration using an expert model corresponding to that phoneme
  • Generation of spontaneous-style speech with more varied rhythm
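
The first two steps in the list above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the toy lexicon `LEXICON`, the filler tokens `<uh>` and `<um>`, the random insertion rule, and all function names are assumptions introduced here. The abstract describes feature representations of phonemes, whereas this sketch uses plain symbols for readability.

```python
import random

# Hypothetical toy lexicon; a real system would use a grapheme-to-phoneme model.
LEXICON = {
    "so": ["S", "OW"],
    "we": ["W", "IY"],
    "can": ["K", "AE", "N"],
    "go": ["G", "OW"],
}

# Hypothetical additional phonemes standing in for spontaneous-speech
# characteristics such as filled pauses.
FILLERS = ["<uh>", "<um>"]


def text_to_phonemes(text):
    """Return one phoneme list per word (the initial phoneme sequence)."""
    return [LEXICON[word] for word in text.lower().split()]


def insert_fillers(words, prob, rng):
    """Insert a filler after each non-final word with probability `prob`.

    Random insertion is a stand-in (assumption); the abstract does not say
    how insertion positions are chosen.
    """
    out = []
    for i, phonemes in enumerate(words):
        out.extend(phonemes)
        is_last = i == len(words) - 1
        if not is_last and rng.random() < prob:
            out.append(rng.choice(FILLERS))
    return out


initial = text_to_phonemes("so we can go")
first_sequence = insert_fillers(initial, prob=0.5, rng=random.Random(0))
print(first_sequence)
```

Passing a seeded `random.Random` keeps the sketch reproducible. Setting `prob=0.0` returns the initial sequence unchanged, which makes the effect of the insertion step easy to compare.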

Potential Applications:

This technology can be used in various applications such as virtual assistants, speech synthesis software, language learning tools, and accessibility solutions for visually impaired individuals.

Problems Solved:

This technology addresses the need for more natural and spontaneous-sounding text-to-speech systems that can mimic the rhythm and flow of human speech.

Benefits:
  • Improved naturalness and spontaneity in text-to-speech output
  • Enhanced user experience in applications requiring speech synthesis
  • Increased accessibility for individuals with visual impairments

Commercial Applications:

Potential commercial applications include integration into virtual assistants, customer service chatbots, educational software, and assistive technologies for individuals with disabilities. The technology could also be used in the entertainment and gaming industries to create more realistic voice interactions.

Questions about Text-to-Speech:

1. How does this technology improve the naturalness of text-to-speech output?

  - It inserts additional phonemes related to characteristics of spontaneous speech and determines each phoneme's duration with an expert model corresponding to that phoneme, which gives the generated speech a more varied rhythm.

2. What are the potential commercial uses of this advanced text-to-speech technology?

  - This technology can be applied in virtual assistants, customer service chatbots, language learning tools, and accessibility solutions, offering more natural and engaging speech synthesis capabilities.


Original Abstract Submitted

According to implementations of the subject matter described herein, a solution is proposed for text to speech. In this solution, an initial phoneme sequence corresponding to text is generated, the initial phoneme sequence comprising feature representations of a plurality of phonemes. A first phoneme sequence is generated by inserting a feature representation of an additional phoneme into the initial phoneme sequence, the additional phoneme being related to a characteristic of spontaneous speech. The duration of a phoneme among the plurality of phonemes and the additional phoneme is determined by using an expert model corresponding to the phoneme, and a second phoneme sequence is generated based on the first phoneme sequence. Spontaneous-style speech corresponding to the text is determined based on the second phoneme sequence. In this way, spontaneous-style speech with more varying rhythms can be generated based on spontaneous-style additional phonemes and multiple expert models.
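
The duration step and the construction of the second phoneme sequence might look like the sketch below. It rests on several assumptions not stated in the abstract: each "expert" is reduced to a fixed rule, routing picks one expert per phoneme by a simple category test, and the second sequence is formed by repeating each phoneme once per frame of its duration. All names, categories, and frame counts are hypothetical.

```python
# Hypothetical experts: each maps a phoneme to a duration in frames.
def vowel_expert(phoneme):
    return 8


def consonant_expert(phoneme):
    return 4


def filler_expert(phoneme):
    # Assumption: filler phonemes are given longer durations here.
    return 20


VOWELS = {"OW", "IY", "AE"}


def pick_expert(phoneme):
    """Route a phoneme to the expert assumed to correspond to it."""
    if phoneme.startswith("<"):
        return filler_expert
    if phoneme in VOWELS:
        return vowel_expert
    return consonant_expert


def predict_durations(sequence):
    """Ask the selected expert for the duration of each phoneme."""
    return [pick_expert(p)(p) for p in sequence]


def expand(sequence, durations):
    """Repeat each phoneme once per frame to build the second sequence."""
    expanded = []
    for phoneme, frames in zip(sequence, durations):
        expanded.extend([phoneme] * frames)
    return expanded


# A first phoneme sequence that already contains one inserted filler.
first_sequence = ["S", "OW", "<um>", "W", "IY"]
durations = predict_durations(first_sequence)
second_sequence = expand(first_sequence, durations)
print(durations)             # [4, 8, 20, 4, 8]
print(len(second_sequence))  # 44
```

Using a separate duration rule per phoneme category is what lets the filler take a very different length from its neighbours in this sketch. In a trained system each expert would be a learned model rather than a constant.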