Google LLC (20240127791). END-TO-END TEXT-TO-SPEECH CONVERSION simplified abstract

From WikiPatents
Revision as of 04:02, 26 April 2024 by Wikipatents (talk | contribs) (Creating a new page)

END-TO-END TEXT-TO-SPEECH CONVERSION

Organization Name

Google LLC

Inventor(s)

Samuel Bengio of Los Altos CA (US)

Yuxuan Wang of Sunnyvale CA (US)

Zongheng Yang of Berkeley CA (US)

Zhifeng Chen of Sunnyvale CA (US)

Yonghui Wu of Fremont CA (US)

Ioannis Agiomyrgiannakis of London (GB)

Ron J. Weiss of New York NY (US)

Navdeep Jaitly of Mountain View CA (US)

Ryan M. Rifkin of Oakland CA (US)

Robert Andrew James Clark of Hertfordshire (GB)

Quoc V. Le of Sunnyvale CA (US)

Russell J. Ryan of Mountain View CA (US)

Ying Xiao of San Bruno CA (US)

END-TO-END TEXT-TO-SPEECH CONVERSION - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240127791, titled 'END-TO-END TEXT-TO-SPEECH CONVERSION'.

Simplified Explanation

The patent application describes a system for generating speech from text using a sequence-to-sequence recurrent neural network.

  • The system comprises one or more computers and one or more storage devices storing instructions that, when executed, implement the neural network and a subsystem.
  • The neural network receives a sequence of characters in a particular natural language and processes it to generate a spectrogram of a verbal utterance of those characters.
  • The subsystem receives the sequence of characters and provides it as input to the neural network to obtain the spectrogram as output.
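
The flow described above (characters in, spectrogram frames out) can be illustrated with a toy recurrent encoder-decoder. This is a minimal sketch under stated assumptions, not the architecture claimed in the application: the class `ToySeq2Seq`, the sizes `N_MEL` and `HIDDEN`, the character set `VOCAB`, the random untrained weights, and the fixed two-frames-per-character output length are all hypothetical choices made for illustration.

```python
import math
import random

# All constants below are illustrative assumptions, not values from the application.
N_MEL = 4      # frequency bins per spectrogram frame
HIDDEN = 8     # size of the recurrent hidden state
VOCAB = "abcdefghijklmnopqrstuvwxyz '.,?!"


def rand_matrix(rows, cols, rng):
    """Return a rows x cols matrix of small random weights (untrained)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def one_hot(ch):
    """Encode one character as a one-hot vector; unknown characters map to zeros."""
    vec = [0.0] * len(VOCAB)
    idx = VOCAB.find(ch.lower())
    if idx >= 0:
        vec[idx] = 1.0
    return vec


class ToySeq2Seq:
    """Hypothetical recurrent encoder-decoder: characters -> spectrogram frames."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.w_in = rand_matrix(HIDDEN, len(VOCAB), rng)
        self.w_enc = rand_matrix(HIDDEN, HIDDEN, rng)
        self.w_dec = rand_matrix(HIDDEN, HIDDEN, rng)
        self.w_out = rand_matrix(N_MEL, HIDDEN, rng)

    def encode(self, text):
        """Consume the characters one at a time, updating the hidden state."""
        h = [0.0] * HIDDEN
        for ch in text:
            a = matvec(self.w_in, one_hot(ch))
            b = matvec(self.w_enc, h)
            h = [math.tanh(x + y) for x, y in zip(a, b)]
        return h

    def decode(self, h, n_frames):
        """Unroll the decoder, emitting one N_MEL-bin frame per step."""
        frames = []
        for _ in range(n_frames):
            h = [math.tanh(x) for x in matvec(self.w_dec, h)]
            frames.append(matvec(self.w_out, h))
        return frames

    def __call__(self, text, frames_per_char=2):
        # Assumption: a fixed number of frames per character. A trained
        # system would decide for itself when the utterance ends.
        return self.decode(self.encode(text), frames_per_char * len(text))


model = ToySeq2Seq()
spectrogram = model("hello")
print(len(spectrogram), len(spectrogram[0]))  # 10 frames, 4 bins each
```

Because the weights are random, the output frames carry no acoustic meaning; the sketch only shows the shape of the computation, with a variable-length character sequence reduced to a hidden state and then unrolled into a sequence of fixed-width frames.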

Potential Applications

This technology could be used in speech synthesis applications, language translation services, and voice-controlled devices.

Problems Solved

This technology solves the problem of converting text into spoken language accurately and efficiently, enabling better communication and accessibility for users.

Benefits

Potential benefits of this technology include improved speech synthesis quality, faster text-to-speech conversion, and a better user experience in applications that rely on synthesized speech.

Potential Commercial Applications

The technology could be applied in virtual assistants, automated customer service systems, language learning tools, and accessibility devices, with the potential for commercialization in the tech industry.

Possible Prior Art

Prior art in this field includes existing speech synthesis systems, neural network-based language processing technologies, and text-to-speech conversion software.

Unanswered Questions

How does this technology compare to existing speech synthesis systems in terms of accuracy and efficiency?

The abstract does not compare the proposed system with existing speech synthesis systems, so its relative accuracy and efficiency cannot be evaluated from it.

What are the potential limitations or challenges in implementing this technology on a large scale?

The abstract does not address scalability or the practical challenges that may arise when deploying this technology in real-world applications.


Original Abstract Submitted

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating speech from text. One of the systems includes one or more computers and one or more storage devices storing instructions that when executed by one or more computers cause the one or more computers to implement: a sequence-to-sequence recurrent neural network configured to: receive a sequence of characters in a particular natural language, and process the sequence of characters to generate a spectrogram of a verbal utterance of the sequence of characters in the particular natural language; and a subsystem configured to: receive the sequence of characters in the particular natural language, and provide the sequence of characters as input to the sequence-to-sequence recurrent neural network to obtain as output the spectrogram of the verbal utterance of the sequence of characters in the particular natural language.
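
The division of labor in the abstract, a subsystem that receives the characters and hands them to the network, can be sketched as a thin wrapper around a callable. The names `TextToSpeechSubsystem`, `synthesize`, and `stub_network` are hypothetical, and the stub stands in for a trained network; the abstract does not specify any programming interface.

```python
from typing import Callable, List

# A spectrogram is modeled here as a list of frames, each a list of bin values.
Spectrogram = List[List[float]]


class TextToSpeechSubsystem:
    """Hypothetical subsystem: receives characters, forwards them to the network."""

    def __init__(self, network: Callable[[str], Spectrogram]):
        self.network = network

    def synthesize(self, text: str) -> Spectrogram:
        # Input validation is an added assumption, not part of the abstract.
        if not text:
            raise ValueError("expected a non-empty sequence of characters")
        # Provide the character sequence as input and return the network's output.
        return self.network(text)


def stub_network(text: str) -> Spectrogram:
    """Placeholder for a trained network: one all-zero 4-bin frame per character."""
    return [[0.0] * 4 for _ in text]


subsystem = TextToSpeechSubsystem(stub_network)
frames = subsystem.synthesize("hi")
print(len(frames))  # 2
```

Keeping the network behind a plain callable means the subsystem does not depend on how the spectrogram is produced, which mirrors the abstract's separation of the two components.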