18746809. Phonemes And Graphemes for Neural Text-to-Speech simplified abstract (Google LLC)

Phonemes And Graphemes for Neural Text-to-Speech

Organization Name

Google LLC

Inventor(s)

Ye Jia of Mountain View CA (US)

Byungha Chun of Tokyo (JP)

Yu Zhang of Mountain View CA (US)

Jonathan Shen of Mountain View CA (US)

Yonghui Wu of Fremont CA (US)

Phonemes And Graphemes for Neural Text-to-Speech - A simplified explanation of the abstract

This abstract first appeared for US patent application 18746809 titled 'Phonemes And Graphemes for Neural Text-to-Speech'.

The method described in the patent application involves processing a text input that includes a sequence of words represented as an input encoder embedding. This embedding consists of two sets of tokens: grapheme tokens representing the text input as individual graphemes, and phoneme tokens representing the text input as individual phonemes.

  • The method identifies each phoneme token and determines the corresponding word in the sequence of words, as well as the grapheme token representing that word.
  • Based on the relationship between each phoneme token and the grapheme token representing the same word, an output encoder embedding is generated that reflects how phonemes and graphemes align within the text input (see the sketch after this list).
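
A minimal Python sketch of the alignment step above: each phoneme token is mapped back to its word and to the grapheme tokens representing that word. The character-level tokenization, the toy lexicon, and all names here are illustrative assumptions, not the patent's actual implementation, and no neural model is involved.

  text = "hello world"
  words = text.split()

  # Grapheme tokens: one token per character, tagged with the index of the word it came from.
  grapheme_tokens = [
      {"token": ch, "word_idx": w_idx}
      for w_idx, word in enumerate(words)
      for ch in word
  ]

  # Phoneme tokens: a toy lexicon lookup stands in for a real grapheme-to-phoneme front end.
  lexicon = {"hello": ["HH", "AH", "L", "OW"], "world": ["W", "ER", "L", "D"]}
  phoneme_tokens = [
      {"token": ph, "word_idx": w_idx}
      for w_idx, word in enumerate(words)
      for ph in lexicon[word]
  ]

  # For each phoneme token, identify its word and the grapheme tokens representing that word.
  for ph in phoneme_tokens:
      word = words[ph["word_idx"]]
      graphemes = [g["token"] for g in grapheme_tokens if g["word_idx"] == ph["word_idx"]]
      print(f"{ph['token']:>2} -> word '{word}', graphemes {graphemes}")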

Potential Applications:

  • Natural language processing systems
  • Speech recognition technology
  • Language translation tools

Problems Solved:

  • Improving accuracy in text-to-speech systems
  • Enhancing phonetic transcription processes
  • Streamlining language processing tasks

Benefits:

  • Increased efficiency in language processing algorithms
  • Enhanced accuracy in phonetic analysis
  • Improved performance of speech recognition systems

Commercial Applications: Enhanced Language Processing Technology for Improved Speech Recognition. This technology can be utilized in developing advanced speech recognition software for various industries, including telecommunications, customer service, and healthcare. It can also be integrated into language translation applications to enhance accuracy and speed.

Prior Art: Researchers in the field of natural language processing have explored similar methods for aligning phonemes and graphemes in text inputs. Studies on phonetic transcription and speech recognition have also contributed to the development of this technology.

Frequently Updated Research: Ongoing research in the field of phonetics and language processing continues to refine algorithms for aligning phonemes and graphemes in text inputs. Stay updated on the latest advancements in speech recognition technology to leverage the benefits of this innovative approach.

Questions about Language Processing Technology:

1. How does this technology improve the accuracy of speech recognition systems?

This technology enhances accuracy by establishing a direct relationship between phonemes and graphemes in text inputs, improving the alignment process for speech recognition algorithms.

2. What potential applications can benefit from the integration of this language processing technology?

Various industries, such as telecommunications, customer service, and healthcare, can benefit from the enhanced performance of speech recognition systems powered by this technology.


Original Abstract Submitted

A method includes receiving a text input including a sequence of words represented as an input encoder embedding. The input encoder embedding includes a plurality of tokens, with the plurality of tokens including a first set of grapheme tokens representing the text input as respective graphemes and a second set of phoneme tokens representing the text input as respective phonemes. The method also includes, for each respective phoneme token of the second set of phoneme tokens: identifying a respective word of the sequence of words corresponding to the respective phoneme token and determining a respective grapheme token representing the respective word of the sequence of words corresponding to the respective phoneme token. The method also includes generating an output encoder embedding based on a relationship between each respective phoneme token and the corresponding grapheme token determined to represent a same respective word as the respective phoneme token.
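
A rough numerical sketch of the final step in the abstract: generating an output encoder embedding from the phoneme tokens and the grapheme tokens determined to represent the same words. The embedding dimension, the random stand-in embeddings, the per-word averaging and addition, and all names are assumptions made for illustration; the abstract does not specify this exact computation.

  import numpy as np

  dim = 8
  rng = np.random.default_rng(0)

  # Word index of each phoneme token and each grapheme token, e.g. for "hello world".
  phoneme_word_idx = np.array([0, 0, 0, 0, 1, 1, 1, 1])
  grapheme_word_idx = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

  # Stand-ins for the learned encoder embeddings of the two token sets.
  phoneme_emb = rng.normal(size=(len(phoneme_word_idx), dim))
  grapheme_emb = rng.normal(size=(len(grapheme_word_idx), dim))

  # Average the grapheme embeddings of each word, then add that word-level grapheme
  # representation to every phoneme token belonging to the same word.
  num_words = int(grapheme_word_idx.max()) + 1
  word_grapheme = np.stack(
      [grapheme_emb[grapheme_word_idx == w].mean(axis=0) for w in range(num_words)]
  )
  output_encoder_embedding = phoneme_emb + word_grapheme[phoneme_word_idx]
  print(output_encoder_embedding.shape)  # (number of phoneme tokens, dim)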