18654278. USING SPEECH RECOGNITION TO IMPROVE CROSS-LANGUAGE SPEECH SYNTHESIS simplified abstract (Google LLC)

From WikiPatents

USING SPEECH RECOGNITION TO IMPROVE CROSS-LANGUAGE SPEECH SYNTHESIS

Organization Name

Google LLC

Inventor(s)

Zhehuai Chen of Edgewater NJ (US)

Bhuvana Ramabhadran of Mt. Kisco NY (US)

Andrew Rosenberg of Brooklyn NY (US)

Yu Zhang of Mountain View CA (US)

Pedro J. Moreno Mengibar of Jersey City NJ (US)

USING SPEECH RECOGNITION TO IMPROVE CROSS-LANGUAGE SPEECH SYNTHESIS - A simplified explanation of the abstract

This abstract first appeared for US patent application 18654278, titled 'USING SPEECH RECOGNITION TO IMPROVE CROSS-LANGUAGE SPEECH SYNTHESIS'.

Simplified Explanation

The patent application describes a method for training a speech recognition model with the help of a multilingual text-to-speech (TTS) model. The TTS model synthesizes the same input text twice, once conditioned on the speaker characteristics of a native speaker of the text's language and once on those of a native speaker of a different language, and the recognition model is updated based on a consistent loss term computed from the recognition results for the two syntheses.

Key Features and Innovation

  • Obtaining a multilingual text-to-speech (TTS) model
  • Generating native and cross-lingual synthesized speech representations for the same input text
  • Conditioning each synthesis on the speaker characteristics of a native speaker of a different language
  • Updating the speech recognition model based on a consistent loss term over the two recognition results
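The steps above can be sketched as a single toy training step. Everything here is illustrative: the function names (`synthesize`, `recognize`, `consistency_loss`) and the character-level "speech representations" are invented stand-ins, not the patent's actual models or formulas.

```python
# Toy sketch of one training step. All functions are illustrative
# stand-ins for the real TTS model, ASR model, and loss term.

def synthesize(text, speaker):
    """Stand-in for a multilingual TTS model: returns a fake 'speech
    representation' that drifts slightly with the speaker's accent."""
    return [ord(c) + speaker["accent_shift"] for c in text]

def recognize(speech):
    """Stand-in for the ASR model: maps the representation back to a
    token sequence (imperfectly, once the input is accent-shifted)."""
    return [chr(max(97, min(122, v))) for v in speech]

def consistency_loss(hyp_a, hyp_b):
    """Fraction of positions where the two hypotheses disagree."""
    pairs = list(zip(hyp_a, hyp_b))
    return sum(a != b for a, b in pairs) / max(len(pairs), 1)

text = "hola"
native = {"lang": "es", "accent_shift": 0}  # native speaker of language 1
cross = {"lang": "en", "accent_shift": 2}   # native speaker of language 2

hyp_native = recognize(synthesize(text, native))
hyp_cross = recognize(synthesize(text, cross))
loss = consistency_loss(hyp_native, hyp_cross)
# A gradient step on the ASR model would then push this loss toward 0.
```

In the actual method the loss would backpropagate into the recognition model's parameters; the stub above only shows how the two recognition results feed one shared loss term.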

Potential Applications

This technology can be used in various applications such as language translation, voice assistants, and speech-to-text systems.

Problems Solved

This technology addresses the challenges of training speech recognition models in multilingual environments and improving accuracy across different languages.

Benefits

  • Enhanced speech recognition accuracy in multilingual settings
  • Improved performance in recognizing different accents and dialects
  • Increased efficiency in training speech recognition models

Commercial Applications

  • Multilingual voice assistants for global markets
  • Language translation services
  • Speech-to-text applications for diverse language users

Prior Art

Researchers can explore prior art related to multilingual speech recognition models, text-to-speech technology, and speaker adaptation techniques.

Frequently Updated Research

Stay informed about the latest advancements in multilingual speech recognition, text-to-speech models, and speaker adaptation methods.

Questions about Multilingual Speech Recognition

How does conditioning on speaker characteristics improve speech recognition accuracy?

Conditioning on speaker characteristics helps the model adapt to different accents, intonations, and speech patterns, leading to more accurate recognition.
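One common way to condition a model on speaker characteristics is to attach a fixed speaker embedding to every input frame, so the same text produces different model inputs for different speakers. The sketch below assumes that scheme; the feature extraction, names, and dimensions are invented for illustration.

```python
# Illustrative speaker conditioning: a fixed speaker embedding is
# concatenated to each per-token feature vector.

def embed_text(text, dim=4):
    # Toy per-character features standing in for real text features.
    return [[float((ord(c) + i) % 7) for i in range(dim)] for c in text]

def condition_on_speaker(text_features, speaker_embedding):
    # Append the speaker embedding to every frame's feature vector.
    return [frame + speaker_embedding for frame in text_features]

features = embed_text("hi")
spk_a = [0.1, 0.9]  # e.g., embedding of a native English speaker
spk_b = [0.8, 0.2]  # e.g., embedding of a native Spanish speaker

inp_a = condition_on_speaker(features, spk_a)
inp_b = condition_on_speaker(features, spk_b)
# Same text, different conditioned inputs -> different accents/prosody.
```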

What are the potential challenges of implementing cross-lingual synthesized speech representations?

One challenge could be ensuring the accuracy and naturalness of speech across different languages while maintaining consistent speaker characteristics.


Original Abstract Submitted

A method for training a speech recognition model includes obtaining a multilingual text-to-speech (TTS) model. The method also includes generating a native synthesized speech representation for an input text sequence in a first language that is conditioned on speaker characteristics of a native speaker of the first language. The method also includes generating a cross-lingual synthesized speech representation for the input text sequence in the first language that is conditioned on speaker characteristics of a native speaker of a different second language. The method also includes generating a first speech recognition result for the native synthesized speech representation and a second speech recognition result for the cross-lingual synthesized speech representation. The method also includes determining a consistent loss term based on the first speech recognition result and the second speech recognition result and updating parameters of the speech recognition model based on the consistent loss term.
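The abstract's "consistent loss term" compares the first and second speech recognition results. One simple realization, assumed here for illustration rather than taken from the patent, is an edit-distance-style disagreement score between the two transcripts:

```python
import difflib

def consistent_loss(hyp_native, hyp_cross):
    """Disagreement between the two ASR hypotheses, in [0, 1].
    0 means identical transcripts. This is one plausible stand-in for
    the patent's 'consistent loss term', not its actual formula."""
    matcher = difflib.SequenceMatcher(a=hyp_native, b=hyp_cross)
    return 1.0 - matcher.ratio()

# Recognition results for the native and cross-lingual syntheses:
loss_same = consistent_loss("hello world", "hello world")
loss_diff = consistent_loss("hello world", "hallo word")
```

Minimizing such a term encourages the recognizer to produce the same transcript whether the input speech carries native or cross-lingual speaker characteristics.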