Google LLC (20240282292). USING SPEECH RECOGNITION TO IMPROVE CROSS-LANGUAGE SPEECH SYNTHESIS simplified abstract

From WikiPatents

USING SPEECH RECOGNITION TO IMPROVE CROSS-LANGUAGE SPEECH SYNTHESIS

Organization Name

Google LLC

Inventor(s)

Zhehuai Chen of Edgewater NJ (US)

Bhuvana Ramabhadran of Mt. Kisco NY (US)

Andrew Rosenberg of Brooklyn NY (US)

Yu Zhang of Mountain View CA (US)

Pedro J. Moreno Mengibar of Jersey City NJ (US)

USING SPEECH RECOGNITION TO IMPROVE CROSS-LANGUAGE SPEECH SYNTHESIS - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240282292, titled 'USING SPEECH RECOGNITION TO IMPROVE CROSS-LANGUAGE SPEECH SYNTHESIS'.

    • Simplified Explanation:

The patent application describes a method for training a speech recognition model using a multilingual text-to-speech model. The method generates synthesized speech representations conditioned on the speaker characteristics of native speakers of different languages, obtains speech recognition results for those representations, and updates the model's parameters based on a consistency loss computed from the results.

    • Key Features and Innovation:

- Obtaining a multilingual text-to-speech (TTS) model
- Generating native and cross-lingual synthesized speech representations
- Conditioning synthesis on the speaker characteristics of native speakers of different languages
- Determining a consistent loss term based on the two speech recognition results
- Updating the parameters of the speech recognition model
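The abstract names a "consistent loss term" but does not fix a formula for it. One common way to realize such a term is a symmetric KL divergence between the recognizer's output distributions for the native and cross-lingual synthesized inputs. A minimal NumPy sketch follows; the function names and the choice of symmetric KL are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def consistency_loss(native_logits, cross_logits):
    """Symmetric KL divergence between the recognizer's output
    distributions for native vs. cross-lingual synthesized speech.
    (One possible instantiation of the patent's 'consistent loss term'.)"""
    p = softmax(native_logits)
    q = softmax(cross_logits)
    eps = 1e-12  # avoid log(0)
    kl_pq = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)
    kl_qp = np.sum(q * (np.log(q + eps) - np.log(p + eps)), axis=-1)
    return float(np.mean(0.5 * (kl_pq + kl_qp)))

# Identical predictions give zero loss; divergent ones a positive loss.
same = np.array([[2.0, 0.5, -1.0]])
diff = np.array([[-1.0, 0.5, 2.0]])
print(consistency_loss(same, same))      # 0.0
print(consistency_loss(same, diff) > 0)  # True
```

Because the loss is symmetric in its two arguments, it pushes the recognizer toward producing the same transcription regardless of which speaker's characteristics conditioned the synthesis.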

    • Potential Applications:

- Improving speech recognition accuracy across different languages
- Enhancing multilingual communication tools
- Developing more accurate voice assistants and transcription services

    • Problems Solved:

- Addressing challenges in training speech recognition models for multilingual applications
- Improving accuracy by considering speaker characteristics in different languages

    • Benefits:

- Enhanced performance of speech recognition models
- Better understanding and processing of speech in various languages
- Improved user experience in multilingual applications

    • Commercial Applications:

This technology could be used in developing advanced speech recognition systems for customer service, language translation services, and voice-controlled devices. The market implications include improved accuracy and efficiency in multilingual communication tools.

    • Prior Art:

Research on multilingual speech recognition models and text-to-speech systems can provide insights into prior art related to this technology. Researchers in the field of natural language processing and machine learning may have published relevant studies.

    • Frequently Updated Research:

Stay updated on advancements in multilingual speech recognition models, text-to-speech technology, and speaker adaptation techniques to enhance the performance of speech recognition systems.

    • Questions about Multilingual Speech Recognition Model Training:

1. How does conditioning on speaker characteristics improve speech recognition accuracy in different languages?
2. What are the potential challenges in implementing a multilingual text-to-speech model for training speech recognition systems?


Original Abstract Submitted

A method for training a speech recognition model includes obtaining a multilingual text-to-speech (TTS) model. The method also includes generating a native synthesized speech representation for an input text sequence in a first language that is conditioned on speaker characteristics of a native speaker of the first language. The method also includes generating a cross-lingual synthesized speech representation for the input text sequence in the first language that is conditioned on speaker characteristics of a native speaker of a different second language. The method also includes generating a first speech recognition result for the native synthesized speech representation and a second speech recognition result for the cross-lingual synthesized speech representation. The method also includes determining a consistent loss term based on the first speech recognition result and the second speech recognition result and updating parameters of the speech recognition model based on the consistent loss term.
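The abstract's steps can be strung together in a toy end-to-end sketch. Everything below is an assumption for illustration only: the "speech representations" are random feature vectors standing in for TTS output, the "speech recognition model" is a single linear layer, the consistent loss term is a mean squared difference between the two recognition results' output distributions, and the parameter update uses numerical gradient descent rather than backpropagation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (assumptions, not the patent's models): features for the
# native and cross-lingual synthesized speech of the same input text.
W = rng.normal(size=(4, 3))                               # recognizer parameters
native_feats = rng.normal(size=(1, 4))                    # native synthesis
cross_feats = native_feats + 0.1 * rng.normal(size=(1, 4))  # cross-lingual synthesis

def recognize(feats, W):
    # Logits over a 3-symbol vocabulary from a single linear layer.
    return feats @ W

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def consistent_loss(W):
    # Mean squared difference between the two recognition results --
    # one simple instantiation of the abstract's 'consistent loss term'.
    p = softmax(recognize(native_feats, W))
    q = softmax(recognize(cross_feats, W))
    return float(np.mean((p - q) ** 2))

def grad(W, h=1e-5):
    # Central-difference numerical gradient of the loss w.r.t. W.
    g = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            Wp, Wm = W.copy(), W.copy()
            Wp[i, j] += h
            Wm[i, j] -= h
            g[i, j] = (consistent_loss(Wp) - consistent_loss(Wm)) / (2 * h)
    return g

# Update the recognizer's parameters based on the consistent loss term.
before = consistent_loss(W)
for _ in range(50):
    W -= 1.0 * grad(W)
after = consistent_loss(W)
print(after < before)  # the consistency loss decreases
```

A real system would of course replace the linear layer with a full ASR network, add a standard transcription loss alongside the consistency term, and use backpropagation; the sketch only shows how the two recognition results drive the parameter update.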