18468086. HYBRID LANGUAGE MODELS FOR CONVERSATIONAL AI SYSTEMS AND APPLICATIONS simplified abstract (NVIDIA Corporation)

HYBRID LANGUAGE MODELS FOR CONVERSATIONAL AI SYSTEMS AND APPLICATIONS

Organization Name

NVIDIA Corporation

Inventor(s)

Vladimir Bataev of Yerevan (AM)

Roman Korostik of Yerevan (AM)

Evgenii Shabalin of Moskva (RU)

Vitaly Sergeyevich Lavrukhin of Campbell CA (US)

Boris Ginsburg of Sunnyvale CA (US)

HYBRID LANGUAGE MODELS FOR CONVERSATIONAL AI SYSTEMS AND APPLICATIONS - A simplified explanation of the abstract

This abstract first appeared for US patent application 18468086, titled 'HYBRID LANGUAGE MODELS FOR CONVERSATIONAL AI SYSTEMS AND APPLICATIONS'.

The abstract of this patent application describes a method in which first textual data is applied to a first machine learning model (MLM) to generate an intermediate speech representation, such as a frequency-domain representation. A second MLM then processes that representation to produce output data indicating second textual data. The parameters of the second model are updated using the output data and ground truth data associated with the first textual data. Additionally, a generator from a generative adversarial network may be used to enhance the initial intermediate audio representation before it is provided to the second model.

  • The method uses two machine learning models: a trained Text-To-Speech (TTS) model that generates the intermediate speech representation from text, and an Automatic Speech Recognition (ASR) model that produces text from that representation.
  • A generator from a generative adversarial network may enhance the initial intermediate audio representation; its generator blocks process that representation sequentially.
  • The parameters of the second model are updated based on the output data and ground truth data associated with the original textual data.
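
The training loop summarized in these points can be illustrated with a minimal pure-Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent application: the names `tts`, `enhance`, `ToyASR`, and `train_text_only` are hypothetical, the "TTS" is a frozen lookup table that emits one feature frame per character, the "generator blocks" are fixed functions rather than a trained GAN generator, and the "ASR" is a per-frame softmax classifier. A real ASR model must also handle alignment between audio frames and characters, which this toy avoids.

```python
import math
import random

VOCAB = "abcdefghijklmnopqrstuvwxyz "
N_FEATS = 8  # hypothetical number of features per frame
rng = random.Random(0)

# Frozen stand-in for a trained TTS model: one fixed feature frame per character.
TTS_TABLE = {ch: [rng.gauss(0, 1) for _ in range(N_FEATS)] for ch in VOCAB}

def tts(text):
    """First model: text -> initial intermediate representation (list of frames)."""
    return [list(TTS_TABLE[ch]) for ch in text]

def make_block(scale):
    """Toy generator block: a fixed residual transform (not a trained generator)."""
    def block(frames):
        return [[x + scale * math.tanh(x) for x in frame] for frame in frames]
    return block

GENERATOR_BLOCKS = [make_block(0.1), make_block(0.05)]

def enhance(frames):
    """Apply the generator blocks one after another to the initial representation."""
    for block in GENERATOR_BLOCKS:
        frames = block(frames)
    return frames

class ToyASR:
    """Second model: per-frame softmax classifier over characters."""
    def __init__(self):
        self.w = [[0.0] * N_FEATS for _ in VOCAB]
        self.b = [0.0] * len(VOCAB)

    def probs(self, frame):
        logits = [sum(wi * x for wi, x in zip(row, frame)) + b
                  for row, b in zip(self.w, self.b)]
        m = max(logits)
        exps = [math.exp(v - m) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def transcribe(self, frames):
        out = []
        for frame in frames:
            p = self.probs(frame)
            out.append(VOCAB[max(range(len(VOCAB)), key=p.__getitem__)])
        return "".join(out)

    def train_step(self, frames, target_text, lr=0.1):
        """One SGD pass; the ground truth is the text that was fed to the TTS."""
        loss = 0.0
        for frame, ch in zip(frames, target_text):
            p = self.probs(frame)
            y = VOCAB.index(ch)
            loss -= math.log(p[y])
            for k in range(len(VOCAB)):
                grad = p[k] - (1.0 if k == y else 0.0)  # d(cross-entropy)/d(logit k)
                self.b[k] -= lr * grad
                for j in range(N_FEATS):
                    self.w[k][j] -= lr * grad * frame[j]
        return loss / len(frames)

def train_text_only(asr, texts, epochs=50):
    """Text -> TTS -> enhancer -> ASR; only the ASR parameters are updated."""
    history = []
    for _ in range(epochs):
        epoch_loss = 0.0
        for text in texts:
            frames = enhance(tts(text))
            epoch_loss += asr.train_step(frames, text)
        history.append(epoch_loss / len(texts))
    return history

asr = ToyASR()
losses = train_text_only(asr, ["hello world", "speech recognition"])
print(f"mean loss: first epoch {losses[0]:.3f}, last epoch {losses[-1]:.3f}")
print(asr.transcribe(enhance(tts("hello world"))))
```

The point of the sketch is the data flow: no recorded audio is needed, because the text that drives the first model also serves as the ground truth for updating the second model.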

Potential Applications:

  • This technology can be applied in speech synthesis and recognition systems.
  • It can be used in language translation services to improve accuracy and naturalness of speech output.

Problems Solved:

  • Enhances the quality and accuracy of speech synthesis and recognition systems.
  • Improves the overall performance of language translation services.

Benefits:

  • Higher quality speech output.
  • Enhanced accuracy in speech recognition.
  • Improved user experience in language translation applications.

Commercial Applications:

Title: Advanced Speech Processing Technology for Enhanced Language Translation Services

This technology can be utilized in various commercial applications such as:

  • Language translation apps
  • Virtual assistants
  • Customer service chatbots

Questions about Advanced Speech Processing Technology:

  1. How does this technology improve the accuracy of speech recognition systems?
  2. What are the potential challenges in implementing this technology in real-world applications?

Frequently Updated Research: Follow the latest advancements in speech processing technology, particularly work on training speech recognition models with synthesized speech, to keep such systems accurate.


Original Abstract Submitted

In various examples, first textual data may be applied to a first MLM to generate an intermediate speech representation (e.g., a frequency-domain representation), the intermediate audio representation and a second MLM may be used to generate output data indicating second textual data, and parameters of the second MLM may be updated using the output data and ground truth data associated with the first textual data. The first MLM may include a trained Text-To-Speech (TTS) model and the second MLM may include an Automatic Speech Recognition (ASR) model. A generator from a generative adversarial networks may be used to enhance an initial intermediate audio representation generated using the first MLM and the enhanced intermediate audio representation may be provided to the second MLM. The generator may include generator blocks that receive the initial intermediate audio representation to sequentially generate the enhanced intermediate audio representation.