International Business Machines Corporation (20240127801). DOMAIN ADAPTIVE SPEECH RECOGNITION USING ARTIFICIAL INTELLIGENCE simplified abstract

From WikiPatents
Revision as of 03:32, 26 April 2024 by Wikipatents (talk | contribs) (Creating a new page)

DOMAIN ADAPTIVE SPEECH RECOGNITION USING ARTIFICIAL INTELLIGENCE

Organization Name

International Business Machines Corporation

Inventor(s)

Tohru Nagano of Tokyo (JP)

Gakuto Kurata of Tokyo (JP)

DOMAIN ADAPTIVE SPEECH RECOGNITION USING ARTIFICIAL INTELLIGENCE - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240127801, titled 'DOMAIN ADAPTIVE SPEECH RECOGNITION USING ARTIFICIAL INTELLIGENCE'.

Simplified Explanation

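The patent application describes methods, systems, and computer program products for domain adaptive speech recognition using artificial intelligence. An AI-based data conversion model processes a sequence of phonemes related to the input speech and generates a set of language data candidates, each made up of one or more graphemes. For a target pair of phonemes and graphemes, a subset of graphemes is determined from those candidates and processed with at least one biasing language model and an AI-based speech recognition model to produce a first speech recognition output. A second output is then generated by replacing at least a portion of that subset in the first output with graphemes from the target pair, and automated actions are performed based on the second output.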

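  • Generating language data candidates (each comprising one or more graphemes) from a sequence of phonemes using an AI-based data conversion model
  • Determining, for a target pair of phonemes and graphemes, a subset of graphemes from the candidates
  • Generating a first speech recognition output from that subset using at least one biasing language model and an AI-based speech recognition model
  • Generating a second speech recognition output by replacing at least part of the subset in the first output with graphemes from the target pair
  • Performing automated actions based on the second speech recognition output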
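The steps above, together with the replacement step described in the original abstract, can be illustrated with a toy sketch. This is not the applicant's implementation: the lookup table standing in for the AI-based data conversion model, the n-best rescoring standing in for the biasing language model and speech recognition model, the rule for choosing the subset, and every name below (TargetPair, generate_candidates, select_subset, recognize_with_bias, replace_with_target, run_pipeline) are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TargetPair:
    """A domain term: its phoneme sequence and the spelling wanted in the output."""
    phonemes: tuple[str, ...]
    graphemes: str


# Assumption: a fixed table stands in for the AI-based data conversion model.
TOY_CONVERSION_TABLE = {
    ("W", "AA", "T", "S", "AH", "N"): ["what son", "watt sun", "wat son"],
}


def generate_candidates(phonemes: tuple[str, ...]) -> list[str]:
    """Return grapheme candidates for a phoneme sequence (toy lookup)."""
    return list(TOY_CONVERSION_TABLE.get(phonemes, []))


def select_subset(candidates: list[str], max_size: int = 2) -> list[str]:
    """Assumption: keep the first few candidates; the real criterion is not given."""
    return candidates[:max_size]


def recognize_with_bias(
    nbest: list[tuple[str, float]], bias_phrases: list[str], bonus: float = 1.0
) -> str:
    """Toy stand-in for recognizer plus biasing language model.

    Each hypothesis score is raised by `bonus` per bias phrase it contains,
    and the highest-scoring hypothesis is returned.
    """
    def biased_score(item: tuple[str, float]) -> float:
        text, score = item
        hits = sum(phrase in text for phrase in bias_phrases)
        return score + bonus * hits

    return max(nbest, key=biased_score)[0]


def replace_with_target(text: str, subset: list[str], target: TargetPair) -> str:
    """Swap any subset spelling found in the text for the target graphemes."""
    for phrase in subset:
        text = text.replace(phrase, target.graphemes)
    return text


def run_pipeline(
    target: TargetPair, nbest: list[tuple[str, float]]
) -> tuple[str, str]:
    """Return the (first, second) recognition outputs for one utterance."""
    candidates = generate_candidates(target.phonemes)
    subset = select_subset(candidates)
    first = recognize_with_bias(nbest, subset)
    second = replace_with_target(first, subset, target)
    return first, second


if __name__ == "__main__":
    target = TargetPair(("W", "AA", "T", "S", "AH", "N"), "Watson")
    # Hypothetical n-best list with log-probability-like scores.
    nbest = [
        ("please call what sun today", -4.0),
        ("please call what son today", -4.5),
    ]
    first, second = run_pipeline(target, nbest)
    print(first)
    print(second)
```

In this sketch the bias pushes the recognizer toward a spelling it can already produce ("what son"), and the final substitution restores the domain term ("Watson"). An automated action would then be triggered from the second output; that step is omitted here because the abstract does not say what the actions are.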

Potential Applications

This technology can be applied in various fields such as virtual assistants, customer service chatbots, transcription services, and language translation tools.

Problems Solved

This technology aims to improve the accuracy of speech recognition systems in domain-specific contexts, where general-purpose models may struggle to transcribe speech correctly.

Benefits

The benefits of this technology include enhanced speech recognition performance, increased adaptability to different domains, and improved user experience in speech-to-text applications.

Potential Commercial Applications

Potential commercial applications of this technology include speech-to-text transcription services, virtual assistant platforms, customer service automation tools, and language translation services.

Possible Prior Art

One possible area of prior art is the use of neural network models for speech recognition, which have been widely studied and implemented in various applications.

Unanswered Questions

How does this technology compare to existing speech recognition systems in terms of accuracy and adaptability?

This article does not provide a direct comparison between this technology and existing speech recognition systems.

What are the potential limitations or challenges of implementing this technology in real-world applications?

The article does not address the potential limitations or challenges of implementing this technology in real-world applications.


Original Abstract Submitted

Methods, systems, and computer program products for domain adaptive speech recognition using artificial intelligence are provided herein. A computer-implemented method includes generating a set of language data candidates, each language data candidate comprising one or more graphemes, by processing a sequence of phonemes related to input speech data using an artificial intelligence-based data conversion model; determining, for a target pair of phonemes and graphemes, a subset of graphemes from the set of language data candidates; generating a first speech recognition output by processing the subset of graphemes using at least one biasing language model and an artificial intelligence-based speech recognition model; generating a second speech recognition output by replacing at least a portion of the subset of graphemes in the first speech recognition output with at least one of the graphemes from the target pair; and performing automated actions based on the second speech recognition output.