Microsoft Technology Licensing, LLC (20240265924). CANONICAL TRAINING FOR HIGHLY CONFIGURABLE MULTILINGUAL SPEECH simplified abstract

From WikiPatents

CANONICAL TRAINING FOR HIGHLY CONFIGURABLE MULTILINGUAL SPEECH

Organization Name

Microsoft Technology Licensing, LLC

Inventor(s)

Jinyu Li of Bellevue WA (US)

Long Zhou of Beijing (CN)

Xie Sun of Bellevue WA (US)

Shujie Liu of Beijing (CN)

CANONICAL TRAINING FOR HIGHLY CONFIGURABLE MULTILINGUAL SPEECH - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240265924, titled 'CANONICAL TRAINING FOR HIGHLY CONFIGURABLE MULTILINGUAL SPEECH'.

The patent application describes a system for building a configurable multilingual model for automatic speech recognition.

  • The system obtains language-specific automatic speech recognition modules and a universal automatic speech recognition module trained on a multi-language dataset.
  • It compiles these modules to create a configurable multilingual model that can dynamically utilize specific language modules based on user input.
  • The model processes audio content in response to user input identifying target languages associated with the content.
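
The steps above can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not the method claimed in the application: the names `ConfigurableMultilingualModel`, `compile_model`, `select`, and `process` are hypothetical, each module is modelled as a plain function over a list of numbers, and the element-wise addition used to combine module outputs is an assumption, since the abstract does not say how outputs are combined.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

# Hypothetical stand-in: a "module" maps audio features to a list of scores.
Module = Callable[[List[float]], List[float]]


@dataclass
class ConfigurableMultilingualModel:
    """Toy compiled model: one universal module plus a registry of
    language-specific modules keyed by language code."""
    universal: Module
    language_specific: Dict[str, Module] = field(default_factory=dict)

    def select(self, target_languages: Iterable[str]) -> List[str]:
        """Return the requested subset of languages, deduplicated in order."""
        requested = list(dict.fromkeys(target_languages))
        unknown = [lang for lang in requested if lang not in self.language_specific]
        if unknown:
            raise KeyError(f"no language-specific module for: {unknown}")
        return requested

    def process(self, audio: List[float], target_languages: Iterable[str]) -> List[float]:
        """Run the universal module plus only the selected language modules."""
        active = self.select(target_languages)
        scores = self.universal(audio)
        # Assumption: outputs are combined by element-wise addition.
        for lang in active:
            delta = self.language_specific[lang](audio)
            scores = [s + d for s, d in zip(scores, delta)]
        return scores


def compile_model(universal: Module, language_modules: Dict[str, Module]) -> ConfigurableMultilingualModel:
    """Bundle the universal module with every language-specific module."""
    return ConfigurableMultilingualModel(universal, dict(language_modules))


def scale(k: float) -> Module:
    """Build a dummy module that multiplies every input value by k."""
    return lambda audio: [k * x for x in audio]


model = compile_model(scale(1.0), {"en": scale(0.5), "zh": scale(0.25), "de": scale(2.0)})

# The user identifies English and Chinese as target languages,
# so the German module is left unused for this request.
print(model.process([1.0, 2.0], ["en", "zh"]))  # → [1.75, 3.5]
```

The point of the sketch is the selection step: all language-specific modules are bundled once, and the subset that actually runs is chosen per request from the user-supplied target languages.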

Potential Applications:

  • Multilingual voice assistants
  • Language translation services
  • Multilingual customer service support

Problems Solved:

  • Streamlining multilingual speech recognition systems
  • Improving accuracy and efficiency in processing audio content in multiple languages

Benefits:

  • Enhanced user experience for multilingual users
  • Increased accuracy and speed in speech recognition
  • Cost-effective solution for businesses operating in multiple language markets

Commercial Applications: "Configurable Multilingual Model for Enhanced Automatic Speech Recognition in Multilingual Environments"

Questions about the technology:

  1. How does the system determine which language-specific modules to utilize for processing audio content?
  2. What are the potential limitations of the configurable multilingual model in terms of language coverage and accuracy?


Original Abstract Submitted

Embodiments are provided for building a configurable multilingual model. A computing system obtains a plurality of language-specific automatic speech recognition modules and a universal automatic speech recognition module trained on a multi-language training dataset comprising training data corresponding to each of the plurality of different languages. The computing system then compiles the universal automatic speech recognition module with the plurality of language-specific automatic speech recognition modules to generate a configurable multilingual model that is configured to selectively and dynamically utilize a sub-set of the plurality of language-specific automatic speech recognition modules with the universal automatic speech recognition module to process audio content in response to user input identifying one or more target languages associated with the audio content.