Microsoft Technology Licensing, LLC (20240265924). CANONICAL TRAINING FOR HIGHLY CONFIGURABLE MULTILINGUAL SPEECH simplified abstract
CANONICAL TRAINING FOR HIGHLY CONFIGURABLE MULTILINGUAL SPEECH
Organization Name
Microsoft Technology Licensing, LLC
Inventor(s)
CANONICAL TRAINING FOR HIGHLY CONFIGURABLE MULTILINGUAL SPEECH - A simplified explanation of the abstract
This abstract first appeared for US patent application 20240265924, titled 'CANONICAL TRAINING FOR HIGHLY CONFIGURABLE MULTILINGUAL SPEECH'.
The patent application describes a system for building a configurable multilingual model for automatic speech recognition.
- The system obtains language-specific automatic speech recognition modules and a universal automatic speech recognition module trained on a multi-language dataset.
- It compiles these modules to create a configurable multilingual model that can dynamically utilize specific language modules based on user input.
- The model processes audio content in response to user input identifying target languages associated with the content.
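The selection behaviour described in the points above can be illustrated with a minimal sketch. It is not the patent's implementation: the abstract does not say how module outputs are combined, so the code assumes each module returns per-token scores and that the scores are simply summed. Every name here (ConfigurableMultilingualModel, make_toy_module, the language codes, the toy tokens) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

# Assumption: a "module" maps audio samples to per-token scores.
Scores = Dict[str, float]
Module = Callable[[List[float]], Scores]


@dataclass
class ConfigurableMultilingualModel:
    """One universal module plus a pool of language-specific modules."""
    universal: Module
    language_specific: Dict[str, Module] = field(default_factory=dict)

    def configure(self, target_languages: Iterable[str]) -> List[str]:
        """Pick the sub-set of language modules named by the user."""
        selected = list(dict.fromkeys(target_languages))  # dedupe, keep order
        unknown = [lang for lang in selected if lang not in self.language_specific]
        if unknown:
            raise KeyError(f"no language-specific module for: {unknown}")
        return selected

    def process(self, audio: List[float], target_languages: Iterable[str]) -> Scores:
        """Run the universal module plus only the selected language modules."""
        active = self.configure(target_languages)
        combined = dict(self.universal(audio))
        for lang in active:
            # Assumed combination rule: add scores token by token.
            for token, score in self.language_specific[lang](audio).items():
                combined[token] = combined.get(token, 0.0) + score
        return combined


def make_toy_module(token: str, weight: float) -> Module:
    """Toy stand-in for a trained module: scores one token by signal sum."""
    def module(audio: List[float]) -> Scores:
        return {token: weight * sum(audio)}
    return module


model = ConfigurableMultilingualModel(
    universal=make_toy_module("<shared>", 1.0),
    language_specific={
        "en": make_toy_module("hello", 2.0),
        "de": make_toy_module("hallo", 3.0),
        "fr": make_toy_module("bonjour", 4.0),
    },
)

# The user identifies English and German, so the French module is never run.
result = model.process([0.5, 0.5], ["en", "de"])
print(result)
```

In this sketch the user's language list is the only thing that decides which modules run, so the same compiled model can serve a monolingual or a multilingual request without being rebuilt.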
Potential Applications:
- Multilingual voice assistants
- Language translation services
- Multilingual customer service support
Problems Solved:
- Streamlining multilingual speech recognition systems
- Improving accuracy and efficiency in processing audio content in multiple languages
Benefits:
- Enhanced user experience for multilingual users
- Increased accuracy and speed in speech recognition
- Cost-effective solution for businesses operating in multiple language markets
Commercial Applications: "Configurable Multilingual Model for Enhanced Automatic Speech Recognition in Multilingual Environments"
Questions about the technology:
1. How does the system determine which language-specific modules to utilize for processing audio content?
2. What are the potential limitations of the configurable multilingual model in terms of language coverage and accuracy?
Original Abstract Submitted
Embodiments are provided for building a configurable multilingual model. A computing system obtains a plurality of language-specific automatic speech recognition modules and a universal automatic speech recognition module trained on a multi-language training dataset comprising training data corresponding to each of the plurality of different languages. The computing system then compiles the universal automatic speech recognition module with the plurality of language-specific automatic speech recognition modules to generate a configurable multilingual model that is configured to selectively and dynamically utilize a sub-set of the plurality of language-specific automatic speech recognition modules with the universal automatic speech recognition module to process audio content in response to user input identifying one or more target languages associated with the audio content.