18590675. METHODS AND SYSTEMS FOR ENHANCING MULTIMODAL CAPABILITIES IN LARGE LANGUAGE MODELS (Microsoft Technology Licensing, LLC)
Organization Name
Microsoft Technology Licensing, LLC
Inventor(s)
Zhuo Chen of Woodinville WA US
Sunit Sivasankaran of Redmond WA US
This abstract first appeared for US patent application 18590675, titled 'METHODS AND SYSTEMS FOR ENHANCING MULTIMODAL CAPABILITIES IN LARGE LANGUAGE MODELS'.
Original Abstract Submitted
Systems and methods are provided for enhancing the speech modality in a large language model (LLM) and for retaining in-context learning capabilities without overfitting to trained tasks. Systems obtain a first set of training data comprising tuples of a sample of speech combined with synthetically generated pairings of speech comprehension test questions and answers that correspond to the sample of speech and obtain a second set of training data comprising pairings of automatic speech recognition data. Systems generate and align a first set of encodings of the first set of training data and a second set of encodings of the second set of training data. Systems train the LLM on a greater amount of the first set of training data than the second set of training data and use the trained LLM to perform a natural language processing task.
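The data-mixing step described in the abstract can be illustrated with a minimal sketch. It is not the applicants' implementation: the record types, the function name build_training_mix, the 0.8 share, and sampling with replacement are all assumptions chosen for illustration. The abstract states only that the LLM is trained on a greater amount of the first set (speech plus synthetic question/answer pairs) than the second set (automatic speech recognition pairings). Encoding and alignment are not modeled here.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Union


@dataclass(frozen=True)
class ComprehensionExample:
    """Hypothetical record: a speech sample plus one synthetic question/answer pair."""
    speech_id: str
    question: str
    answer: str


@dataclass(frozen=True)
class AsrExample:
    """Hypothetical record: a speech sample paired with its transcript."""
    speech_id: str
    transcript: str


Example = Union[ComprehensionExample, AsrExample]


def build_training_mix(
    first: Sequence[ComprehensionExample],
    second: Sequence[AsrExample],
    total: int,
    first_share: float = 0.8,  # assumed value; the abstract gives no ratio
    seed: int = 0,
) -> List[Example]:
    """Draw `total` examples so that the first set strictly outnumbers the second."""
    if not first or not second:
        raise ValueError("both training sets must be non-empty")
    if not 0.5 < first_share < 1.0:
        raise ValueError("first_share must be strictly between 0.5 and 1.0")

    n_first = round(total * first_share)
    n_second = total - n_first
    if n_first <= n_second:
        raise ValueError("total is too small for the first set to be the majority")

    rng = random.Random(seed)
    # Sampling with replacement is an assumption that keeps the sketch
    # independent of the relative sizes of the two sets.
    mix: List[Example] = rng.choices(first, k=n_first) + rng.choices(second, k=n_second)
    rng.shuffle(mix)
    return mix


# Toy data, invented purely for illustration.
first_set = [
    ComprehensionExample(f"utt{i}", "Who is speaking?", "A customer") for i in range(5)
]
second_set = [AsrExample(f"utt{i}", "hello world") for i in range(5)]
training_mix = build_training_mix(first_set, second_set, total=100, first_share=0.8)
```

Fixing the share as an explicit parameter makes the imbalance between the two sets easy to vary; the value that best preserves in-context learning would need to be determined empirically.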