US Patent Application 17664031: LANGUAGE-MODEL PRETRAINING WITH GRADIENT-DISENTANGLED EMBEDDING SHARING - simplified abstract (MICROSOFT TECHNOLOGY LICENSING, LLC)
LANGUAGE-MODEL PRETRAINING WITH GRADIENT-DISENTANGLED EMBEDDING SHARING
Organization Name
MICROSOFT TECHNOLOGY LICENSING, LLC
Inventor(s)
Pengcheng He of Sammamish, WA (US)
Jianfeng Gao of Woodinville, WA (US)
Weizhu Chen of Kirkland, WA (US)
LANGUAGE-MODEL PRETRAINING WITH GRADIENT-DISENTANGLED EMBEDDING SHARING - A simplified explanation of the abstract
This abstract first appeared for US patent application 17664031, titled 'LANGUAGE-MODEL PRETRAINING WITH GRADIENT-DISENTANGLED EMBEDDING SHARING'.
Simplified Explanation
The abstract describes a method for training a language model through multitask pretraining with gradient-disentangled embedding sharing. Here is a simplified explanation:
- The method receives vectorized training data as input to a multitask pretraining problem.
- It generates modified vectorized training data from that input according to an upstream data embedding.
- It emits pretraining output based on the modified data according to a downstream data embedding that is equivalent to the upstream embedding.
- It adjusts both embeddings by computing the gradient of the upstream embedding disentangled from the gradient of the downstream embedding, advancing the multitask pretraining problem toward a pretrained state.
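The key idea in the last step is that the downstream task reuses the upstream embedding's values in its forward pass, but its gradient is kept from flowing back into the shared embedding. The toy sketch below illustrates this with scalar "embeddings", made-up quadratic losses, and manual gradients; it is a hypothetical illustration of the stop-gradient-plus-residual pattern, not the patented implementation.

```python
# Toy sketch (hypothetical): gradient-disentangled embedding sharing.
# E is the shared upstream embedding; the downstream embedding is
# D = stop_grad(E) + delta, so the downstream loss only updates the
# residual delta, never the shared E.

def gdes_step():
    E = 1.0       # shared upstream embedding (scalar stand-in)
    delta = 0.0   # residual owned by the downstream task
    lr = 0.1      # learning rate

    # Upstream step: toy loss_g = (E - 2)^2 updates the shared embedding.
    grad_E = 2 * (E - 2.0)        # d loss_g / d E
    E -= lr * grad_E

    # Downstream step: forward pass sees the shared value plus residual.
    D = E + delta                 # D = stop_grad(E) + delta
    # Toy loss_d = (D - 3)^2; its gradient w.r.t. delta is nonzero,
    # but its gradient w.r.t. E is deliberately treated as zero.
    grad_delta = 2 * (D - 3.0)
    delta -= lr * grad_delta      # only the residual moves

    return E, delta

E, delta = gdes_step()
print(E, delta)  # E changed only by the upstream loss; delta only by the downstream loss
```

In framework terms, the same pattern is usually written with a stop-gradient operator (e.g. detaching the shared tensor before adding the residual), which is what keeps the two gradients disentangled.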
Potential applications of this technology:
- Natural language processing: This method can be used to train language models for various NLP tasks such as text generation, sentiment analysis, and machine translation.
- Chatbots and virtual assistants: By improving the language model training, this technology can enhance the conversational abilities of chatbots and virtual assistants.
- Content generation: It can be applied to generate high-quality content for various purposes like article writing, product descriptions, and social media posts.
Problems solved by this technology:
- Improved language model training: The method addresses the challenge of training language models by disentangling the gradients of upstream and downstream data embeddings, leading to more effective pretraining.
- Enhanced performance on multitask problems: By advancing the multitask pretraining problem, this method can improve the performance of language models on various tasks simultaneously.
Benefits of this technology:
- Higher accuracy: The disentangled gradients allow for more precise adjustments to the upstream and downstream data embeddings, resulting in improved language model accuracy.
- Efficient training: The method optimizes the training process by utilizing multitask pretraining, reducing the need for extensive manual annotation or task-specific training.
- Versatility: This technology can be applied to various language processing tasks, making it adaptable to different domains and applications.
Original Abstract Submitted
A method for training a language model comprises (a) receiving vectorized training data as input to a multitask pretraining problem; (b) generating modified vectorized training data based on the vectorized training data, according to an upstream data embedding; (c) emitting pretraining output based on the modified vectorized training data, according to a downstream data embedding equivalent to the upstream data embedding; and (d) adjusting the upstream data embedding and the downstream data embedding by computing, based on the pretraining output, a gradient of the upstream data embedding disentangled from a gradient of the downstream data embedding, thereby advancing the multitask pretraining problem toward a pretrained state.