17664031. LANGUAGE-MODEL PRETRAINING WITH GRADIENT-DISENTANGLED EMBEDDING SHARING simplified abstract (MICROSOFT TECHNOLOGY LICENSING, LLC)

From WikiPatents

LANGUAGE-MODEL PRETRAINING WITH GRADIENT-DISENTANGLED EMBEDDING SHARING

Organization Name

MICROSOFT TECHNOLOGY LICENSING, LLC

Inventor(s)

Pengcheng He of Sammamish WA (US)

Jianfeng Gao of Woodinville WA (US)

Weizhu Chen of Kirkland WA (US)

LANGUAGE-MODEL PRETRAINING WITH GRADIENT-DISENTANGLED EMBEDDING SHARING - A simplified explanation of the abstract

This abstract first appeared for US patent application 17664031, titled 'LANGUAGE-MODEL PRETRAINING WITH GRADIENT-DISENTANGLED EMBEDDING SHARING'.

Simplified Explanation

The abstract describes a method for training a language model using multitask pretraining. In simplified terms:

  • The method receives vectorized training data for a language model.
  • It generates modified vectorized training data based on the original data using an upstream data embedding.
  • Pretraining output is emitted based on the modified data using a downstream data embedding equivalent to the upstream embedding.
  • The method adjusts both embeddings by computing the gradient of the upstream data embedding disentangled from the gradient of the downstream data embedding, advancing the multitask pretraining problem toward a pretrained state (see the sketch after this list).
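
The abstract does not spell out an implementation, but the core idea of sharing one embedding table while keeping the downstream gradient out of it can be sketched as follows. This is a minimal, hypothetical PyTorch sketch: the class name, the residual delta table, and the use of detach() are illustrative assumptions, not details taken from the patent.

```python
import torch
import torch.nn as nn

class GradientDisentangledEmbedding(nn.Module):
    """Minimal sketch of gradient-disentangled embedding sharing.

    The upstream data embedding is shared with the downstream model,
    but the downstream loss cannot update the shared table: its
    gradient only reaches a small residual delta added on top.
    (Class and attribute names here are illustrative assumptions.)
    """

    def __init__(self, vocab_size: int, hidden_size: int):
        super().__init__()
        # Upstream data embedding, updated by the upstream pretraining loss.
        self.upstream = nn.Embedding(vocab_size, hidden_size)
        # Residual delta, the only part updated by the downstream loss.
        self.delta = nn.Embedding(vocab_size, hidden_size)
        nn.init.zeros_(self.delta.weight)

    def upstream_embed(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Used when generating the modified vectorized training data.
        return self.upstream(token_ids)

    def downstream_embed(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Equivalent in value to the upstream embedding, but detached so the
        # downstream gradient is disentangled from the upstream gradient.
        return self.upstream(token_ids).detach() + self.delta(token_ids)
```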

Potential applications of this technology:

  • Natural language processing: This method can be used to train language models for various NLP tasks such as text generation, sentiment analysis, and machine translation.
  • Chatbots and virtual assistants: By improving the language model training, this technology can enhance the conversational abilities of chatbots and virtual assistants.
  • Content generation: It can be applied to generate high-quality content for various purposes like article writing, product descriptions, and social media posts.

Problems solved by this technology:

  • Improved language model training: The method addresses the challenge of training language models by disentangling the gradients of upstream and downstream data embeddings, leading to more effective pretraining.
  • Enhanced performance on multitask problems: By advancing the multitask pretraining problem, this method can improve the performance of language models on various tasks simultaneously.

Benefits of this technology:

  • Higher accuracy: The disentangled gradients allow for more precise adjustments to the upstream and downstream data embeddings, resulting in improved language model accuracy.
  • Efficient training: The method optimizes the training process by utilizing multitask pretraining, reducing the need for extensive manual annotation or task-specific training.
  • Versatility: This technology can be applied to various language processing tasks, making it adaptable to different domains and applications.


Original Abstract Submitted

A method for training a language model comprises (a) receiving vectorized training data as input to a multitask pretraining problem; (b) generating modified vectorized training data based on the vectorized training data, according to an upstream data embedding; (c) emitting pretraining output based on the modified vectorized training data, according to a downstream data embedding equivalent to the upstream data embedding; and (d) adjusting the upstream data embedding and the downstream data embedding by computing, based on the pretraining output, a gradient of the upstream data embedding disentangled from a gradient of the downstream data embedding, thereby advancing the multitask pretraining problem toward a pretrained state.
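
For context, one plausible way steps (a) through (d) could play out in a single training step is sketched below. The abstract does not name the pretraining tasks, so this sketch assumes an ELECTRA-style pairing of a masked-language-model generator (upstream) with a replaced-token-detection discriminator (downstream), reusing the GradientDisentangledEmbedding module sketched earlier; the module names, loss choices, and function signature are all assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def pretraining_step(embeddings, generator, discriminator, optimizer,
                     token_ids, mlm_labels, mlm_mask):
    """One multitask pretraining step following steps (a)-(d) of the abstract.

    Assumes `generator` maps embedded inputs to vocabulary logits and
    `discriminator` maps embedded inputs to one replaced/original logit
    per token; both are hypothetical modules.
    """
    # (a) Vectorized training data: token_ids index the vocabulary.
    # (b) Modified vectorized training data via the upstream data embedding.
    gen_logits = generator(embeddings.upstream_embed(token_ids))
    upstream_loss = F.cross_entropy(gen_logits[mlm_mask], mlm_labels[mlm_mask])

    with torch.no_grad():
        modified_ids = token_ids.clone()
        modified_ids[mlm_mask] = gen_logits[mlm_mask].argmax(dim=-1)

    # (c) Pretraining output via the downstream data embedding, which is
    #     equivalent to the upstream one but gradient-disentangled.
    disc_logits = discriminator(embeddings.downstream_embed(modified_ids))
    replaced = (modified_ids != token_ids).float()
    downstream_loss = F.binary_cross_entropy_with_logits(
        disc_logits.squeeze(-1), replaced)

    # (d) Both losses backpropagate, but the downstream gradient reaches only
    #     the residual delta, never the shared upstream embedding table.
    optimizer.zero_grad()
    (upstream_loss + downstream_loss).backward()
    optimizer.step()
    return upstream_loss.item(), downstream_loss.item()
```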