17664031. LANGUAGE-MODEL PRETRAINING WITH GRADIENT-DISENTANGLED EMBEDDING SHARING simplified abstract (MICROSOFT TECHNOLOGY LICENSING, LLC)

From WikiPatents

LANGUAGE-MODEL PRETRAINING WITH GRADIENT-DISENTANGLED EMBEDDING SHARING

Organization Name

MICROSOFT TECHNOLOGY LICENSING, LLC

Inventor(s)

Pengcheng He of Sammamish WA (US)

Jianfeng Gao of Woodinville WA (US)

Weizhu Chen of Kirkland WA (US)

LANGUAGE-MODEL PRETRAINING WITH GRADIENT-DISENTANGLED EMBEDDING SHARING - A simplified explanation of the abstract

This abstract first appeared for US patent application 17664031, titled 'LANGUAGE-MODEL PRETRAINING WITH GRADIENT-DISENTANGLED EMBEDDING SHARING'.

Simplified Explanation

The abstract describes a method for training a language model using multitask pretraining. In simplified terms:

  • The method receives vectorized training data for a language model.
  • It generates modified vectorized training data based on the original data using an upstream data embedding.
  • Pretraining output is emitted based on the modified data using a downstream data embedding equivalent to the upstream embedding.
  • The method adjusts both embeddings by computing the gradient of the upstream data embedding disentangled from the gradient of the downstream data embedding, advancing the multitask pretraining problem toward a pretrained state (see the sketch after this list).
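
The abstract does not spell out an implementation, but the core idea of sharing one embedding table while keeping the downstream gradient out of it can be sketched as follows. This is a minimal, hypothetical PyTorch sketch: the class name, the residual delta table, and the use of detach() are illustrative assumptions, not details taken from the patent.

```python
import torch
import torch.nn as nn

class GradientDisentangledEmbedding(nn.Module):
    """Minimal sketch of gradient-disentangled embedding sharing.

    The upstream data embedding is shared with the downstream model,
    but the downstream loss cannot update the shared table: its
    gradient only reaches a small residual delta added on top.
    (Class and attribute names here are illustrative assumptions.)
    """

    def __init__(self, vocab_size: int, hidden_size: int):
        super().__init__()
        # Upstream data embedding, updated by the upstream pretraining loss.
        self.upstream = nn.Embedding(vocab_size, hidden_size)
        # Residual delta, the only part updated by the downstream loss.
        self.delta = nn.Embedding(vocab_size, hidden_size)
        nn.init.zeros_(self.delta.weight)

    def upstream_embed(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Used when generating the modified vectorized training data.
        return self.upstream(token_ids)

    def downstream_embed(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Equivalent in value to the upstream embedding, but detached so the
        # downstream gradient is disentangled from the upstream gradient.
        return self.upstream(token_ids).detach() + self.delta(token_ids)
```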

Potential applications of this technology:

  • Natural language processing: This method can be used to train language models for various NLP tasks such as text generation, sentiment analysis, and machine translation.
  • Chatbots and virtual assistants: By improving the language model training, this technology can enhance the conversational abilities of chatbots and virtual assistants.
  • Content generation: It can be applied to generate high-quality content for various purposes like article writing, product descriptions, and social media posts.

Problems solved by this technology:

  • Improved language model training: The method addresses the challenge of training language models by disentangling the gradients of upstream and downstream data embeddings, leading to more effective pretraining.
  • Enhanced performance on multitask problems: By advancing the multitask pretraining problem, this method can improve the performance of language models on various tasks simultaneously.

Benefits of this technology:

  • Higher accuracy: The disentangled gradients allow for more precise adjustments to the upstream and downstream data embeddings, resulting in improved language model accuracy.
  • Efficient training: The method optimizes the training process by utilizing multitask pretraining, reducing the need for extensive manual annotation or task-specific training.
  • Versatility: This technology can be applied to various language processing tasks, making it adaptable to different domains and applications.


Original Abstract Submitted

A method for training a language model comprises (a) receiving vectorized training data as input to a multitask pretraining problem; (b) generating modified vectorized training data based on the vectorized training data, according to an upstream data embedding; (c) emitting pretraining output based on the modified vectorized training data, according to a downstream data embedding equivalent to the upstream data embedding; and (d) adjusting the upstream data embedding and the downstream data embedding by computing, based on the pretraining output, a gradient of the upstream data embedding disentangled from a gradient of the downstream data embedding, thereby advancing the multitask pretraining problem toward a pretrained state.
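
For context, one plausible way steps (a) through (d) could play out in a single training step is sketched below. The abstract does not name the pretraining tasks, so this sketch assumes an ELECTRA-style pairing of a masked-language-model generator (upstream) with a replaced-token-detection discriminator (downstream), reusing the GradientDisentangledEmbedding module sketched earlier; the module names, loss choices, and function signature are all assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def pretraining_step(embeddings, generator, discriminator, optimizer,
                     token_ids, mlm_labels, mlm_mask):
    """One multitask pretraining step following steps (a)-(d) of the abstract.

    Assumes `generator` maps embedded inputs to vocabulary logits and
    `discriminator` maps embedded inputs to one replaced/original logit
    per token; both are hypothetical modules.
    """
    # (a) Vectorized training data: token_ids index the vocabulary.
    # (b) Modified vectorized training data via the upstream data embedding.
    gen_logits = generator(embeddings.upstream_embed(token_ids))
    upstream_loss = F.cross_entropy(gen_logits[mlm_mask], mlm_labels[mlm_mask])

    with torch.no_grad():
        modified_ids = token_ids.clone()
        modified_ids[mlm_mask] = gen_logits[mlm_mask].argmax(dim=-1)

    # (c) Pretraining output via the downstream data embedding, which is
    #     equivalent to the upstream one but gradient-disentangled.
    disc_logits = discriminator(embeddings.downstream_embed(modified_ids))
    replaced = (modified_ids != token_ids).float()
    downstream_loss = F.binary_cross_entropy_with_logits(
        disc_logits.squeeze(-1), replaced)

    # (d) Both losses backpropagate, but the downstream gradient reaches only
    #     the residual delta, never the shared upstream embedding table.
    optimizer.zero_grad()
    (upstream_loss + downstream_loss).backward()
    optimizer.step()
    return upstream_loss.item(), downstream_loss.item()
```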