17840169. Generation and Explanation of Transformer Computation Graph Using Graph Attention Model simplified abstract (Microsoft Technology Licensing, LLC)


Generation and Explanation of Transformer Computation Graph Using Graph Attention Model

Organization Name

Microsoft Technology Licensing, LLC

Inventor(s)

Leo Moreno Betthauser of Kirkland WA (US)

Maurice Diesendruck of Bellevue WA (US)

Generation and Explanation of Transformer Computation Graph Using Graph Attention Model - A simplified explanation of the abstract

This abstract first appeared for US patent application 17840169 titled 'Generation and Explanation of Transformer Computation Graph Using Graph Attention Model'.

Simplified Explanation

The patent application describes a data processing system that obtains the attention matrices produced by a pretrained machine learning model containing self-attention layers. From these matrices, the system builds a computation graph that represents the model's behavior across its layers. A second machine learning model then analyzes this graph and reports on the first model's behavior, identifying which layers performed the specific tasks involved in generating predictions. The workflow breaks down as follows (a code sketch of the first two steps appears after the list):

  • Obtaining attention matrices from a pretrained machine learning model with self-attention layers.
  • Analyzing the attention matrices to create a computation graph representing the behavior of the model across its layers.
  • Using a second machine learning model to analyze the computation graph and provide information about the behavior of the first model.
  • The second model identifies which layers of the first model performed specific tasks associated with generating predictions.
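As a concrete illustration of the first two steps, the sketch below extracts per-layer attention matrices from a pretrained transformer (using the Hugging Face transformers library) and assembles them into a directed computation graph with networkx. The node encoding (one node per layer/token pair) and the edge-pruning threshold are illustrative assumptions; the patent does not specify a particular graph construction.

```python
# Minimal sketch: attention matrices -> computation graph.
# Assumes Hugging Face transformers and networkx; the graph encoding
# (nodes = (layer, token) pairs, edges weighted by attention) is an
# illustrative choice, not the patented construction.
import networkx as nx
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

inputs = tokenizer("Transformers build attention matrices.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions: one (batch, heads, seq, seq) tensor per layer.
attentions = [a[0].mean(dim=0) for a in outputs.attentions]  # average over heads

graph = nx.DiGraph()
seq_len = attentions[0].shape[-1]
for layer, attn in enumerate(attentions):
    for src in range(seq_len):       # key position in layer `layer`
        for dst in range(seq_len):   # query position feeding layer `layer + 1`
            weight = attn[dst, src].item()
            if weight > 0.05:  # prune weak edges; threshold is arbitrary
                graph.add_edge((layer, src), (layer + 1, dst), weight=weight)

print(graph.number_of_nodes(), graph.number_of_edges())
```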

Potential Applications

  • Improving the interpretability and understanding of complex machine learning models.
  • Enhancing model debugging and troubleshooting processes.
  • Assisting in model optimization and performance improvement.
  • Enabling better model explainability and transparency.

Problems Solved

  • Lack of transparency and interpretability in complex machine learning models.
  • Difficulty in understanding the behavior and decision-making process of pretrained models.
  • Challenges in identifying specific layers responsible for certain tasks within a model.

Benefits

  • Provides insights into the behavior of pretrained machine learning models.
  • Facilitates model debugging and optimization.
  • Enhances model explainability and transparency.
  • Enables better understanding of the decision-making process within a model.


Original Abstract Submitted

A data processing system implements obtaining attention matrices from a first machine learning model that is pretrained and includes a plurality of self-attention layers. The data processing system further implements analyzing the attention matrices to generate a computation graph based on the attention matrices. The computation graph provides a representation of behavior of the first machine learning model across the plurality of self-attention layers. The data processing system further implements analyzing the computation graph using a second machine learning model. The second machine learning model is trained to receive the computation graph and to output model behavior information. The model behavior information identifies which layers of the model performed specific tasks associated with generating predictions by the first machine learning model.
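The title names a graph attention model as the second machine learning model. Below is a minimal, hypothetical sketch of such an analyzer built with PyTorch Geometric's GATConv; the architecture, the feature encoding, and the LayerBehaviorGAT name are illustrative assumptions, not the patented design.

```python
# Hypothetical second-stage analyzer: a small graph attention network (GAT)
# that reads the computation graph and scores each node against a set of
# candidate tasks. Aggregating scores per layer attributes tasks to layers.
import torch
import torch.nn.functional as F
from torch_geometric.data import Data
from torch_geometric.nn import GATConv


class LayerBehaviorGAT(torch.nn.Module):  # hypothetical name
    def __init__(self, in_dim: int, hidden_dim: int, num_tasks: int):
        super().__init__()
        self.conv1 = GATConv(in_dim, hidden_dim, heads=4, concat=False)
        self.conv2 = GATConv(hidden_dim, hidden_dim, heads=4, concat=False)
        self.readout = torch.nn.Linear(hidden_dim, num_tasks)

    def forward(self, data: Data) -> torch.Tensor:
        x = F.elu(self.conv1(data.x, data.edge_index))
        x = F.elu(self.conv2(x, data.edge_index))
        return self.readout(x)  # (num_nodes, num_tasks) task logits per node


# Toy usage: random node features stand in for encodings of layer index and
# token position; a real pipeline would derive them from the computation graph.
data = Data(x=torch.randn(10, 16), edge_index=torch.randint(0, 10, (2, 40)))
model = LayerBehaviorGAT(in_dim=16, hidden_dim=32, num_tasks=5)
logits = model(data)
print(logits.shape)  # torch.Size([10, 5])
```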