17525908. Adaptive Token Sampling for Efficient Transformer simplified abstract (MICROSOFT TECHNOLOGY LICENSING, LLC)

Adaptive Token Sampling for Efficient Transformer

Organization Name

MICROSOFT TECHNOLOGY LICENSING, LLC

Inventor(s)

Mohsen Fayyaz of Bonn (DE)

Soroush Abbasi Koohpayegani of Baltimore MD (US)

Eric Chris Wolfgang Sommerlade of Oxford (GB)

Hamidreza Vaezi Joze of Redmond WA (US)

Adaptive Token Sampling for Efficient Transformer - A simplified explanation of the abstract

This abstract first appeared for US patent application 17525908, titled 'Adaptive Token Sampling for Efficient Transformer'.

Simplified Explanation

The abstract describes a transformer that processes data items, such as images, more efficiently by using a modified attention component. The component operates in three stages:

  • The first stage generates original attention information based on embedding vectors representing item tokens and a classification token.
  • The second stage generates score information based on a portion of the original attention information related to the classification token.
  • The third stage produces modified attention information by removing attention values from the original attention information, guided by a sampling operation performed on the score information.

The second and third stages do not require machine-trained values, which makes the component straightforward to add to existing transformers; the sketch below walks through all three stages.
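
The abstract stops short of an algorithm, but the three stages map naturally onto a few lines of array code. The following is a minimal single-head NumPy sketch; the function name modified_attention, the inverse-transform sampling step, and the choice to always retain the classification token are illustrative assumptions rather than details taken from the patent.

```python
import numpy as np

def modified_attention(q, k, v, num_kept):
    """Sketch of the three-stage modified attention component.

    q, k, v  : (n, d) arrays for n tokens; row 0 is the classification token.
    num_kept : upper bound on the number of item tokens retained by sampling.
    """
    d = q.shape[-1]

    # Stage 1: original attention information (standard scaled dot-product).
    logits = q @ k.T / np.sqrt(d)
    attn = np.exp(logits - logits.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)          # (n, n) attention matrix

    # Stage 2: score information from the portion of the attention matrix
    # that pertains to the classification token (its row, excluding itself).
    scores = attn[0, 1:]
    scores = scores / scores.sum()

    # Stage 3: a sampling operation on the score information selects which
    # item tokens to keep. Inverse-transform sampling over the score CDF is
    # an assumption here; the abstract only says "a sampling operation".
    cdf = np.cumsum(scores)
    u = (np.arange(num_kept) + 0.5) / num_kept        # evenly spaced samples
    kept = np.unique(np.searchsorted(cdf, u))         # duplicate draws collapse
    kept = np.concatenate(([0], kept + 1))            # always keep CLS token

    # Modified attention information: attention values for unsampled tokens
    # are removed and the remaining rows are renormalized. No machine-trained
    # values are used in stages 2 and 3.
    attn_mod = attn[np.ix_(kept, kept)]
    attn_mod /= attn_mod.sum(axis=-1, keepdims=True)
    return attn_mod @ v[kept], kept

# Example: 17 tokens (1 classification + 16 item tokens), 8-dim embeddings.
rng = np.random.default_rng(0)
q = rng.standard_normal((17, 8))
k = rng.standard_normal((17, 8))
v = rng.standard_normal((17, 8))
out, kept = modified_attention(q, k, v, num_kept=8)
print(out.shape, kept)  # e.g. (9, 8) and the indices of the retained tokens
```

Because duplicate draws from the score CDF collapse in this sketch, the number of retained tokens adapts to the input: when a few tokens dominate the classification scores, fewer tokens survive and the computation in later stages shrinks accordingly.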

Potential applications of this technology:

  • Image processing: vision transformers can process image data items more efficiently by sampling away low-scoring image tokens.
  • Natural language processing: the same token-sampling approach can be applied to text data items with improved efficiency.

Problems solved by this technology:

  • Increased efficiency: The modified attention component makes transformer-based technology more efficient by removing low-scoring attention values from the computation.
  • Simplified deployment: The absence of machine-trained values in the second and third stages expedites the deployment of these functions in existing transformers.

Benefits of this technology:

  • Improved processing speed: because attention values for unsampled tokens are removed, the transformer processes data items faster.
  • Ease of implementation: with no machine-trained values in the second and third stages, the component can be deployed in existing transformers without additional training of those stages.


Original Abstract Submitted

A transformer is described herein for using transformer-based technology to process data items (e.g., image items). The transformer increases the efficiency of the transformer-based technology by using a modified attention component. In operation, the modified attention component accepts embedding vectors that represent a plurality of item tokens, together with a classification token. A first stage of the modified attention component generates original attention information based on the embedding vectors. A second stage generates score information based on a portion of the original attention information that pertains to the classification token. A third stage produces modified attention information by removing attention values from the original attention information, as guided by a sampling operation that is performed on the score information. The second and third stages do not rely on machine-trained values, which expedites the deployment of these functions in existing transformers.