17475330. METHOD FOR SPARSIFICATION OF FEATURE MAPS IN SELF-ATTENTION MECHANISMS simplified abstract (SAMSUNG ELECTRONICS CO., LTD.)


METHOD FOR SPARSIFICATION OF FEATURE MAPS IN SELF-ATTENTION MECHANISMS

Organization Name

SAMSUNG ELECTRONICS CO., LTD.

Inventor(s)

David Philip Lloyd Thorsley of Morgan Hill, CA (US)

Joseph H. Hassoun of Los Gatos, CA (US)

Jun Fang of Santa Clara, CA (US)

Chengyao Shen of San Jose, CA (US)

METHOD FOR SPARSIFICATION OF FEATURE MAPS IN SELF-ATTENTION MECHANISMS - A simplified explanation of the abstract

This abstract first appeared for US patent application 17475330, titled 'METHOD FOR SPARSIFICATION OF FEATURE MAPS IN SELF-ATTENTION MECHANISMS'.

Simplified Explanation

The patent application describes a method to reduce computation in a self-attention deep-learning model. Here are the key points:

  • A feature-map regularization term is added to the loss function while training the self-attention model (a training-time sketch follows this list).
  • This regularization term reduces the activation values of the feature maps in the model.
  • During inference, at least one low-magnitude feature is removed from at least one feature map of the self-attention model.
  • A low-magnitude feature is removed by setting it to zero when its value falls below a predetermined threshold (an inference-time sketch also follows this list).
  • After training, the weights of the self-attention model are quantized to further reduce computation.
  • The feature maps of the self-attention model are also quantized and compressed.
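
A minimal training-time sketch of the first two points, written in PyTorch. The toy attention block, the L1 form of the penalty, and the regularization strength reg_lambda are assumptions for illustration; the abstract only states that a feature-map regularization term added to the loss reduces feature-map activation values.

  # Hypothetical PyTorch sketch: add an L1-style feature-map penalty to the task loss.
  import math
  import torch
  import torch.nn as nn
  import torch.nn.functional as F

  class TinySelfAttention(nn.Module):
      def __init__(self, dim: int):
          super().__init__()
          self.q = nn.Linear(dim, dim)
          self.k = nn.Linear(dim, dim)
          self.v = nn.Linear(dim, dim)

      def forward(self, x: torch.Tensor) -> torch.Tensor:
          # x: (batch, seq_len, dim)
          q, k, v = self.q(x), self.k(x), self.v(x)
          scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(x.size(-1))
          attn = torch.softmax(scores, dim=-1)
          return torch.matmul(attn, v)        # the feature map to be regularized

  model = TinySelfAttention(dim=64)
  optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
  reg_lambda = 1e-4                           # assumed regularization strength

  x = torch.randn(8, 16, 64)                  # dummy input batch
  target = torch.randn(8, 16, 64)             # dummy regression target

  features = model(x)
  task_loss = F.mse_loss(features, target)
  # Feature-map regularization: penalizing activation magnitudes drives many
  # features toward zero so they can be pruned at inference time.
  loss = task_loss + reg_lambda * features.abs().mean()
  optimizer.zero_grad()
  loss.backward()
  optimizer.step()

Penalizing the mean absolute activation is one common way to encourage sparsity; whether the patent uses this exact penalty or another form is not specified in the abstract.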

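A corresponding inference-time sketch, also in PyTorch, for the remaining points. The threshold value, the symmetric 8-bit quantization scheme, and the use of a sparse-tensor format for compression are assumptions for illustration; the abstract only states that low-magnitude features are set to zero below a predetermined threshold and that weights and feature maps are quantized and compressed.

  # Hypothetical PyTorch sketch: zero out low-magnitude features, quantize
  # trained weights to 8 bits, and store the sparsified feature map compactly.
  import torch

  def sparsify_feature_map(features: torch.Tensor, threshold: float) -> torch.Tensor:
      """Set features whose magnitude falls below the threshold to zero."""
      return torch.where(features.abs() < threshold,
                         torch.zeros_like(features), features)

  def quantize_weights_int8(weight: torch.Tensor):
      """Symmetric per-tensor 8-bit quantization of a trained weight tensor."""
      scale = weight.abs().max() / 127.0
      q = torch.clamp(torch.round(weight / scale), -127, 127).to(torch.int8)
      return q, scale                          # dequantize with q.float() * scale

  with torch.no_grad():
      features = torch.randn(8, 16, 64)        # feature map from the trained model
      sparse = sparsify_feature_map(features, threshold=0.05)
      compressed = sparse.to_sparse()          # keep only the non-zero entries

      weight = torch.randn(64, 64)             # a trained projection weight
      q_weight, scale = quantize_weights_int8(weight)

Because the regularized training run pushes many activations below the threshold, the zeroed feature map compresses well and the remaining multiply-accumulate work is reduced.
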
Potential applications of this technology:

  • Deep learning models that utilize self-attention mechanisms can benefit from reduced computation, making them more efficient for various tasks.
  • This method can be applied to natural language processing tasks, such as machine translation or text generation, where self-attention models are commonly used.
  • It can also be used in computer vision tasks, such as image classification or object detection, where self-attention models have shown promising results.

Problems solved by this technology:

  • Deep learning models often require significant computational resources, which can limit their deployment in resource-constrained environments.
  • By reducing computation through feature-map regularization, removal of low-magnitude features, and weight quantization, this method addresses the problem of high computational requirements in self-attention models.
  • It allows for more efficient inference and deployment of self-attention models on devices with limited resources.

Benefits of this technology:

  • The method reduces the computational complexity of self-attention models, making them more practical for real-world applications.
  • By removing low-magnitude features and quantizing weights, the model size is reduced, enabling faster inference and lower memory requirements.
  • The feature-map regularization term helps to improve the generalization and robustness of the self-attention model by reducing overfitting.


Original Abstract Submitted

A method is disclosed to reduce computation in a self-attention deep-learning model. A feature-map regularization term is added to a loss function while training the self-attention model. At least one low-magnitude feature is removed from at least one feature map of the self-attention model during inference. Weights of the self-attention model are quantized after the self-attention model has been trained. Adding the feature-map regularization term reduces activation values of feature maps, and removing the at least one low-magnitude feature from at least one feature map may be performed by setting the low-magnitude feature to be equal to zero based on the low-magnitude feature having a value that is less than a predetermined threshold. Feature maps of the self-attention model are quantized and compressed.