18101744. COMPUTER SYSTEMS FOR COMPRESSING TRANSFORMER MODELS AND QUANTIZATION TRAINING METHODS THEREOF simplified abstract (SAMSUNG ELECTRONICS CO., LTD.)


COMPUTER SYSTEMS FOR COMPRESSING TRANSFORMER MODELS AND QUANTIZATION TRAINING METHODS THEREOF

Organization Name

SAMSUNG ELECTRONICS CO., LTD.

Inventor(s)

Yongsuk Kwon of Suwon-si (KR)

Jungwook Choi of Seoul (KR)

Minsoo Kim of Seoul (KR)

Seongmin Park of Seoul (KR)

COMPUTER SYSTEMS FOR COMPRESSING TRANSFORMER MODELS AND QUANTIZATION TRAINING METHODS THEREOF - A simplified explanation of the abstract

This abstract first appeared for US patent application 18101744 titled 'COMPUTER SYSTEMS FOR COMPRESSING TRANSFORMER MODELS AND QUANTIZATION TRAINING METHODS THEREOF'.

Simplified Explanation

The abstract describes a quantization-learning method performed by a model quantizer running in a computer system to compress a transformer model. A student model is first generated by quantizing the transformer model. A first quantization learning is then performed by inserting the self-attention map of a teacher model into the self-attention map of the student model. Finally, a second quantization learning is performed using a knowledge distillation method so that the student model's self-attention map follows that of the teacher model.

  • The method performs quantization learning on a transformer model.
  • A student model is generated by quantizing the transformer model (a code sketch of this step follows the list).
  • The first quantization learning inserts the teacher model's self-attention map into the student model's self-attention map.
  • The second quantization learning uses knowledge distillation to align the student model's self-attention map with the teacher's.
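
The student-model generation step (second bullet) can be pictured as fake quantization of the teacher's weights during training. The PyTorch sketch below is a minimal illustration under common quantization-aware-training assumptions (symmetric 8-bit per-tensor weight quantization with a straight-through estimator); the names fake_quantize, QuantLinear, and make_student are illustrative and do not come from the patent.

```python
import copy

import torch
import torch.nn as nn
import torch.nn.functional as F


def fake_quantize(w: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Symmetric per-tensor fake quantization (assumed scheme, not from the patent)."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = w.detach().abs().max().clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(w / scale), -qmax, qmax) * scale
    # Straight-through estimator: the forward pass uses the quantized weight,
    # the backward pass sends gradients to the full-precision weight.
    return w + (q - w).detach()


class QuantLinear(nn.Linear):
    """A Linear layer whose weight is fake-quantized on every forward pass."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, fake_quantize(self.weight), self.bias)


def _swap_linears(module: nn.Module) -> None:
    """Recursively replace nn.Linear children with QuantLinear, reusing the weights."""
    for name, child in module.named_children():
        if isinstance(child, nn.Linear) and not isinstance(child, QuantLinear):
            q = QuantLinear(child.in_features, child.out_features,
                            bias=child.bias is not None)
            q.weight = child.weight
            q.bias = child.bias
            setattr(module, name, q)
        else:
            _swap_linears(child)


def make_student(teacher: nn.Module) -> nn.Module:
    """Generate a quantized student model from a full-precision teacher model."""
    student = copy.deepcopy(teacher)
    _swap_linears(student)
    return student
```

In this sketch the student keeps full-precision weights as learnable parameters and only simulates low-bit arithmetic in the forward pass, a common setup for quantization-aware training; the patent's actual quantizer may differ.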

Potential applications of this technology:

  • Compression of transformer models for efficient storage and deployment.
  • Improving the performance of compressed models by aligning self-attention maps.
  • Enabling faster inference and reduced memory requirements for transformer models.

Problems solved by this technology:

  • Large transformer models can be computationally expensive and memory-intensive.
  • Compressed models can suffer from performance degradation.
  • Maintaining the accuracy of compressed models is difficult; the knowledge distillation step addresses this.

Benefits of this technology:

  • Efficient storage and deployment of transformer models.
  • Improved performance of compressed models.
  • Faster inference and reduced memory requirements.
  • Enables the use of transformer models in resource-constrained environments.


Original Abstract Submitted

A method for quantization learning by a model quantizer that is operating in a computer system and compressing a transformer model. The method may include generating a student model through quantization of the transformer model, performing a first quantization learning by inserting a self-attention map of a teacher model into a self-attention map of the student model, and performing a second quantization learning using a knowledge distillation method so that the self-attention map of the student model follows the self-attention map of the teacher model.
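
Read at the attention-layer level, the two learning stages in the abstract can be sketched as follows. This single-head PyTorch illustration rests on two assumptions that go beyond the abstract's wording: that "inserting" the teacher's self-attention map means substituting it for the student's own map during the student's forward pass, and that the distillation term that makes the student's map follow the teacher's is a KL divergence between the two maps. The names SelfAttention, stage1_forward, and stage2_distill_loss are hypothetical.

```python
from typing import Optional

import torch
import torch.nn as nn
import torch.nn.functional as F


class SelfAttention(nn.Module):
    """Single-head self-attention that exposes its attention map and can
    optionally run with an externally supplied (e.g. teacher's) map instead."""

    def __init__(self, dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, attn_override: Optional[torch.Tensor] = None):
        scores = self.q_proj(x) @ self.k_proj(x).transpose(-2, -1) / x.size(-1) ** 0.5
        attn = F.softmax(scores, dim=-1)  # the layer's own self-attention map
        used = attn if attn_override is None else attn_override
        return used @ self.v_proj(x), attn


def stage1_forward(student_layer: SelfAttention, x: torch.Tensor,
                   teacher_attn_map: torch.Tensor) -> torch.Tensor:
    """First quantization learning: the teacher's self-attention map is inserted
    into the student layer, replacing the map the student would compute itself."""
    out, _ = student_layer(x, attn_override=teacher_attn_map)
    return out


def stage2_distill_loss(student_attn_map: torch.Tensor,
                        teacher_attn_map: torch.Tensor) -> torch.Tensor:
    """Second quantization learning: a knowledge-distillation term (here a KL
    divergence) that makes the student's own map follow the teacher's."""
    return F.kl_div((student_attn_map + 1e-12).log(), teacher_attn_map,
                    reduction="batchmean")
```

Under this reading, the second stage would add stage2_distill_loss (typically alongside a task loss) to the training objective of the quantized student, so that its self-attention maps track the teacher's even after low-bit quantization.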