Huawei Technologies Co., Ltd. (20240104346). METHOD AND DEVICE FOR COMPRESSING GENERATIVE PRE-TRAINED LANGUAGE MODELS VIA QUANTIZATION simplified abstract

From WikiPatents

METHOD AND DEVICE FOR COMPRESSING GENERATIVE PRE-TRAINED LANGUAGE MODELS VIA QUANTIZATION

Organization Name

Huawei Technologies Co., Ltd.

Inventor(s)

Lu Hou of Shenzhen (CN)

Chaofan Tao of Shenzhen (CN)

Wei Zhang of Shenzhen (CN)

Lifeng Shang of Hong Kong (CN)

Xin Jiang of Hong Kong (CN)

Qun Liu of Hong Kong (CN)

Li Qian of Shenzhen (CN)

METHOD AND DEVICE FOR COMPRESSING GENERATIVE PRE-TRAINED LANGUAGE MODELS VIA QUANTIZATION - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240104346, titled 'METHOD AND DEVICE FOR COMPRESSING GENERATIVE PRE-TRAINED LANGUAGE MODELS VIA QUANTIZATION'.

Simplified Explanation

The abstract describes a method for quantizing a neural network model: a scaling factor is determined from the distribution of the model's weights, the weights are quantized using that scaling factor, a training loss is computed with the quantized weights, and the scaling factor is then updated using the gradient of that loss.

  • Determining a scaling factor based on weight distribution
  • Quantizing weights based on the scaling factor
  • Calculating training loss using quantized weights
  • Updating the scaling factor based on the gradient of the training loss
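The four steps above can be sketched as a minimal NumPy toy. This is an illustrative reconstruction only, not the patent's implementation: the symmetric 8-bit scheme, the max-abs initialization of the scale, the mean-squared-error training loss, and the numerical gradient are all assumptions made for the sketch.

```python
import numpy as np

def initial_scale(weights, num_bits=8):
    """Step 1: scaling factor from the weight distribution (max-abs heuristic, an assumption)."""
    qmax = 2 ** (num_bits - 1) - 1
    return np.max(np.abs(weights)) / qmax

def quantize(weights, scale, num_bits=8):
    """Step 2: symmetric uniform quantization -- scale, round to the integer grid, dequantize."""
    qmax = 2 ** (num_bits - 1) - 1
    q = np.clip(np.round(weights / scale), -qmax - 1, qmax)
    return q * scale  # "fake-quantized" weights used during training

def loss_fn(w_q, x, y):
    """Step 3: training loss computed with the quantized weights (toy MSE regression)."""
    return np.mean((x @ w_q - y) ** 2)

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))       # toy weight matrix
x = rng.normal(size=(8, 4))       # toy inputs
y = x @ w                         # toy regression targets

scale = initial_scale(w)
lr = 1e-4
for _ in range(100):
    w_q = quantize(w, scale)
    # Step 4: update the scaling factor from the gradient of the training loss.
    # A central-difference numerical gradient stands in for backpropagation here.
    eps = 1e-5
    g = (loss_fn(quantize(w, scale + eps), x, y)
         - loss_fn(quantize(w, scale - eps), x, y)) / (2 * eps)
    scale -= lr * g
```

In a real quantization-aware training setup the rounding step is non-differentiable, so frameworks typically use a straight-through estimator to propagate gradients to both the weights and the learnable scale; the numerical gradient above is only a stand-in for that machinery.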

Potential Applications

The technology could be applied in various fields such as:

  • Machine learning
  • Artificial intelligence
  • Data analysis

Problems Solved

This technology helps in:

  • Reducing memory usage
  • Improving computational efficiency
  • Enhancing model performance

Benefits

The benefits of this technology include:

  • Faster processing speeds
  • Lower resource requirements
  • Improved accuracy of neural network models

Potential Commercial Applications

This technology could be used in:

  • Image recognition systems
  • Natural language processing applications
  • Autonomous vehicles

Possible Prior Art

Possible prior art includes:

  • Research papers on weight quantization in neural networks

Unanswered Questions

How does this method compare to existing quantization techniques in terms of accuracy and efficiency?

This article does not provide a direct comparison with existing quantization techniques. Further research or experimentation may be needed to determine the performance differences.

What impact could this method have on the deployment of neural network models in resource-constrained environments?

The article does not discuss the specific implications for resource-constrained environments. Future studies could explore the potential benefits of this method in such settings.


Original Abstract Submitted

A method is provided for quantizing a neural network model performed by a processing system. The method comprises determining a scaling factor based on a distribution of weights associated with the neural network model, determining quantized weights based on the scaling factor and the weights associated with the distribution, determining a training loss of the neural network model based on the quantized weights during training of the neural network model, and determining an updated scaling factor for the neural network model based on a gradient of the training loss.