17932225. MACHINE LEARNING MODEL COMPRESSION USING WEIGHTED LOW-RANK FACTORIZATION simplified abstract (Samsung Electronics Co., Ltd.)

MACHINE LEARNING MODEL COMPRESSION USING WEIGHTED LOW-RANK FACTORIZATION

Organization Name

Samsung Electronics Co., Ltd.

Inventor(s)

Yen-Chang Hsu of Fremont CA (US)

Ting Hua of Cupertino CA (US)

Feixuan Wang of San Jose CA (US)

Qian Lou of Oviedo FL (US)

Yilin Shen of Santa Clara CA (US)

Hongxia Jin of San Jose CA (US)

MACHINE LEARNING MODEL COMPRESSION USING WEIGHTED LOW-RANK FACTORIZATION - A simplified explanation of the abstract

This abstract first appeared for US patent application 17932225, titled 'MACHINE LEARNING MODEL COMPRESSION USING WEIGHTED LOW-RANK FACTORIZATION'.

Simplified Explanation

The patent application describes a method for compressing a machine learning model by approximating the parameter values of its linear layers. Here are the key points:

  • The method starts by obtaining a parameter matrix that contains the parameter values for a linear layer of a machine learning model.
  • An importance value is determined for each parameter value, indicating how significant that parameter is to the model.
  • Factorized matrices are generated such that the product of the importance values and the factorized matrices yields approximated parameter values for the linear layer.
  • A second machine learning model is then created, representing a compressed version of the original model.
  • The second model includes first and second linear layers whose parameter values are based on the importance values and the factorized matrices.
  • The factorized matrices are chosen based on the weighted errors between the original parameter values and the approximated values, with the weights derived from the importance values (a minimal sketch of this step follows the list).
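
The abstract does not spell out how the factorized matrices are computed. One standard way to handle the weighted objective, minimizing ||sqrt(I) * (W - A*B)||_F over the factors A and B, has a closed form when each parameter's importance is constant along a row of the matrix: scale the rows of W by the square root of the importance values, take a truncated SVD, and undo the scaling. The sketch below illustrates that approach under this assumption; the function name and the randomly generated stand-in importance scores are hypothetical, not taken from the patent.

    import numpy as np

    def weighted_low_rank(W, importance, rank):
        # Approximate W (out x in) by A @ B of the given rank, weighting
        # the reconstruction error of each row by its importance value.
        # Assumes one importance value per row; closed form via SVD of
        # the row-scaled matrix sqrt(importance) * W.
        s = np.sqrt(importance)                      # shape (out,)
        U, S, Vt = np.linalg.svd(s[:, None] * W, full_matrices=False)
        A = (U[:, :rank] * S[:rank]) / s[:, None]    # undo the row scaling
        B = Vt[:rank]
        return A, B

    # toy example: compress a 64x128 weight matrix to rank 16
    rng = np.random.default_rng(0)
    W = rng.standard_normal((64, 128))
    importance = rng.uniform(0.1, 1.0, size=64)      # stand-in importance scores
    A, B = weighted_low_rank(W, importance, rank=16)
    weighted_err = np.sqrt(importance)[:, None] * (W - A @ B)
    print(A.shape, B.shape, np.linalg.norm(weighted_err))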

Potential applications of this technology:

  • Compressing machine learning models to reduce their size and memory requirements.
  • Enabling efficient deployment of machine learning models on resource-constrained devices.
  • Improving the speed and efficiency of machine learning inference.

Problems solved by this technology:

  • Large machine learning models can be computationally expensive and memory-intensive, making them challenging to deploy in real-world applications.
  • Compressing models while maintaining their performance can be difficult.
  • This method addresses these challenges by approximating the parameter values of linear layers with importance-weighted factorized matrices, yielding a compressed model that prioritizes its most important parameters.

Benefits of this technology:

  • Reduced model size and memory requirements, enabling deployment on devices with limited resources.
  • Faster and more efficient inference, enabling real-time applications.
  • Maintains a good level of performance because the approximation error is weighted toward the most important parameter values.


Original Abstract Submitted

A method includes obtaining a parameter matrix associated with a linear layer of a first machine learning model and containing parameter values for parameters of the linear layer. The method also includes determining importance values corresponding to the parameter values. The method further includes generating factorized matrices such that a product of the importance values and factorized matrices contains approximated parameter values for the parameters of the linear layer. In addition, the method includes generating a second machine learning model representing a compressed version of the first machine learning model. The second machine learning model has first and second linear layers containing parameter values based on the importance values and the factorized matrices. The factorized matrices are generated based on weighted errors between the parameter values for the parameters of the linear layer and the approximated parameter values. Weights associated with the weighted errors are based on the importance values.
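
To make the two-layer structure concrete: if the original layer's weight matrix W (of size m by n) is approximated by A @ B, with A of size m by r and B of size r by n, the single layer can be replaced by a first linear layer applying B and a second linear layer applying A. This cuts the parameter count from m*n to r*(m+n), a reduction whenever the rank r is small enough. Below is a minimal PyTorch sketch of this rewiring, assuming factors A and B produced by something like the earlier sketch; the helper name compress_linear is hypothetical.

    import torch
    import torch.nn as nn

    def compress_linear(layer: nn.Linear, A, B):
        # Replace one nn.Linear whose weight W (out x in) is approximated
        # by A @ B (A: out x r, B: r x in) with two smaller linear layers.
        rank = B.shape[0]
        first = nn.Linear(layer.in_features, rank, bias=False)
        second = nn.Linear(rank, layer.out_features, bias=layer.bias is not None)
        with torch.no_grad():
            first.weight.copy_(torch.as_tensor(B, dtype=layer.weight.dtype))
            second.weight.copy_(torch.as_tensor(A, dtype=layer.weight.dtype))
            if layer.bias is not None:
                second.bias.copy_(layer.bias)
        return nn.Sequential(first, second)

    # random factors stand in for the output of a weighted factorization
    layer = nn.Linear(128, 64)
    A, B = torch.randn(64, 16), torch.randn(16, 128)
    compressed = compress_linear(layer, A, B)
    print(sum(p.numel() for p in layer.parameters()),       # 8256
          sum(p.numel() for p in compressed.parameters()))  # 3136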