18073383. SMALL AND FAST TRANSFORMER MODEL FOR MULTI-MODAL OR OTHER TASKS simplified abstract (SAMSUNG ELECTRONICS CO., LTD.)

SMALL AND FAST TRANSFORMER MODEL FOR MULTI-MODAL OR OTHER TASKS

Organization Name

SAMSUNG ELECTRONICS CO., LTD.

Inventor(s)

Qian Lou of Oviedo FL (US)

Yen-Chang Hsu of Fremont CA (US)

Burak Uzkent of Mountain View CA (US)

Ting Hua of Cupertino CA (US)

Yilin Shen of San Jose CA (US)

Hongxia Jin of San Jose CA (US)

SMALL AND FAST TRANSFORMER MODEL FOR MULTI-MODAL OR OTHER TASKS - A simplified explanation of the abstract

This abstract first appeared for US patent application 18073383, titled 'SMALL AND FAST TRANSFORMER MODEL FOR MULTI-MODAL OR OTHER TASKS'.

Simplified Explanation

The patent application describes a method for compressing a trained transformer model on a first electronic device and deploying the compressed representation to a second electronic device without transferring the full weight matrix. The steps are summarized below; a hypothetical code sketch follows the list.

  • The weight matrix of a trained transformer model is obtained using a first electronic device.
  • The weight matrix is factorized into a dictionary weight matrix and an intermediate matrix.
  • The intermediate matrix is pruned to create a sparse intermediate matrix.
  • The sparse intermediate matrix is fine-tuned based on a training dataset to generate a fine-tuned sparse intermediate matrix.
  • An index matrix and a coefficient matrix are determined based on the fine-tuned sparse intermediate matrix.
  • The dictionary weight matrix, index matrix, and coefficient matrix are deployed to a second electronic device without transferring the weight matrix.
  • The number of parameters in the deployed matrices is smaller than the number of parameters in the original weight matrix.
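The sketch below shows one plausible way these steps could fit together. The patent does not specify the factorization, pruning, or indexing algorithms; truncated SVD, per-column magnitude pruning, and all names, shapes, and sizes (d_model, r, k) are illustrative assumptions, and the fine-tuning step is omitted.

```python
# Hypothetical sketch of the compression side (first device). SVD and the
# chosen sizes are assumptions for illustration, not the patent's method.
import numpy as np

rng = np.random.default_rng(0)
d_model = 768
W = rng.standard_normal((d_model, d_model))  # weight matrix of a trained transformer layer

# Factorize W into a dictionary weight matrix D (d_model x r) and an
# intermediate matrix M (r x d_model), so that W ~= D @ M.
r = 64                                       # assumed dictionary size (rank)
U, s, Vt = np.linalg.svd(W, full_matrices=False)
D = U[:, :r] * s[:r]                         # dictionary weight matrix
M = Vt[:r, :]                                # intermediate matrix

# Prune the intermediate matrix: keep only the k largest-magnitude entries per column.
k = 8                                        # assumed number of dictionary atoms kept per column
top = np.argsort(-np.abs(M), axis=0)[:k, :]  # row indices of the kept entries, shape (k, d_model)
cols = np.arange(M.shape[1])
M_sparse = np.zeros_like(M)
M_sparse[top, cols] = M[top, cols]

# (Fine-tuning of the sparse intermediate matrix on a training dataset would happen here.)

# Determine the index matrix and coefficient matrix from the sparse intermediate matrix:
# per column, which dictionary atoms are used and with what coefficients.
index_matrix = top                           # (k, d_model) integer indices into the dictionary
coeff_matrix = M_sparse[top, cols]           # (k, d_model) coefficients

# Deployed payload: D, index_matrix, coeff_matrix -- far fewer parameters than W.
deployed_params = D.size + index_matrix.size + coeff_matrix.size
print(deployed_params, "<", W.size)          # e.g. 61440 < 589824 for these assumed sizes
```

With these assumed sizes, the deployed payload scales with the dictionary (d_model x r) plus k entries per column, rather than with the full d_model x d_model weight matrix.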

Potential Applications

  • Efficient deployment of trained transformer models on resource-constrained devices.
  • Enabling real-time natural language processing on mobile devices.
  • Improving the performance of machine learning applications on edge devices.

Problems Solved

  • Reducing the computational and storage requirements for deploying transformer models on devices with limited resources.
  • Overcoming the limitations of transferring large weight matrices between devices.
  • Enabling efficient deployment of complex machine learning models on edge devices.

Benefits

  • Reduced memory and storage requirements for deploying transformer models.
  • Faster deployment and inference on resource-constrained devices.
  • Improved efficiency and performance of machine learning applications on edge devices.


Original Abstract Submitted

A method includes obtaining, using a first electronic device, a weight matrix associated with a trained transformer model. The method also includes factorizing the weight matrix into a dictionary weight matrix and an intermediate matrix. The method further includes pruning the intermediate matrix to generate a sparse intermediate matrix. The method also includes fine-tuning the sparse intermediate matrix based on a training dataset to generate a fine-tuned sparse intermediate matrix. The method further includes determining an index matrix and a coefficient matrix based on the fine-tuned sparse intermediate matrix. In addition, the method includes deploying the dictionary weight matrix, the index matrix, and the coefficient matrix to a second electronic device without deploying the weight matrix to the second electronic device. A number of parameters in the dictionary weight matrix, the index matrix, and the coefficient matrix is smaller than a number of parameters in the weight matrix.
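The following is a minimal, hypothetical sketch of how the receiving device could use the deployed matrices for inference without ever materializing the original weight matrix. It assumes the same W ~= D @ M factorization and the same shapes as the compression sketch above; the patent does not spell out the on-device computation, so this is an illustrative assumption.

```python
# Hypothetical sketch of the deployment side (second device). Random matrices
# stand in for the payload received from the first device; names and shapes
# are assumptions, not the patent's notation.
import numpy as np

rng = np.random.default_rng(1)
d_model, r, k = 768, 64, 8                        # assumed sizes (see the sketch above)
D = rng.standard_normal((d_model, r))             # dictionary weight matrix
index_matrix = rng.integers(0, r, (k, d_model))   # index matrix
coeff_matrix = rng.standard_normal((k, d_model))  # coefficient matrix

def approx_matvec(D, index_matrix, coeff_matrix, x):
    # W_hat = D @ M_sparse, so W_hat.T @ x = M_sparse.T @ (D.T @ x):
    # project the input onto the dictionary once, then gather the k
    # coefficients for each output element.
    z = D.T @ x                                   # shape (r,)
    return np.einsum("kj,kj->j", coeff_matrix, z[index_matrix])

y = approx_matvec(D, index_matrix, coeff_matrix, rng.standard_normal(d_model))
print(y.shape)                                    # (768,)
```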