17934445. SMALL AND FAST TRANSFORMER WITH SHARED DICTIONARY simplified abstract (Samsung Electronics Co., Ltd.)

SMALL AND FAST TRANSFORMER WITH SHARED DICTIONARY

Organization Name

Samsung Electronics Co., Ltd.

Inventor(s)

Qian Lou of Oviedo FL (US)

Yilin Shen of San Jose CA (US)

Hongxia Jin of San Jose CA (US)

Ting Hua of Cupertino CA (US)

Yen-Chang Hsu of Fremont CA (US)

SMALL AND FAST TRANSFORMER WITH SHARED DICTIONARY - A simplified explanation of the abstract

This abstract first appeared for US patent application 17934445, titled 'SMALL AND FAST TRANSFORMER WITH SHARED DICTIONARY'.

Simplified Explanation

The abstract describes a method for training a machine learning model built from multiple encoder blocks, each consisting of an attention layer and a feedforward network. One or more training corpora are used to train an attention dictionary that is shared across all of the encoder blocks. Rather than storing independent attention parameters per block, each block's attention parameters are formed as weighted combinations of columns from the shared attention dictionary.

  • The method involves training a machine learning model with multiple encoder blocks.
  • Each encoder block consists of an attention layer and a feedforward network.
  • The attention dictionary is shared across all encoder blocks.
  • Training the attention dictionary involves training the attention parameters of each encoder block.
  • The attention parameters for a specific encoder block are a weighted combination of columns from the shared attention dictionary, as illustrated in the sketch after this list.
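
The idea can be illustrated with a minimal PyTorch sketch. The class and parameter names (SharedAttentionDictionary, DictionaryAttention, dict_size) and the single-head, Q/K/V-only formulation are illustrative assumptions for clarity; this is not the implementation described in the patent.

```python
import torch
import torch.nn as nn


class SharedAttentionDictionary(nn.Module):
    """Dictionary whose columns are shared by every encoder block."""

    def __init__(self, d_model: int, dict_size: int):
        super().__init__()
        # Columns of this matrix are the shared dictionary atoms.
        self.atoms = nn.Parameter(torch.randn(d_model, dict_size) * 0.02)


class DictionaryAttention(nn.Module):
    """Single-head attention whose Q/K/V projection matrices are
    weighted combinations of the shared dictionary columns."""

    def __init__(self, dictionary: SharedAttentionDictionary, d_model: int):
        super().__init__()
        self.dictionary = dictionary
        dict_size = dictionary.atoms.shape[1]
        # Block-specific coefficients that weight the dictionary columns.
        self.coef_q = nn.Parameter(torch.randn(dict_size, d_model) * 0.02)
        self.coef_k = nn.Parameter(torch.randn(dict_size, d_model) * 0.02)
        self.coef_v = nn.Parameter(torch.randn(dict_size, d_model) * 0.02)
        self.scale = d_model ** 0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        D = self.dictionary.atoms                      # (d_model, dict_size)
        # Each projection matrix is a weighted combination of the shared
        # dictionary's columns, so only the small coefficient matrices
        # are unique to this block.
        w_q = D @ self.coef_q                          # (d_model, d_model)
        w_k = D @ self.coef_k
        w_v = D @ self.coef_v
        q, k, v = x @ w_q, x @ w_k, x @ w_v
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.scale, dim=-1)
        return attn @ v


# Usage: one dictionary instance shared by several encoder blocks.
shared_dict = SharedAttentionDictionary(d_model=64, dict_size=16)
blocks = [DictionaryAttention(shared_dict, d_model=64) for _ in range(4)]
x = torch.randn(2, 10, 64)
for block in blocks:
    x = block(x)
```

Because the dictionary is stored once and each block keeps only small coefficient matrices, the per-block attention parameters shrink from roughly d_model × d_model to dict_size × d_model per projection.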

Potential Applications

  • Natural language processing tasks such as machine translation, text summarization, and sentiment analysis.
  • Speech recognition and speech synthesis systems.
  • Image and video processing applications, including object recognition and scene understanding.

Problems Solved

  • Training a machine learning model with multiple encoder blocks efficiently.
  • Improving the performance of attention layers in the model.
  • Enhancing the accuracy and effectiveness of various natural language processing and image processing tasks.

Benefits

  • Improved training efficiency and performance of machine learning models.
  • Enhanced accuracy and effectiveness of natural language processing and image processing tasks.
  • More accurate and reliable results in tasks such as machine translation, speech recognition, and object recognition.


Original Abstract Submitted

A method includes receiving one or more training corpora for training a machine learning model having a plurality of encoder blocks, where each encoder block includes an attention layer and a feedforward network. The method also includes using the one or more training corpora to train an attention dictionary shared across the plurality of encoder blocks. Training the attention dictionary may include training attention parameters of the attention layer in each of the plurality of encoder blocks, and the attention parameters for a given encoder block among the plurality of encoder blocks may be a weighted combination of columns from the attention dictionary shared across the plurality of encoder blocks.
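
To make the "small" aspect concrete, a back-of-the-envelope comparison with assumed, illustrative sizes (12 encoder blocks, a model width of 768, a 256-column dictionary; none of these figures come from the patent) shows how sharing the dictionary could reduce the attention parameter count:

```python
# Illustrative sizes only; not taken from the patent application.
d_model, n_blocks, dict_size = 768, 12, 256

# Standard transformer: each block stores its own Q/K/V projections.
standard = n_blocks * 3 * d_model * d_model

# Shared dictionary: one d_model x dict_size dictionary plus
# per-block dict_size x d_model coefficient matrices for Q/K/V.
shared = d_model * dict_size + n_blocks * 3 * dict_size * d_model

print(f"standard: {standard:,} attention parameters")  # 21,233,664
print(f"shared:   {shared:,} attention parameters")    # 7,274,496
```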