18176037. TRANSFORMER NETWORK WITH NORMALIZATION INCLUDING SCALING PARAMETER simplified abstract (Microsoft Technology Licensing, LLC)

TRANSFORMER NETWORK WITH NORMALIZATION INCLUDING SCALING PARAMETER

Organization Name

Microsoft Technology Licensing, LLC

Inventor(s)

Shuming Ma of Beijing (CN)

Li Dong of Beijing (CN)

Shaohan Huang of Beijing (CN)

Dongdong Zhang of Beijing (CN)

Furu Wei of Beijing (CN)

Hongyu Wang of Beijing (CN)

TRANSFORMER NETWORK WITH NORMALIZATION INCLUDING SCALING PARAMETER - A simplified explanation of the abstract

This abstract first appeared for US patent application 18176037, titled 'TRANSFORMER NETWORK WITH NORMALIZATION INCLUDING SCALING PARAMETER'.

The abstract describes a computing system that trains a transformer network whose layers each contain attention, feed-forward, and normalization sub-layers.

  • The processor receives a training data set and uses it to train the transformer network.
  • Each layer of the network contains several sub-layers: an attention sub-layer, a feed-forward sub-layer, and normalization sub-layers placed downstream of the others.
  • Each normalization sub-layer applies layer normalization to the sum of the corresponding sub-layer's output vector and its input vector multiplied by a scaling parameter (see the sketch after this list).
  • This training process is intended to improve the performance and efficiency of the transformer network.
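
A minimal sketch of one such normalization sub-layer, assuming PyTorch; the class name ScaledResidualNorm and the fixed-float alpha are illustrative choices, not taken from the patent:

```python
import torch
import torch.nn as nn

class ScaledResidualNorm(nn.Module):
    """Layer normalization over (alpha * sub-layer input) + sub-layer output,
    as the abstract describes; alpha stands in for the first scaling parameter."""

    def __init__(self, d_model: int, alpha: float = 1.0):
        super().__init__()
        self.alpha = alpha
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor, sublayer_out: torch.Tensor) -> torch.Tensor:
        # LN(alpha * x + F(x)): scale the input vector, add the output
        # vector of the corresponding sub-layer, then layer-normalize.
        return self.norm(self.alpha * x + sublayer_out)

# Example: normalize the output of a (stand-in) attention sub-layer.
norm = ScaledResidualNorm(d_model=512, alpha=2.0)
x = torch.randn(8, 16, 512)           # (batch, sequence, features)
attn_out = torch.randn(8, 16, 512)    # placeholder for an attention sub-layer's output
y = norm(x, attn_out)                 # same shape as x
```

Setting alpha above 1 up-weights the residual (input) path relative to the sub-layer output; whether the scaling parameter is fixed or learned is not specified in this summary.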

Potential Applications:

  • Natural language processing
  • Machine translation
  • Speech recognition

Problems Solved:

  • Enhancing the performance of transformer networks
  • Improving the accuracy of machine learning models

Benefits:

  • Increased efficiency in processing large datasets
  • Enhanced accuracy in data analysis tasks

Commercial Applications: This technology can be utilized in industries such as:

  • E-commerce for personalized recommendations
  • Healthcare for medical image analysis
  • Finance for fraud detection

Prior Art: Researchers can explore existing literature on transformer networks, normalization techniques, and machine learning algorithms to understand the background of this technology.

Frequently Updated Research: Stay updated on advancements in transformer network architectures, normalization methods, and training techniques to enhance the performance of machine learning models.

Questions about Transformer Networks:

1. How do transformer networks differ from traditional neural networks?

Transformer networks use self-attention mechanisms to process sequential data more effectively than traditional recurrent neural networks.
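
As a rough illustration of the self-attention mechanism this answer refers to, here is a generic scaled dot-product attention function in PyTorch (a textbook sketch, not code from the patent):

```python
import math
import torch

def scaled_dot_product_attention(q: torch.Tensor, k: torch.Tensor,
                                 v: torch.Tensor) -> torch.Tensor:
    # Every position attends to every other position in one step;
    # attention weights come from query-key similarity, so no recurrence
    # over the sequence is needed, unlike an RNN.
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v
```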

2. What are the key challenges in training transformer networks?

Training transformer networks can be computationally intensive because of the large number of parameters in the model architecture.


Original Abstract Submitted

A computing system is provided, including a processor configured to receive a training data set. Based at least in part on the training data set, the processor is further configured to train a transformer network that includes a plurality of layers. The plurality of layers each respectively include a plurality of sub-layers including an attention sub-layer, a feed-forward sub-layer, and a plurality of normalization sub-layers. The plurality of normalization sub-layers are downstream from corresponding sub-layers of the plurality of sub-layers. Each of the plurality of normalization sub-layers is configured to apply layer normalization to a sum of: a first scaling parameter multiplied by an input vector of the sub-layer; and an output vector of the sub-layer.
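
In symbols, the final sentence of the abstract can be read as follows, where x is the input vector of a sub-layer, F(x) its output vector, and α the first scaling parameter (notation ours, not the patent's):

```latex
\mathrm{output} = \mathrm{LN}\bigl(\alpha \cdot x + F(x)\bigr)
```

With α = 1 this reduces to the standard post-layer-norm residual connection LN(x + F(x)); the scaling parameter generalizes that connection.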