Microsoft Technology Licensing, LLC (20240320482). TRANSFORMER NETWORK WITH NORMALIZATION INCLUDING SCALING PARAMETER simplified abstract

From WikiPatents

TRANSFORMER NETWORK WITH NORMALIZATION INCLUDING SCALING PARAMETER

Organization Name

Microsoft Technology Licensing, LLC

Inventor(s)

Shuming Ma of Beijing (CN)

Li Dong of Beijing (CN)

Shaohan Huang of Beijing (CN)

Dongdong Zhang of Beijing (CN)

Furu Wei of Beijing (CN)

Hongyu Wang of Beijing (CN)

TRANSFORMER NETWORK WITH NORMALIZATION INCLUDING SCALING PARAMETER - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240320482, titled 'TRANSFORMER NETWORK WITH NORMALIZATION INCLUDING SCALING PARAMETER'.

Simplified Explanation:

This patent application describes a computing system that trains a transformer network with multiple layers, each containing attention, feed-forward, and normalization sub-layers. Each normalization sub-layer is downstream of a corresponding sub-layer and applies layer normalization to the sum of a first scaling parameter multiplied by that sub-layer's input vector and the sub-layer's output vector.

  • The computing system trains a transformer network with multiple layers.
  • Each layer includes attention, feed-forward, and normalization sub-layers.
  • Each normalization sub-layer applies layer normalization to the sum of the scaled input vector and the output vector of its corresponding sub-layer.
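The normalization step described above can be sketched in a few lines of NumPy. This is a minimal illustration of the claimed computation, layer normalization applied to (scaling parameter × sub-layer input) + sub-layer output; the function names and the example scaling value are illustrative and not taken from the patent.

```python
import numpy as np

def layer_norm(v, eps=1e-5):
    # Standard layer normalization over the feature dimension.
    mean = v.mean(axis=-1, keepdims=True)
    var = v.var(axis=-1, keepdims=True)
    return (v - mean) / np.sqrt(var + eps)

def scaled_residual_norm(x, sublayer_out, alpha):
    # Normalization sub-layer per the abstract: layer normalization of
    # a sum of (first scaling parameter * sub-layer input vector) and
    # the sub-layer output vector.
    return layer_norm(alpha * x + sublayer_out)
```

For example, `scaled_residual_norm(x, attention(x), alpha)` would replace the plain post-layer-norm residual `layer_norm(x + attention(x))` used in the original Transformer, with `alpha` up-weighting the residual branch.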

Key Features and Innovation:

  • Training a transformer network with multiple layers.
  • Using sub-layers for attention, feed-forward, and normalization.
  • Applying layer normalization, in each normalization sub-layer, to the sum of a scaled input vector and the output vector of the corresponding sub-layer.

Potential Applications:

This technology can be applied in natural language processing, machine translation, image recognition, and other deep learning tasks.

Problems Solved:

  • Enhancing the performance and efficiency of transformer networks.
  • Improving the training process for deep learning models.

Benefits:

  • Increased accuracy in natural language processing tasks.
  • Faster training times for deep learning models.
  • Enhanced performance in image recognition tasks.

Commercial Applications:

Commercial applications include developing advanced AI systems for various industries such as healthcare, finance, and e-commerce.

Prior Art:

Prior research in transformer networks, layer normalization, and deep learning architectures can provide insights into the development of this technology.

Frequently Updated Research:

Stay updated on advancements in transformer networks, layer normalization techniques, and deep learning algorithms to enhance the performance of this technology.

Questions about Transformer Networks:

  1. How do transformer networks differ from traditional neural networks?
  2. What are the key advantages of using transformer networks in natural language processing tasks?

Questions about Layer Normalization:

  1. How does layer normalization improve the training process of deep learning models?
  2. What are the potential drawbacks of using layer normalization in neural networks?


Original Abstract Submitted

A computing system is provided, including a processor configured to receive a training data set. Based at least in part on the training data set, the processor is further configured to train a transformer network that includes a plurality of layers. The plurality of layers each respectively include a plurality of sub-layers including an attention sub-layer, a feed-forward sub-layer, and a plurality of normalization sub-layers. The plurality of normalization sub-layers are downstream from corresponding sub-layers of the plurality of sub-layers. Each of the plurality of normalization sub-layers is configured to apply layer normalization to a sum of: a first scaling parameter multiplied by an input vector of the sub-layer; and an output vector of the sub-layer.
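The layer structure in the abstract (attention sub-layer and feed-forward sub-layer, each followed by a downstream normalization sub-layer) can be sketched as a toy NumPy layer. This is an assumption-laden illustration, not the patented implementation: the single-head attention, ReLU feed-forward, random weights, and the choice of `alpha` are all placeholders added for the example.

```python
import numpy as np

def layer_norm(v, eps=1e-5):
    # Layer normalization over the feature dimension.
    mean = v.mean(axis=-1, keepdims=True)
    return (v - mean) / np.sqrt(v.var(axis=-1, keepdims=True) + eps)

def softmax(s):
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class ToyTransformerLayer:
    """One layer per the claim: each sub-layer is followed by a
    normalization sub-layer computing
    layer_norm(alpha * sublayer_input + sublayer_output)."""

    def __init__(self, dim, alpha, seed=0):
        rng = np.random.default_rng(seed)
        self.alpha = alpha  # the "first scaling parameter"
        self.wq, self.wk, self.wv = (
            rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(3)
        )
        self.w1 = rng.standard_normal((dim, 4 * dim)) / np.sqrt(dim)
        self.w2 = rng.standard_normal((4 * dim, dim)) / np.sqrt(4 * dim)

    def attention(self, x):
        # Single-head self-attention (illustrative stand-in).
        q, k, v = x @ self.wq, x @ self.wk, x @ self.wv
        return softmax(q @ k.T / np.sqrt(x.shape[-1])) @ v

    def feed_forward(self, x):
        return np.maximum(x @ self.w1, 0.0) @ self.w2

    def __call__(self, x):
        # Each normalization sub-layer sits downstream of its sub-layer
        # and mixes the scaled input with the sub-layer output.
        x = layer_norm(self.alpha * x + self.attention(x))
        x = layer_norm(self.alpha * x + self.feed_forward(x))
        return x
```

Stacking many such layers and choosing `alpha` as a function of depth is the kind of design the claimed scaling parameter enables; the value used above is arbitrary.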