US Patent Application 18365047: NORMALIZATION SCHEME FOR SELF-ATTENTION NEURAL NETWORKS (simplified abstract)


NORMALIZATION SCHEME FOR SELF-ATTENTION NEURAL NETWORKS

Organization Name

Huawei Technologies Co., Ltd.

Inventor(s)

  • Aladin Virmaux of Boulogne-Billancourt (FR)
  • George Dasoulas of Boulogne-Billancourt (FR)
  • Kevin Scaman of Munich (DE)

NORMALIZATION SCHEME FOR SELF-ATTENTION NEURAL NETWORKS - A simplified explanation of the abstract

This abstract first appeared for US patent application 18365047, titled 'NORMALIZATION SCHEME FOR SELF-ATTENTION NEURAL NETWORKS'.

Simplified Explanation

The patent application describes a data processing device that performs an attention-based operation on a graph neural network.

  • The device receives one or more input graphs, each with multiple nodes.
  • For each input graph, the device forms an input node representation for each node, with a defined norm for each representation.
  • The device forms a set of attention parameters.
  • The input node representations are multiplied by the attention parameters to form a score function for the input graph.
  • The score function is normalized by the maximum of the norms of the input node representations, giving a normalized score function.
  • A weighted node representation is formed by weighting each node in the input graph with the corresponding element of the normalized score function.
  • Normalizing the score function enforces Lipschitz continuity, which bounds how sharply the attention output can change with its inputs and thereby lets deep attention-based neural networks train and perform better (see the sketch after this list).
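
A minimal NumPy sketch of these steps, assuming the node representations are stacked in a matrix X, the attention parameters form a single vector a, and the norm is Euclidean (all illustrative assumptions; the claims may define the exact score function and normalizer differently):

```python
import numpy as np

def normalized_attention(X, a, eps=1e-12):
    """Attention-based operation with max-norm score normalization.

    X   : (n, d) array, one input node representation per row.
    a   : (d,) array of attention parameters.
    eps : guards against division by zero on an all-zero graph.
    """
    # Multiply the input node representations by the attention
    # parameters to form the score function of the input graph.
    scores = X @ a                                  # shape (n,)

    # Normalize the score function by the maximum of the norms of
    # the input node representations.
    max_norm = np.linalg.norm(X, axis=1).max()
    normalized_scores = scores / (max_norm + eps)   # shape (n,)

    # Weight each node by the respective element of the normalized
    # score function to form the weighted node representations.
    return normalized_scores[:, None] * X           # shape (n, d)

# Toy usage: a 5-node graph with 8-dimensional node representations.
X = np.random.randn(5, 8)
a = np.random.randn(8)
weighted = normalized_attention(X, a)               # shape (5, 8)
```

Because the scores are divided by the largest input norm, rescaling all node representations cannot blow up the attention weights, which is the mechanism behind the Lipschitz-continuity claim; a multi-head variant would presumably apply the same per-graph normalizer to each head's scores.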


Original Abstract Submitted

Described is a data processing device for performing an attention-based operation on a graph neural network. The device is configured to receive one or more input graphs each having a plurality of nodes and to, for at least one of the input graphs: form an input node representation for each node in the respective input graph, wherein a respective norm is defined for each input node representation; form a set of attention parameters; multiply each of the input node representations with each of the set of attention parameters to form a score function of the respective input graph; normalize the score function based on a maximum of the norms of the input node representations to form a normalised score function; and form a weighted node representation by weighting each node in the respective input graph using a respective element of the normalised score function. The normalization of the score function enables deep attention-based neural networks to perform better by enforcing Lipschitz continuity.