US Patent Application 18365047: NORMALIZATION SCHEME FOR SELF-ATTENTION NEURAL NETWORKS (simplified abstract)
NORMALIZATION SCHEME FOR SELF-ATTENTION NEURAL NETWORKS
Organization Name
Huawei Technologies Co., Ltd.
Inventor(s)
- Aladin Virmaux of Boulogne Billancourt (FR)
- George Dasoulas of Boulogne Billancourt (FR)
- Kevin Scaman of Munich (DE)
NORMALIZATION SCHEME FOR SELF-ATTENTION NEURAL NETWORKS - A simplified explanation of the abstract
This abstract first appeared for US patent application 18365047 titled 'NORMALIZATION SCHEME FOR SELF-ATTENTION NEURAL NETWORKS'.
Simplified Explanation
The patent application describes a data processing device that performs attention-based operations on a graph neural network.
- The device receives one or more input graphs, each with multiple nodes.
- For each input graph, the device forms an input node representation for each node, with a defined norm for each representation.
- The device forms a set of attention parameters.
- Each input node representation is multiplied by the attention parameters to form a score function for the input graph.
- The score function is normalized by the maximum of the norms of the input node representations, resulting in a normalized score function.
- A weighted node representation is formed by weighting each node in the input graph using an element of the normalized score function.
- The normalization of the score function improves the performance of deep attention-based neural networks by enforcing Lipschitz continuity (see the sketch after this list).
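The steps above correspond to a self-attention computation over the nodes of a graph. Below is a minimal NumPy sketch of one plausible reading: the query/key/value projection matrices W_q, W_k, W_v standing in for the "set of attention parameters", and division by the squared maximum node norm as the normalization, are both assumptions, since the abstract does not specify these details.

```python
import numpy as np

def normalized_graph_attention(X, W_q, W_k, W_v):
    """One plausible reading of the described operation (hypothetical names).

    X            : (n_nodes, d) input node representations.
    W_q, W_k, W_v: assumed attention parameter matrices; the abstract only
                   says "a set of attention parameters".
    """
    Q, K, V = X @ W_q, X @ W_k, X @ W_v

    # Score function: node representations multiplied by the attention
    # parameters (here realized as query/key projections).
    scores = Q @ K.T                                # (n_nodes, n_nodes)

    # Normalize based on the maximum of the norms of the input node
    # representations. Dividing by the *squared* max norm is an assumption:
    # each score is a product of two projected representations.
    max_norm = np.linalg.norm(X, axis=1).max()
    scores = scores / (max_norm ** 2 + 1e-12)

    # Row-wise softmax gives the normalized score function.
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w = w / w.sum(axis=1, keepdims=True)

    # Weighted node representation: each node is weighted by a respective
    # element of the normalized score function.
    return w @ V

# Tiny usage example with random data.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
W_q, W_k, W_v = (0.1 * rng.normal(size=(8, 8)) for _ in range(3))
out = normalized_graph_attention(X, W_q, W_k, W_v)  # shape (5, 8)
```

Because every score is divided by the largest node norm, the input to the softmax stays bounded regardless of how large the node representations grow, which is what keeps the layer Lipschitz-continuous and, per the abstract, lets deep attention-based networks perform better.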
Original Abstract Submitted
Described is a data processing device for performing an attention-based operation on a graph neural network. The device is configured to receive one or more input graphs each having a plurality of nodes and to, for at least one of the input graphs: form an input node representation for each node in the respective input graph, wherein a respective norm is defined for each input node representation; form a set of attention parameters; multiply each of the input node representations with each of the set of attention parameters to form a score function of the respective input graph; normalize the score function based on a maximum of the norms of the input node representations to form a normalised score function; and form a weighted node representation by weighting each node in the respective input graph using a respective element of the normalised score function. The normalization of the score function enables deep attention-based neural networks to perform better by enforcing Lipschitz continuity.