US Patent Application 18365047: NORMALIZATION SCHEME FOR SELF-ATTENTION NEURAL NETWORKS (simplified abstract)


NORMALIZATION SCHEME FOR SELF-ATTENTION NEURAL NETWORKS

Organization Name

Huawei Technologies Co., Ltd.

Inventor(s)

  • Aladin Virmaux of Boulogne-Billancourt (FR)
  • George Dasoulas of Boulogne-Billancourt (FR)
  • Kevin Scaman of Munich (DE)

NORMALIZATION SCHEME FOR SELF-ATTENTION NEURAL NETWORKS - A simplified explanation of the abstract

This abstract first appeared for US patent application 18365047, titled 'NORMALIZATION SCHEME FOR SELF-ATTENTION NEURAL NETWORKS'.

Simplified Explanation

The patent application describes a data processing device that performs an attention-based operation on a graph neural network.

  • The device receives one or more input graphs, each with multiple nodes.
  • For each input graph, the device forms an input node representation for each node, with a defined norm for each representation.
  • The device forms a set of attention parameters.
  • The input node representations are multiplied by the attention parameters to form a score function for the input graph.
  • The score function is normalized by the maximum of the norms of the input node representations, giving a normalized score function.
  • A weighted node representation is formed by weighting each node in the input graph with the corresponding element of the normalized score function.
  • Normalizing the score function enforces Lipschitz continuity, which bounds how sharply the attention output can change with its inputs and thereby lets deep attention-based neural networks train and perform better (see the sketch after this list).
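
A minimal NumPy sketch of these steps, assuming the node representations are stacked in a matrix X, the attention parameters form a single vector a, and the norm is Euclidean (all illustrative assumptions; the claims may define the exact score function and normalizer differently):

```python
import numpy as np

def normalized_attention(X, a, eps=1e-12):
    """Attention-based operation with max-norm score normalization.

    X   : (n, d) array, one input node representation per row.
    a   : (d,) array of attention parameters.
    eps : guards against division by zero on an all-zero graph.
    """
    # Multiply the input node representations by the attention
    # parameters to form the score function of the input graph.
    scores = X @ a                                  # shape (n,)

    # Normalize the score function by the maximum of the norms of
    # the input node representations.
    max_norm = np.linalg.norm(X, axis=1).max()
    normalized_scores = scores / (max_norm + eps)   # shape (n,)

    # Weight each node by the respective element of the normalized
    # score function to form the weighted node representations.
    return normalized_scores[:, None] * X           # shape (n, d)

# Toy usage: a 5-node graph with 8-dimensional node representations.
X = np.random.randn(5, 8)
a = np.random.randn(8)
weighted = normalized_attention(X, a)               # shape (5, 8)
```

Because the scores are divided by the largest input norm, rescaling all node representations cannot blow up the attention weights, which is the mechanism behind the Lipschitz-continuity claim; a multi-head variant would presumably apply the same per-graph normalizer to each head's scores.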


Original Abstract Submitted

Described is a data processing device for performing an attention-based operation on a graph neural network. The device is configured to receive one or more input graphs each having a plurality of nodes and to, for at least one of the input graphs: form an input node representation for each node in the respective input graph, wherein a respective norm is defined for each input node representation; form a set of attention parameters; multiply each of the input node representations with each of the set of attention parameters to form a score function of the respective input graph; normalize the score function based on a maximum of the norms of the input node representations to form a normalised score function; and form a weighted node representation by weighting each node in the respective input graph using a respective element of the normalised score function. The normalization of the score function enables deep attention-based neural networks to perform better by enforcing Lipschitz continuity.