17953991. PARALLEL COMPUTING SCHEME GENERATION FOR NEURAL NETWORKS simplified abstract (Huawei Technologies Co., Ltd.)

PARALLEL COMPUTING SCHEME GENERATION FOR NEURAL NETWORKS

Organization Name

Huawei Technologies Co., Ltd.

Inventor(s)

Chong Li of Boulogne Billancourt (FR)

Thibaut Tachon of Boulogne Billancourt (FR)

Hongxing Wang of Shenzhen (CN)

Kelun Chai of Boulogne Billancourt (FR)

Chang Liu of Shenzhen (CN)

PARALLEL COMPUTING SCHEME GENERATION FOR NEURAL NETWORKS - A simplified explanation of the abstract

This abstract first appeared for US patent application 17953991, titled 'PARALLEL COMPUTING SCHEME GENERATION FOR NEURAL NETWORKS'.

Simplified Explanation

The patent application describes a device that receives a computation graph and transforms it into a dataflow graph made up of recursive subgraphs. Each recursive subgraph is either empty or a tuple of another recursive subgraph and an operator node. The device determines the number of partitioning recursions from the number of available parallel computing devices. In each recursion, it calculates costs for the operator nodes, derives a processing order for the recursive subgraphs, and processes them in that order, selecting a partitioning axis for the tensors associated with each operator node. Finally, the device outputs a partitioning scheme listing the partitioning axes for those tensors.
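
As a rough illustration (not taken from the patent itself), the recursive subgraph structure described above can be modeled as a value that is either empty or a pair of a nested subgraph and an operator node; all class and field names below are hypothetical:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class OperatorNode:
    """An operation in the dataflow graph and the names of its tensors."""
    name: str
    tensors: List[str]

@dataclass
class RecursiveSubgraph:
    """Either empty, or a tuple of (inner recursive subgraph, operator node)."""
    inner: Optional["RecursiveSubgraph"] = None
    op: Optional[OperatorNode] = None

    def is_empty(self) -> bool:
        return self.inner is None and self.op is None

# A two-operator chain (matmul followed by relu) expressed as nested subgraphs.
empty = RecursiveSubgraph()
g1 = RecursiveSubgraph(inner=empty, op=OperatorNode("matmul", ["A", "B", "C"]))
g2 = RecursiveSubgraph(inner=g1, op=OperatorNode("relu", ["C", "D"]))
```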

  • The device receives a computation graph and transforms it into a dataflow graph with recursive subgraphs.
  • Each recursive subgraph contains another recursive subgraph and an operator node, or it can be empty.
  • The device determines the number of partitioning recursions based on the available parallel computing devices.
  • Costs corresponding to operator nodes are calculated to aid in determining the processing order of the recursive subgraphs.
  • The device processes the recursive subgraphs by selecting a partitioning axis for tensors associated with each operator node.
  • A partitioning scheme is then output, specifying the partitioning axes for each tensor associated with the operator nodes (a toy end-to-end sketch of this flow follows the list).
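
Building on the sketch above, the end-to-end flow in these bullets might look roughly like the following. The cost model (number of tensors touched), the processing order (costliest first), the recursion count (base-2 logarithm of the device count), and the axis choice (always axis 0) are placeholder assumptions, since the abstract does not specify them:

```python
import math
from typing import Dict, List

def generate_partitioning_scheme(subgraphs: List["RecursiveSubgraph"],
                                 num_devices: int) -> Dict[str, int]:
    """Toy version of the described flow.  All heuristics are placeholders:
    cost = number of tensors an operator touches, order = costliest first,
    recursion count = log2(device count), chosen axis = always 0."""
    scheme: Dict[str, int] = {}
    num_recursions = max(1, int(math.log2(num_devices)))

    for _ in range(num_recursions):
        live = [g for g in subgraphs if not g.is_empty()]
        costs = {g.op.name: len(g.op.tensors) for g in live}      # determine costs
        ordered = sorted(live, key=lambda g: -costs[g.op.name])   # processing order
        for g in ordered:                                         # process subgraphs
            for tensor in g.op.tensors:
                scheme.setdefault(tensor, 0)                      # pick an axis
    return scheme

# g1 and g2 are the nested subgraphs from the previous sketch.
print(generate_partitioning_scheme([g1, g2], num_devices=4))
# -> {'A': 0, 'B': 0, 'C': 0, 'D': 0}  (every tensor split along axis 0)
```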

Potential Applications

This technology could be applied in various fields where computation graphs need to be efficiently processed and partitioned across parallel computing devices. Some potential applications include:

  • High-performance computing systems
  • Distributed computing environments
  • Artificial intelligence and machine learning frameworks
  • Data analytics platforms

Problems Solved

The technology addresses the following problems:

  • Efficiently transforming computation graphs into dataflow graphs with recursive subgraphs.
  • Determining the optimal number of partitioning recursions based on the available parallel computing devices.
  • Calculating costs associated with operator nodes to determine the processing order of recursive subgraphs.
  • Selecting partitioning axes for tensors associated with operator nodes to improve processing efficiency (see the worked example below).
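
To make the last point concrete, here is a hypothetical example of why the partitioning axis matters for a matrix multiplication C = A @ B: splitting A along its row axis lets each device compute its slice of C independently, whereas splitting along the shared inner axis would force a cross-device reduction.

```python
import numpy as np

# Hypothetical illustration: C = A @ B split across 2 devices by partitioning
# A along axis 0 (its row axis).  Each "device" computes its slice of C with
# no communication, and concatenating the slices reproduces the full result.
A = np.arange(12.0).reshape(4, 3)
B = np.arange(6.0).reshape(3, 2)

shards = np.split(A, 2, axis=0)             # one shard of A per device
partials = [shard @ B for shard in shards]  # independent per-device matmuls
C = np.concatenate(partials, axis=0)

assert np.allclose(C, A @ B)
```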

Benefits

The technology offers several benefits:

  • Improved efficiency in processing computation graphs by transforming them into dataflow graphs.
  • Optimal partitioning of recursive subgraphs across parallel computing devices.
  • Cost-based processing order determination for improved performance.
  • Enhanced processing efficiency through the selection of partitioning axes for tensors.


Original Abstract Submitted

A device receives a computation graph and transforms the computation graph into a dataflow graph comprising recursive subgraphs. Each recursive subgraph comprises a tuple of another recursive subgraph and an operator node, or an empty graph. The device determines a number of partitioning recursions based on a number of parallel computing devices. For each partitioning recursion, the device determines costs corresponding to operator nodes, determines a processing order of the recursive subgraphs, and processes the recursive subgraphs. To process a recursive subgraph, the device selects a partitioning axis for tensors associated with an operator node of the recursive subgraph. The device outputs a partitioning scheme comprising partitioning axes for each tensor associated with the operator nodes.
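
The "partitioning scheme comprising partitioning axes for each tensor" that the abstract describes could be represented as a simple mapping from tensor name to axis index; the concrete format below is an assumption for illustration only:

```python
# Hypothetical representation of the output: tensor name -> partitioning axis.
partitioning_scheme = {
    "A": 0,  # split the first matmul input along its row axis
    "B": 1,  # split the second matmul input along its column axis
    "C": 0,  # split the matmul output along its row axis
}
```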