Google LLC (20240295979). Multi-Pass Distributed Data Shuffle simplified abstract

From WikiPatents

Multi-Pass Distributed Data Shuffle

Organization Name

Google LLC

Inventor(s)

Mohsen Vakilian of Kirkland WA (US)

Hossein Ahmadi of Seattle WA (US)

Multi-Pass Distributed Data Shuffle - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240295979, titled 'Multi-Pass Distributed Data Shuffle'.

The patent application describes a system and method for repartitioning data in a distributed network. The method involves two passes of a data set from multiple sources to multiple sinks, resulting in the reorganization of data distribution.

  • The method includes executing a first pass of the data set from a plurality of first sources to a plurality of first sinks.
  • Each first sink collects data from one or more of the first sources.
  • Subsequently, a second pass of the data set is executed from a plurality of second sources to a plurality of second sinks.
  • Each first sink corresponds to one of the second sources, and each second sink collects data from one or more of the second sources.
  • Together, the two passes repartition the data set so that one or more second sinks collect data that originated from two or more of the first sources.
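
The two passes listed above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the implementation claimed in the application: the function names (shuffle_pass, two_pass_shuffle), the integer keys, and the routing rule (a coarse bucket in the first pass, the final partition in the second) are all hypothetical choices made for this example. Each record is tagged with the first source it came from so the repartitioning effect is visible.

```python
from collections import defaultdict


def shuffle_pass(sources, route):
    """Run one pass: send every record to the sink chosen by route(record)."""
    sinks = defaultdict(list)
    for records in sources.values():
        for record in records:
            sinks[route(record)].append(record)
    return dict(sinks)


def two_pass_shuffle(first_sources, num_final, group_size):
    """Repartition integer keys into num_final partitions using two passes.

    Assumed routing rule (not taken from the patent text):
      pass 1 sends a key to a coarse bucket covering group_size final partitions;
      pass 2 sends it to its final partition, key % num_final.
    """
    # Tag each record with its originating first source.
    tagged = {
        source_id: [(source_id, key) for key in keys]
        for source_id, keys in first_sources.items()
    }

    # First pass: first sources -> first sinks (coarse buckets).
    first_sinks = shuffle_pass(
        tagged, lambda rec: (rec[1] % num_final) // group_size
    )

    # Each first sink acts as one second source.
    second_sources = first_sinks

    # Second pass: second sources -> second sinks (final partitions).
    return shuffle_pass(second_sources, lambda rec: rec[1] % num_final)


example = two_pass_shuffle(
    {"a": [0, 1, 2, 3], "b": [4, 5, 6, 7]}, num_final=4, group_size=2
)
for sink_id in sorted(example):
    print(sink_id, example[sink_id])
```

In this example each of the four second sinks ends up holding one record from first source "a" and one from first source "b", even though every second sink reads from only a single second source.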

Potential Applications:

  • Data processing and analysis in large-scale distributed systems
  • Network optimization and load balancing
  • Streamlining data transfer and storage in cloud computing environments

Problems Solved:

  • Efficient redistribution of data in complex network architectures
  • Minimization of data transfer latency and bottlenecks
  • Improved scalability and resource utilization in distributed systems

Benefits:

  • Enhanced data processing speed and efficiency
  • Optimal resource allocation and utilization
  • Increased network performance and reliability

Commercial Applications: "Distributed Data Repartitioning System for Enhanced Network Performance"

This technology could be used in industries such as cloud computing, big data analytics, and IoT networks to optimize data distribution and improve overall system performance.

Questions about the technology:

  1. How does the repartitioning process improve data processing efficiency in distributed networks?
  2. What are the key advantages of using multiple passes for data redistribution in a network environment?


Original Abstract Submitted

A system and method for repartitioning data in a distributed network. The method may include executing, by one or more processors, a first pass of a data set from a plurality of first sources to a plurality of first sinks, each first sink collecting data from one or more of the first sources, and executing, by the one or more processors, a second pass of the data set from a plurality of second sources to a plurality of second sinks, each one of the plurality of first sinks corresponding to one of the plurality of second sources, and each second sink collecting data from one or more of the second sources. Executing the first and second passes causes the data set to be repartitioned such that one or more second sinks collect data that originated from two or more of the first sources.