18660831. Multi-Pass Distributed Data Shuffle simplified abstract (GOOGLE LLC)

From WikiPatents

Multi-Pass Distributed Data Shuffle

Organization Name

GOOGLE LLC

Inventor(s)

Mohsen Vakilian of Kirkland WA (US)

Hossein Ahmadi of Seattle WA (US)

Multi-Pass Distributed Data Shuffle - A simplified explanation of the abstract

This abstract first appeared for US patent application 18660831, titled 'Multi-Pass Distributed Data Shuffle'.

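The abstract describes a system and method for repartitioning data in a distributed network by executing two passes over a data set. The first pass moves data from first sources to first sinks. Each first sink then corresponds to a source for the second pass, which moves the data on to second sinks. After both passes, one or more second sinks hold data that originated from two or more of the first sources.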

  • The method executes a first pass of the data set from a plurality of first sources to a plurality of first sinks, and then a second pass from a plurality of second sources to a plurality of second sinks.
  • Each first sink collects data from one or more first sources, each first sink corresponds to one of the second sources, and each second sink collects data from one or more second sources.
  • As a result, one or more second sinks collect data that originated from two or more first sources, so the data set ends up repartitioned across the second sinks.
  • Reorganizing the data set in two stages is intended to improve data flow through the distributed network, although the abstract does not quantify any performance gain.
  • Staging the shuffle this way may allow more efficient data processing and management, which in turn may help the network scale and remain reliable.
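The two passes described above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the record layout `(key, origin)`, the function names `run_pass` and `two_pass_shuffle`, and the modulo routing rules are all assumptions made for the example, since the abstract does not say how records are routed.

```python
def run_pass(sources, num_sinks, route):
    """Run one pass: every record in every source goes to the sink picked by `route`."""
    sinks = [[] for _ in range(num_sinks)]
    for records in sources:
        for record in records:
            sinks[route(record)].append(record)
    return sinks


def two_pass_shuffle(first_sources, num_first_sinks, num_second_sinks):
    """Repartition records in two passes.

    Each record is a (key, origin) tuple with an integer key. The modulo
    routing used here is a hypothetical choice for illustration only.
    """
    # Pass 1: first sources -> first sinks.
    first_sinks = run_pass(
        first_sources, num_first_sinks, lambda r: r[0] % num_first_sinks
    )
    # Each first sink corresponds to one second source.
    second_sources = first_sinks
    # Pass 2: second sources -> second sinks (the final partitions).
    return run_pass(
        second_sources, num_second_sinks, lambda r: r[0] % num_second_sinks
    )


if __name__ == "__main__":
    sources = [
        [(0, "a"), (1, "a"), (2, "a"), (3, "a")],
        [(0, "b"), (1, "b"), (2, "b"), (3, "b")],
        [(4, "c"), (5, "c")],
    ]
    for index, sink in enumerate(two_pass_shuffle(sources, 2, 4)):
        print(index, sink)
```

In this sketch each second sink ends up holding records whose `origin` tags name at least two different first sources, which is the outcome the abstract describes.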

Potential Applications:

  • Data processing and management in large-scale distributed systems
  • Network optimization and performance enhancement
  • Real-time data analytics and processing

Problems Solved:

  • Inefficient data distribution in distributed networks
  • Data bottlenecks and congestion in network communication
  • Suboptimal data processing and management strategies

Benefits:

  • Improved data collection efficiency
  • Enhanced network performance and scalability
  • Real-time data processing capabilities

Commercial Applications:

Title: Enhanced Data Repartitioning System for Distributed Networks

This technology can be utilized in various industries, such as:

  • Cloud computing services
  • Big data analytics platforms
  • IoT networks and applications

Questions about Data Repartitioning System:

1. How does the system ensure data integrity during the repartitioning process?

The abstract does not address data integrity. A system of this kind would presumably rely on data validation and error-checking mechanisms to maintain integrity throughout the repartitioning process.

2. What are the key factors that determine the optimal reorganization of data in the distributed network?

The abstract does not list them. Plausible factors include data volume, network bandwidth, and the source-sink relationships used in each pass.


Original Abstract Submitted

A system and method for repartitioning data in a distributed network. The method may include executing, by one or more processors, a first pass of a data set from a plurality of first sources to a plurality of first sinks, each first sink collecting data from one or more of the first sources, and executing, by the one or more processors, a second pass of the data set from a plurality of second sources to a plurality of second sinks, each one of the plurality of first sinks corresponding to one of the plurality of second sources, and each second sink collecting data from one or more of the second sources. Executing the first and second passes causes the data set to be repartitioned such that one or more second sinks collect data that originated from two or more of the first sources.
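The closing condition of the abstract can be checked on a small model of the two passes. The sketch below is illustrative only and rests on stated assumptions: each pass is modeled as a mapping from a sink name to the set of sources it collects from, each second source is identified with the first sink of the same name, and every source a sink collects from is assumed to contribute data downstream. The names `origins_of_second_sinks` and `is_repartitioned` are hypothetical.

```python
def origins_of_second_sinks(first_pass, second_pass):
    """Map each second sink to the first sources whose data can reach it.

    first_pass:  {first_sink: set of first sources it collects from}
    second_pass: {second_sink: set of second sources it collects from},
                 where a second source is named after its first sink.
    """
    origins = {}
    for second_sink, second_sources in second_pass.items():
        reached = set()
        for second_source in second_sources:
            # A second source forwards whatever its first sink collected.
            reached |= first_pass[second_source]
        origins[second_sink] = reached
    return origins


def is_repartitioned(first_pass, second_pass):
    """True if some second sink holds data from two or more first sources."""
    origins = origins_of_second_sinks(first_pass, second_pass)
    return any(len(sources) >= 2 for sources in origins.values())


if __name__ == "__main__":
    first_pass = {"f0": {"s0", "s1"}, "f1": {"s2"}}
    second_pass = {"t0": {"f0"}, "t1": {"f0", "f1"}}
    print(origins_of_second_sinks(first_pass, second_pass))
    print(is_repartitioned(first_pass, second_pass))
```

Because the model tracks only which connections exist, it gives an upper bound on where data may have come from rather than a record-level trace.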