18047872. BUILD-SIDE SKEW HANDLING FOR HASH-PARTITIONING HASH JOINS simplified abstract (Snowflake Inc.)

From WikiPatents

BUILD-SIDE SKEW HANDLING FOR HASH-PARTITIONING HASH JOINS

Organization Name

Snowflake Inc.

Inventor(s)

Xinzhu Cai of San Mateo CA (US)

Florian Andreas Funke of San Francisco CA (US)

BUILD-SIDE SKEW HANDLING FOR HASH-PARTITIONING HASH JOINS - A simplified explanation of the abstract

This abstract first appeared for US patent application 18047872, titled 'BUILD-SIDE SKEW HANDLING FOR HASH-PARTITIONING HASH JOINS'.

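Simplified Explanation: The patent application describes systems and methods for handling build-side skew in a hash join, that is, the case where a few hash values account for a disproportionate share of the build-side rows. Frequent hash values are detected by sampling and then used to partition the build-side row set across multiple servers.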

  • Detect frequent hash values in a join operation
  • Partition build-side row set using frequent hash values
  • Distribute partitioned build-side row set to multiple servers for processing
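The detection step can be sketched roughly as follows. This is a minimal illustration, not the method claimed in the application: the helper names `hash_key` and `detect_frequent_hashes`, the CRC32 hash, the sample size, and the 10% threshold are all assumptions chosen for the example.

```python
import random
import zlib
from collections import Counter


def hash_key(key) -> int:
    """Deterministic 32-bit hash of a join key (CRC32 is an illustrative choice)."""
    return zlib.crc32(repr(key).encode("utf-8"))


def detect_frequent_hashes(hashes, sample_size=1000, threshold=0.1, seed=0):
    """Return the hash values whose share of a random sample is at least `threshold`.

    `sample_size`, `threshold`, and `seed` are hypothetical tuning knobs.
    """
    if not hashes:
        return set()
    rng = random.Random(seed)
    if len(hashes) <= sample_size:
        sample = list(hashes)
    else:
        sample = rng.sample(hashes, sample_size)
    counts = Counter(sample)
    cutoff = threshold * len(sample)
    return {h for h, c in counts.items() if c >= cutoff}


# Skewed build side: 90% of the rows share the join key "hot".
build_keys = ["hot"] * 9000 + [f"k{i}" for i in range(1000)]
build_hashes = [hash_key(k) for k in build_keys]
frequent = detect_frequent_hashes(build_hashes)
```

Sampling keeps the detection cost independent of the full row count, at the price of possibly missing hash values that are only moderately frequent.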

Key Features and Innovation:

  • Computing hash values for a join operation
  • Sampling hash values to detect frequent ones
  • Partitioning build-side row set for efficient processing
  • Distributing partitioned row set to multiple servers
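One possible reading of the partitioning and distribution features is sketched below. The abstract does not say how rows with a frequent hash value are assigned, so the round-robin spreading here is an assumption, as are the names `partition_build_side` and `key_of` and the use of CRC32 as the hash function.

```python
import zlib
from collections import defaultdict


def hash_key(key) -> int:
    """Deterministic 32-bit hash of a join key (illustrative choice)."""
    return zlib.crc32(repr(key).encode("utf-8"))


def partition_build_side(rows, key_of, frequent_hashes, num_instances):
    """Assign each build-side row to one of `num_instances` HJB instances.

    Rows whose hash is in `frequent_hashes` are spread round-robin
    (an assumed policy); all other rows go to `hash % num_instances`.
    """
    partitions = defaultdict(list)
    next_slot = 0
    for row in rows:
        h = hash_key(key_of(row))
        if h in frequent_hashes:
            target = next_slot
            next_slot = (next_slot + 1) % num_instances
        else:
            target = h % num_instances
        partitions[target].append(row)
    # One list per instance, in instance order; empty lists for unused instances.
    return [partitions[i] for i in range(num_instances)]


# Eight rows share the hot key; two rows have ordinary keys.
rows = [("hot", i) for i in range(8)] + [("a", 8), ("b", 9)]
frequent = {hash_key("hot")}
parts = partition_build_side(rows, lambda r: r[0], frequent, num_instances=4)
```

In this sketch each of the four instances receives two of the eight hot rows. Because rows with the same frequent hash now live on several instances, matching probe-side rows would have to reach all of them; the abstract does not describe that step.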

Potential Applications: This technology can be applied in database management systems, data processing platforms, and distributed computing environments where join operations are common.

Problems Solved:

  • Addressing build-side skew in join operations
  • Improving efficiency and performance of hash join build instances
  • Enhancing scalability of distributed computing systems

Benefits:

  • Increased efficiency in handling skewed data
  • Improved performance in join operations
  • Scalability in distributed computing environments

Commercial Applications: Optimizing data processing in large-scale databases, improving query performance in data analytics platforms, and enhancing scalability in cloud computing environments.

Prior Art: Researchers can explore prior patents related to hash join operations, distributed computing, and database optimization to understand the existing technology landscape.

Frequently Updated Research: Stay updated on advancements in distributed computing, database optimization, and data processing technologies to leverage the latest innovations in handling build-side skew.

Questions about Handling Build-Side Skew:

  1. How does partitioning a build-side row set using frequent hash values improve join operation efficiency?
  2. What are the potential challenges in distributing partitioned build-side row sets to multiple servers for processing?
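As a rough way to explore the first question, the sketch below compares the busiest instance's row count under plain hash partitioning with the count when rows carrying a frequent hash value are spread out. The round-robin policy, the function names, and the synthetic data are assumptions for illustration only, not figures from the application.

```python
import zlib
from collections import Counter


def hash_key(key) -> int:
    """Deterministic 32-bit hash of a join key (illustrative choice)."""
    return zlib.crc32(repr(key).encode("utf-8"))


def max_load_plain(keys, num_instances):
    """Largest partition when every row goes to hash % num_instances."""
    loads = Counter(hash_key(k) % num_instances for k in keys)
    return max(loads.values())


def max_load_skew_aware(keys, num_instances, frequent_hashes):
    """Largest partition when rows with a frequent hash are spread round-robin."""
    loads = [0] * num_instances
    slot = 0
    for k in keys:
        h = hash_key(k)
        if h in frequent_hashes:
            loads[slot] += 1
            slot = (slot + 1) % num_instances
        else:
            loads[h % num_instances] += 1
    return max(loads)


# Synthetic build side: 9000 of 10000 rows share one key.
keys = ["hot"] * 9000 + [f"k{i}" for i in range(1000)]
plain = max_load_plain(keys, 4)
aware = max_load_skew_aware(keys, 4, {hash_key("hot")})
```

With plain partitioning, one instance must hold all 9000 hot rows. With spreading, each of the four instances holds 2250 of them plus its share of the remaining rows.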


Original Abstract Submitted

Provided herein are systems and methods for handling build-side skew. For example, a method includes computing a plurality of hash values for a join operation. The join operation uses a corresponding plurality of row sets. The plurality of hash values are sampled to detect a frequent hash value. A build-side row set is partitioned using the frequent hash value to generate a partitioned build-side row set. The build-side row set is selected from the plurality of row sets. The partitioned build-side row set is distributed to a plurality of hash-join-build (HJB) instances executing at a corresponding plurality of servers.