BUILD-SIDE SKEW HANDLING FOR HASH-PARTITIONING HASH JOINS

Organization Name

Snowflake Inc.

Inventor(s)

Florian Andreas Funke of San Francisco CA (US)

BUILD-SIDE SKEW HANDLING FOR HASH-PARTITIONING HASH JOINS - A simplified explanation of the abstract

This abstract first appeared for US patent application 18047872, titled 'BUILD-SIDE SKEW HANDLING FOR HASH-PARTITIONING HASH JOINS'.

Simplified Explanation: The patent application describes systems and methods for handling build-side skew in a join operation by partitioning a build-side row set using frequent hash values.

Detect frequent hash values in a join operation
Partition build-side row set using frequent hash values
Distribute partitioned build-side row set to multiple servers for processing

Key Features and Innovation:

Computing hash values for a join operation
Sampling hash values to detect frequent ones
Partitioning build-side row set for efficient processing
Distributing partitioned row set to multiple servers
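
The sampling step listed above can be illustrated with a minimal Python sketch. This is not the method claimed in the application, which the abstract does not spell out. The function name `detect_frequent_hashes`, the `sample_size` and `threshold` parameters, and the rule "a hash value is frequent if it makes up at least a given share of the sample" are all assumptions chosen for illustration.

```python
import random
from collections import Counter


def detect_frequent_hashes(hash_values, sample_size=1000, threshold=0.1, seed=0):
    """Return hash values that make up at least `threshold` of a random sample.

    Hypothetical helper: the sampling strategy and the threshold rule are
    assumptions, not details taken from the patent application.
    """
    values = list(hash_values)
    if not values:
        return set()
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    if len(values) <= sample_size:
        sample = values  # small input: inspect every hash value
    else:
        sample = rng.sample(values, sample_size)
    counts = Counter(sample)
    cutoff = threshold * len(sample)
    return {h for h, c in counts.items() if c >= cutoff}


# One hash value (7) dominates the build side; the rest occur once each.
skewed = [7] * 9000 + list(range(100, 1100))
frequent = detect_frequent_hashes(skewed)
```

Sampling keeps detection cheap because only a bounded number of hash values is counted, at the cost of possibly missing values that are only moderately frequent.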

Potential Applications: This technology can be applied in database management systems, data processing platforms, and distributed computing environments where join operations are common.

Problems Solved:

Addressing build-side skew in join operations
Improving efficiency and performance of hash join build instances
Enhancing scalability of distributed computing systems

Benefits:

Increased efficiency in handling skewed data
Improved performance in join operations
Scalability in distributed computing environments

Commercial Applications: Optimizing data processing in large-scale databases, improving query performance in data analytics platforms, and enhancing scalability in cloud computing environments.

Prior Art: Researchers can explore prior patents related to hash join operations, distributed computing, and database optimization to understand the existing technology landscape.

Frequently Updated Research: Stay updated on advancements in distributed computing, database optimization, and data processing technologies to leverage the latest innovations in handling build-side skew.

Questions about Handling Build-Side Skew:

1. How does partitioning a build-side row set using frequent hash values improve join operation efficiency?
2. What are the potential challenges in distributing partitioned build-side row sets to multiple servers for processing?

Original Abstract Submitted

Provided herein are systems and methods for handling build-side skew. For example, a method includes computing a plurality of hash values for a join operation. The join operation uses a corresponding plurality of row sets. The plurality of hash values are sampled to detect a frequent hash value. A build-side row set is partitioned using the frequent hash value to generate a partitioned build-side row set. The build-side row set is selected from the plurality of row sets. The partitioned build-side row set is distributed to a plurality of hash-join-build (HJB) instances executing at a corresponding plurality of servers.
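
The partitioning and distribution steps in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the abstract does not say how rows with a frequent hash value are assigned to HJB instances. The sketch assumes they are spread round-robin, while all other rows go to the instance selected by `hash % num_instances`. The names `partition_build_side`, `key_of`, and `hash_fn` are hypothetical.

```python
def partition_build_side(rows, key_of, frequent_hashes, num_instances, hash_fn=hash):
    """Split build-side rows into one list per hash-join-build instance.

    Assumption (not from the patent text): rows whose hash is frequent are
    spread round-robin so no single instance receives all of them; every
    other row is routed by plain hash partitioning.
    """
    partitions = [[] for _ in range(num_instances)]
    next_slot = 0  # round-robin cursor for rows with a frequent hash
    for row in rows:
        h = hash_fn(key_of(row))
        if h in frequent_hashes:
            target = next_slot % num_instances
            next_slot += 1
        else:
            target = h % num_instances
        partitions[target].append(row)
    return partitions


# Eight rows share join key 7; two other rows have distinct keys.
rows = [(7, i) for i in range(8)] + [(1, "a"), (2, "b")]
parts = partition_build_side(
    rows,
    key_of=lambda r: r[0],
    frequent_hashes={7},
    num_instances=4,
    hash_fn=lambda k: k,  # identity hash keeps the example deterministic
)
sizes = [len(p) for p in parts]  # [2, 3, 3, 2]
```

With plain hash partitioning, all eight rows with key 7 would land on instance 3 (7 % 4). One consequence of this assumed scheme is that probe-side rows with a frequent hash value would have to be sent to every instance holding part of the matching build rows for the join result to stay correct.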

18047872. BUILD-SIDE SKEW HANDLING FOR HASH-PARTITIONING HASH JOINS simplified abstract (Snowflake Inc.)
