18819649. BUILD-SIDE SKEW HANDLING FOR HASH-PARTITIONING HASH JOINS IN DISTRIBUTED DATABASE QUERY EXECUTION (Snowflake Inc.)

From WikiPatents
Revision as of 07:39, 19 December 2024 by Wikipatents (talk | contribs) (Creating a new page)

BUILD-SIDE SKEW HANDLING FOR HASH-PARTITIONING HASH JOINS IN DISTRIBUTED DATABASE QUERY EXECUTION

Organization Name

Snowflake Inc.

Inventor(s)

Xinzhu Cai of San Mateo CA (US)

Bowei Chen of San Bruno CA (US)

Bjoern Daase of Berlin (DE)

Moritz Eyssen of Berlin (DE)

Florian Andreas Funke of Berlin (DE)

This abstract first appeared for US patent application 18819649, titled 'BUILD-SIDE SKEW HANDLING FOR HASH-PARTITIONING HASH JOINS IN DISTRIBUTED DATABASE QUERY EXECUTION'.



Original Abstract Submitted

Provided herein are systems, methods, and computer-storage media for managing data skew in hash join operations. A skew manager partitions build-side row data into multiple sets corresponding to hash-join-build (HJB) instances based on hash values. The skew manager detects skew in a build-side row set associated with a first HJB instance by analyzing the number of rows. Upon detecting skew, the skew manager redirects data rows to at least a second HJB instance. The method involves configuring skew caches, generating histograms, and detecting frequent hash values to identify skew. It also includes communicating skew notifications, broadcasting probe-side row data, and adjusting partitioning of probe-side data. The disclosed techniques further include buffering build-side row sets in streams and performing join operations based on these streams, enhancing efficiency in distributed computing environments.
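
The flow described in the abstract can be illustrated with a minimal single-process sketch. It is not the patented implementation: the class and function names (`SkewManager`, `route_build_row`, `route_probe_row`, `hash_join`), the per-hash-value row threshold, the round-robin redirection policy, and the use of CRC32 as the hash function are all assumptions made for illustration. The sketch partitions build-side rows by hash value, keeps a histogram of hash values, flags a hash value as skewed once its row count exceeds the threshold, redirects later rows with that hash value to other instances, and broadcasts matching probe-side rows to every instance.

```python
from collections import Counter, defaultdict
import zlib


def stable_hash(key):
    # Deterministic hash (assumption: CRC32) so routing is repeatable across runs.
    return zlib.crc32(repr(key).encode("utf-8"))


class SkewManager:
    """Hypothetical skew manager; names and policies are illustrative only."""

    def __init__(self, num_instances, row_threshold):
        self.num_instances = num_instances
        self.row_threshold = row_threshold
        self.build_sets = defaultdict(list)  # instance id -> build-side rows
        self.histogram = Counter()           # hash value -> rows seen so far
        self.skewed_hashes = set()           # stands in for skew notifications
        self._next = 0                       # round-robin cursor for redirects

    def route_build_row(self, key, row):
        h = stable_hash(key)
        self.histogram[h] += 1
        # Detect skew: this hash value has produced more rows than allowed.
        if self.histogram[h] > self.row_threshold:
            self.skewed_hashes.add(h)
        if h in self.skewed_hashes:
            # Redirect: spread further rows over all instances.
            target = self._next % self.num_instances
            self._next += 1
        else:
            # Normal case: the hash value picks the home instance.
            target = h % self.num_instances
        self.build_sets[target].append((key, row))
        return target

    def route_probe_row(self, key):
        h = stable_hash(key)
        if h in self.skewed_hashes:
            # Build rows for this key may live anywhere, so broadcast.
            return list(range(self.num_instances))
        return [h % self.num_instances]


def hash_join(build_rows, probe_rows, num_instances=4, row_threshold=3):
    mgr = SkewManager(num_instances, row_threshold)
    for key, row in build_rows:
        mgr.route_build_row(key, row)

    # Each instance builds its own local hash table from its build-side set.
    tables = {i: defaultdict(list) for i in range(num_instances)}
    for i, rows in mgr.build_sets.items():
        for key, row in rows:
            tables[i][key].append(row)

    # Probe phase: each probe row visits one instance, or all if its key is skewed.
    out = []
    for key, probe_row in probe_rows:
        for i in mgr.route_probe_row(key):
            for build_row in tables[i].get(key, []):
                out.append((key, build_row, probe_row))
    return out, mgr
```

Because every build-side row is stored in exactly one instance, broadcasting a probe-side row for a skewed key yields each matching pair exactly once. A real distributed engine would additionally have to coordinate the skew notification between producers and consumers and buffer row sets in streams, which this sketch leaves out.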