18819649. BUILD-SIDE SKEW HANDLING FOR HASH-PARTITIONING HASH JOINS IN DISTRIBUTED DATABASE QUERY EXECUTION (Snowflake Inc.)
Organization Name: Snowflake Inc.
Inventor(s)
Xinzhu Cai of San Mateo CA (US)
Bowei Chen of San Bruno CA (US)
Florian Andreas Funke of Berlin (DE)
This abstract first appeared for US patent application 18819649 titled 'BUILD-SIDE SKEW HANDLING FOR HASH-PARTITIONING HASH JOINS IN DISTRIBUTED DATABASE QUERY EXECUTION'.
Original Abstract Submitted
Provided herein are systems, methods, and computer-storage media for managing data skew in hash join operations. A skew manager partitions build-side row data into multiple sets corresponding to hash-join-build (HJB) instances based on hash values. The skew manager detects skew in a build-side row set associated with a first HJB instance by analyzing the number of rows. Upon detecting skew, the skew manager redirects data rows to at least a second HJB instance. The method involves configuring skew caches, generating histograms, and detecting frequent hash values to identify skew. It also includes communicating skew notifications, broadcasting probe-side row data, and adjusting partitioning of probe-side data. The disclosed techniques further include buffering build-side row sets in streams and performing join operations based on these streams, enhancing efficiency in distributed computing environments.
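The abstract gives no concrete thresholds, data structures, or protocols. The following single-process Python sketch is offered only as an illustration of the general idea, under explicit assumptions. It hash-partitions build rows across HJB instances, flags an instance as skewed when its row count exceeds a multiple of the average, uses a per-instance histogram to find frequent hash values, redirects rows carrying those values round-robin to other instances, and broadcasts matching probe rows while hash-partitioning the rest. Everything here is hypothetical and not taken from the application: the names `hash_value`, `partition_build`, `probe_targets`, and `distributed_join`, the parameters `skew_factor` and `hot_fraction`, the use of CRC32 as the hash, and the round-robin redirection policy. Skew caches, skew notifications between instances, and stream buffering are not modeled.

```python
import zlib
from collections import Counter
from typing import Any, Callable, Dict, List, Set, Tuple

Row = Tuple[Any, ...]
KeyFn = Callable[[Row], Any]


def hash_value(key: Any) -> int:
    """Deterministic join-key hash (CRC32 of repr); an illustrative assumption."""
    return zlib.crc32(repr(key).encode())


def partition_build(rows: List[Row], key_of: KeyFn, n: int,
                    skew_factor: float = 2.0, hot_fraction: float = 0.5
                    ) -> Tuple[List[List[Row]], Set[int]]:
    """Hash-partition build rows over n HJB instances, spreading hot hash values.

    skew_factor and hot_fraction are hypothetical tuning knobs.
    """
    # Ordinary hash partitioning: each row goes to its "home" instance.
    home: List[List[Row]] = [[] for _ in range(n)]
    for row in rows:
        home[hash_value(key_of(row)) % n].append(row)

    # Skew detection: an instance holding more than skew_factor x the average
    # row count is inspected with a histogram of hash values; values covering
    # at least hot_fraction of that instance's rows are marked hot.
    avg = len(rows) / n
    hot: Set[int] = set()
    for part in home:
        if len(part) > skew_factor * avg:
            histogram = Counter(hash_value(key_of(r)) for r in part)
            hot.update(h for h, c in histogram.items()
                       if c >= hot_fraction * len(part))

    # Redirection: rows with a hot hash value are dealt round-robin to all
    # instances; all other rows stay on their home instance.
    parts: List[List[Row]] = [[] for _ in range(n)]
    next_slot = 0
    for row in rows:
        h = hash_value(key_of(row))
        if h in hot:
            parts[next_slot % n].append(row)
            next_slot += 1
        else:
            parts[h % n].append(row)
    return parts, hot


def probe_targets(row: Row, key_of: KeyFn, hot: Set[int], n: int) -> List[int]:
    """Broadcast probe rows whose hash is hot; hash-partition all others."""
    h = hash_value(key_of(row))
    return list(range(n)) if h in hot else [h % n]


def distributed_join(build: List[Row], probe: List[Row], build_key: KeyFn,
                     probe_key: KeyFn, n: int = 4) -> List[Tuple[Row, Row]]:
    """Simulate n HJB instances in one process and return matching (build, probe) pairs."""
    parts, hot = partition_build(build, build_key, n)
    # Each simulated instance builds a hash table over its own build partition.
    tables: List[Dict[Any, List[Row]]] = []
    for part in parts:
        table: Dict[Any, List[Row]] = {}
        for b in part:
            table.setdefault(build_key(b), []).append(b)
        tables.append(table)
    # Each probe row visits one instance, or every instance if broadcast.
    out: List[Tuple[Row, Row]] = []
    for p in probe:
        for i in probe_targets(p, probe_key, hot, n):
            for b in tables[i].get(probe_key(p), []):
                out.append((b, p))
    return out
```

The key design point is that every build row still lives on exactly one instance. A probe row with a hot hash value is sent to every instance, so it meets each matching build row exactly once. A non-hot probe row follows the ordinary hash routing. The join result is therefore unchanged, while the skewed build-side rows are spread across instances. For example, with 12 build rows sharing key `7` and three other rows over 4 instances, the hot key's rows end up split 3 per instance instead of all landing on one.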