18474399. CACHE-EFFICIENT TOP-K AGGREGATION OVER HIGH CARDINALITY LARGE DATASETS (Microsoft Technology Licensing, LLC)
Organization Name
Microsoft Technology Licensing, LLC
Inventor(s)
Tarique Ashraf Siddiqui of Redmond WA US
Vivek Ravindranath Narasayya of Redmond WA US
Marius Dumitru of Issaquah WA US
Surajit Chaudhuri of Kirkland WA US
This abstract first appeared for US patent application 18474399, titled 'CACHE-EFFICIENT TOP-K AGGREGATION OVER HIGH CARDINALITY LARGE DATASETS'.
Original Abstract Submitted
A data processing system implements a cache-conscious aggregation framework for cache-efficient top-k aggregation over high cardinality large datasets. The framework leverages skew in the distribution of data in the datasets to minimize data movements within the local caches of the cores of the multicore processors of the data processing system. The framework performs representative sampling on the dataset and utilizes these samples to identify candidate groups in the dataset for the top-k results. The system performs exact aggregations for the candidate groups and performs hashing and pruning on the non-candidate groups in the dataset to identify top-k results included in the non-candidate groups without having to calculate the exact aggregations for the non-candidate groups.
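The stages named in the abstract (sample, pick candidate groups, aggregate the candidates exactly, hash and prune everything else) can be illustrated with a minimal Python sketch. It is not the method claimed in the application, whose details the abstract does not give. It assumes a SUM aggregate over non-negative values, and it swaps in a common pruning technique: each non-candidate row is hashed into a bucket, the bucket total serves as an upper bound on any group in that bucket, and buckets whose total falls below the k-th best candidate are discarded. The names `top_k_sum`, `sample_rate`, `candidate_factor`, and `num_buckets` are hypothetical, and Python cannot model the per-core cache behavior the abstract targets; the small candidate table and fixed-size bucket array only stand in for structures that would fit in cache.

```python
import heapq
import random
from collections import Counter, defaultdict


def top_k_sum(rows, k, sample_rate=0.01, candidate_factor=4,
              num_buckets=1024, seed=0):
    """Return the k groups with the largest SUM(value), as (key, total) pairs.

    Illustrative sketch only. Assumes rows is a list of (key, value) pairs
    with non-negative values, k >= 1, and mutually comparable keys.
    """
    rng = random.Random(seed)

    # Stage 1: sample rows and treat the most frequent sampled keys as
    # candidate groups. With skewed data these are likely the heavy groups.
    sample_counts = Counter(key for key, _ in rows if rng.random() < sample_rate)
    candidates = {key for key, _ in sample_counts.most_common(candidate_factor * k)}

    # Stage 2: one scan. Candidates get exact totals in a small table;
    # every other row only updates a fixed-size array of bucket totals.
    exact = dict.fromkeys(candidates, 0)
    bucket_totals = [0] * num_buckets
    for key, value in rows:
        if key in exact:
            exact[key] += value
        else:
            bucket_totals[hash(key) % num_buckets] += value

    # Stage 3: the k-th largest candidate total is a lower bound on the true
    # k-th largest total. A bucket whose total is below it cannot contain a
    # top-k group, because values are non-negative.
    threshold = heapq.nlargest(k, exact.values())[-1] if len(exact) >= k else 0
    live = {b for b, total in enumerate(bucket_totals) if total >= threshold}

    # Stage 4: exact totals only for non-candidates in surviving buckets.
    survivors = defaultdict(int)
    if live:
        for key, value in rows:
            if key not in exact and hash(key) % num_buckets in live:
                survivors[key] += value

    merged = {**exact, **survivors}
    # Ties are broken by key so the result is deterministic.
    return sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


if __name__ == "__main__":
    data_rng = random.Random(42)
    # Heavy-tailed keys: a few groups dominate, many groups appear rarely.
    data = [(int(data_rng.paretovariate(1.1)), data_rng.randint(1, 10))
            for _ in range(20000)]
    print(top_k_sum(data, k=5))
```

In this sketch the result stays exact regardless of sample quality: a poor sample only lowers the threshold, so fewer buckets are pruned and the second scan does more work. The sketch still computes exact totals for groups in surviving buckets, which may differ from how the application avoids exact aggregation for non-candidate groups.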