US Patent Application 17727647. PARTITIONING TIME SERIES DATA USING CATEGORY CARDINALITY simplified abstract


PARTITIONING TIME SERIES DATA USING CATEGORY CARDINALITY

Organization Name

Microsoft Technology Licensing, LLC


Inventor(s)

Nazmiye Ceren Abay of Kirkland WA (US)


Nikolay Sergeyevich Rovinskiy of Redmond WA (US)


Vladimir Bejan of Redmond WA (US)


Eric T. Wright of Redmond WA (US)


Jia Liu of Clyde Hill WA (US)


Neil Arturo Tenenholtz of Cambridge MA (US)


Vijaykumar K. Aski of Bellevue WA (US)


Daniel Harrison Holstein of Union City CA (US)


PARTITIONING TIME SERIES DATA USING CATEGORY CARDINALITY - A simplified explanation of the abstract

  • This abstract appeared for US patent application number 17727647, titled 'PARTITIONING TIME SERIES DATA USING CATEGORY CARDINALITY'.

Simplified Explanation

This abstract describes a method for dividing time series data into subsets that contain no duplicate time index values. The data has several categories plus a time index, and a probabilistic estimator approximates the number of unique values (the cardinality) in each category. A candidate category is then selected based on its estimated cardinality, and a time series identifier is built from it. If the identifier's estimated cardinality indicates that the resulting subsets would contain no duplicate time index values, the data is partitioned into subsets (time series "grain" data sets) using that identifier. These subsets can then be used to train machine learning models.
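The probabilistic estimation step can be illustrated with HyperLogLog, one widely used cardinality estimator. The abstract does not name a specific estimator, so choosing HyperLogLog here is an assumption, as are the names `HyperLogLog`, `estimate_cardinalities`, and the precision parameter `p`. This is a minimal sketch, not the applicant's implementation.

```python
import hashlib
import math
from typing import Dict, Iterable, List, Mapping


class HyperLogLog:
    """Minimal HyperLogLog distinct-count sketch (illustrative, not production-grade)."""

    def __init__(self, p: int = 12) -> None:
        if not 7 <= p <= 16:
            raise ValueError("p must be between 7 and 16")
        self.p = p
        self.m = 1 << p                              # number of registers
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)   # bias constant, valid for m >= 128

    def add(self, value: object) -> None:
        # 64-bit hash of the value's string form.
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        width = 64 - self.p
        idx = h >> width                             # top p bits select a register
        rest = h & ((1 << width) - 1)                # remaining bits
        rank = width - rest.bit_length() + 1         # position of the leftmost 1-bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self) -> float:
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            return self.m * math.log(self.m / zeros)
        return raw


def estimate_cardinalities(
    rows: Iterable[Mapping[str, object]], categories: List[str], p: int = 12
) -> Dict[str, float]:
    """Hypothetical helper: one sketch per category column, filled in a single pass."""
    sketches = {c: HyperLogLog(p) for c in categories}
    for row in rows:
        for c in categories:
            sketches[c].add(row[c])
    return {c: s.estimate() for c, s in sketches.items()}
```

For example, `estimate_cardinalities(rows, ["store", "sku"])` returns an approximate distinct count per column. With `p = 12` (4096 registers) the typical relative error is about 1.6%, and memory stays fixed no matter how many rows are scanned. That fixed memory is the usual reason to prefer a probabilistic estimator over exact counting on large data sets.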


Original Abstract Submitted

The disclosure herein describes using probabilistic cardinality generation to partition time series data into subsets without entries that have duplicate time index values. Time series data including a plurality of categories and a time index category is obtained. Cardinality estimate values of the categories are generated using a probabilistic cardinality estimator and a candidate category is selected based on the cardinality estimate value of the selected candidate category. A time series identifier is generated using the candidate category and, based on the cardinality estimate value of the time series identifier indicating that subsets of the time series data partitioned based on the time series identifier lack entries with duplicate time index values, the time series data is partitioned into a set of time series grain data sets. The time series grain data sets can be used to train models using machine learning techniques.
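The selection and partitioning steps can be sketched as follows. The abstract does not say how a candidate category is chosen or how the identifier's cardinality is compared. The ordering heuristic (lowest cardinality first, keeping a category only if it separates duplicates), the function names, and the use of exact distinct counts in place of a probabilistic estimator are therefore all assumptions. The duplicate test relies on one fact: partitions keyed by an identifier contain no duplicate time index values exactly when the number of distinct (identifier, time index) pairs equals the number of rows.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def distinct_count(rows: Sequence[Row], columns: Sequence[str]) -> int:
    # Exact stand-in for a probabilistic estimator such as HyperLogLog.
    return len({tuple(row[c] for c in columns) for row in rows})


def choose_grain_columns(
    rows: Sequence[Row], categories: Sequence[str], time_col: str
) -> List[str]:
    """Hypothetical greedy heuristic for building the time series identifier."""
    # Try low-cardinality categories first so the grains stay as coarse as possible.
    candidates = sorted(categories, key=lambda c: distinct_count(rows, [c]))
    grain: List[str] = []
    current = distinct_count(rows, [time_col])   # empty identifier: one big series
    for cat in candidates:
        if current == len(rows):                 # no duplicate time index values left
            break
        with_cat = distinct_count(rows, grain + [cat, time_col])
        if with_cat > current:                   # keep only categories that split duplicates
            grain.append(cat)
            current = with_cat
    if current != len(rows):
        raise ValueError("no combination of categories yields duplicate-free grains")
    return grain


def partition_by_grain(
    rows: Sequence[Row], grain: Sequence[str], time_col: str
) -> Dict[Tuple[object, ...], List[Row]]:
    """Split rows into one time-ordered series per identifier value."""
    grains: Dict[Tuple[object, ...], List[Row]] = defaultdict(list)
    for row in rows:
        grains[tuple(row[c] for c in grain)].append(row)
    for series in grains.values():
        series.sort(key=lambda r: r[time_col])
    return dict(grains)
```

On a table with columns `store`, `sku`, `region` (where `region` is determined by `store`) and a `day` time index, this sketch selects `["store", "sku"]` and skips `region` because it separates no additional duplicates. With a real probabilistic estimator, the equality test against the row count would need a tolerance, since the estimates are approximate.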