17457665. IMPUTING MACHINE LEARNING TRAINING DATA simplified abstract (INTERNATIONAL BUSINESS MACHINES CORPORATION)
Organization Name
INTERNATIONAL BUSINESS MACHINES CORPORATION
IMPUTING MACHINE LEARNING TRAINING DATA - A simplified explanation of the abstract
This abstract first appeared for US patent application 17457665 titled 'IMPUTING MACHINE LEARNING TRAINING DATA'.
Simplified Explanation
The patent application describes a method for imputing (filling in) missing values in machine learning training data using a cluster model and linear regression.
- The method involves creating a correlation list of predictors with missing values.
- A cluster model with multiple clusters is then generated from a target value and predictor values.
- The method determines an imputed value for a missing value in a row of the original training data by using a linear regression model.
- For the clusters, the linear regression model is based on the values of multiple predictors that are not missing.
Potential applications of this technology:
- Data analysis and prediction models that rely on complete datasets can benefit from this method.
- It can be used in fields such as finance, healthcare, and marketing, where missing data is common.
Problems solved by this technology:
- Missing data can often lead to biased or inaccurate results in data analysis and prediction models.
- This method helps to address the issue of missing values by providing imputed values based on the available data.
Benefits of this technology:
- By imputing missing values, the method can make analysis and prediction models more accurate and reliable.
- It reduces the need for manual data imputation, saving time and effort.
- The cluster model and linear regression approach provide a systematic and efficient way to handle missing values in datasets.
Original Abstract Submitted
Embodiments are disclosed for a method. The method includes determining a correlation list of missing value predictors. The method also includes generating a cluster model having multiple clusters. The cluster model is based on a target value and predictor values. The method further includes determining an imputed value for a missing value of a row of original training data based on a linear regression model for multiple non-missing value predictor values for the clusters.
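The final step of the abstract can be written as a per-cluster regression. This is a hedged reading, not the patent's stated formula: it assumes the regression predicts the missing predictor from the predictors that have no missing values and is fitted separately within each cluster. The symbols $C_k$ (rows in cluster $k$), $O$ (fully observed predictors), $m$ (the missing predictor), and $\beta$ are introduced here for illustration.

```latex
% Assumed notation: row i lies in cluster C_k and is missing predictor m;
% O is the set of predictors with no missing values.
\hat{x}_{i,m} = \hat{\beta}^{(k)}_{0} + \sum_{j \in O} \hat{\beta}^{(k)}_{j}\, x_{i,j},
\qquad
\hat{\beta}^{(k)} = \arg\min_{\beta} \sum_{\substack{r \in C_k \\ x_{r,m}\ \text{observed}}}
  \Bigl( x_{r,m} - \beta_{0} - \sum_{j \in O} \beta_{j}\, x_{r,j} \Bigr)^{2}
```

Fitting one coefficient vector per cluster lets rows in different regions of the target and predictor space follow different linear relationships, which a single regression over all rows would average together.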