17546346. PRIORITIZED DATA CLEANING simplified abstract (INTERNATIONAL BUSINESS MACHINES CORPORATION)

PRIORITIZED DATA CLEANING

Organization Name

INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor(s)

Ritwik Chaudhuri of Bangalore (IN)

Sameep Mehta of Bangalore (IN)

PRIORITIZED DATA CLEANING - A simplified explanation of the abstract

This abstract first appeared for US patent application 17546346, titled 'PRIORITIZED DATA CLEANING'.

Simplified Explanation

The patent application describes methods, systems, and computer program products for prioritized data cleaning. Here are the key points:

  • The method involves obtaining a dataset with multiple data issues.
  • The priority of different features in the dataset is determined.
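  • A separate model is generated for each data resolution algorithm. Each model indicates the computing cost of using that algorithm to resolve at least a portion of the data issues in order of feature priority.
  • One or more of the data resolution algorithms are then applied to resolve at least a portion of the data issues, in order of feature priority, based at least in part on the generated models.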
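
The steps above can be illustrated with a minimal Python sketch. It is not the implementation claimed in the application, which does not specify how priorities are determined, what form the cost models take, or how an algorithm is selected. Everything named here is an assumption made for illustration: missing values stand in for "data issues", priorities are supplied as hand-picked weights, each cost model is a simple function of the issue count, the `Resolver` class and the `fill_mean` and `fill_zero` helpers are hypothetical, and a fixed compute budget decides which algorithm is affordable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[float]]  # None marks a data issue (a missing value)


@dataclass
class Resolver:
    """Hypothetical data resolution algorithm paired with its cost model."""
    name: str
    fix: Callable[[List[Row], str], None]  # repairs one feature in place
    cost: Callable[[int], float]           # estimated cost of fixing n issues


def count_issues(rows: List[Row], feature: str) -> int:
    return sum(1 for row in rows if row[feature] is None)


def fill_zero(rows: List[Row], feature: str) -> None:
    for row in rows:
        if row[feature] is None:
            row[feature] = 0.0


def fill_mean(rows: List[Row], feature: str) -> None:
    present = [row[feature] for row in rows if row[feature] is not None]
    mean = sum(present) / len(present) if present else 0.0
    for row in rows:
        if row[feature] is None:
            row[feature] = mean


def prioritized_clean(rows, priority, resolvers, budget):
    """Clean features from highest to lowest priority within a cost budget.

    Returns a log of (feature, resolver name, estimated cost) tuples.
    """
    log = []
    for feature in sorted(priority, key=priority.get, reverse=True):
        n = count_issues(rows, feature)
        if n == 0:
            continue
        # Assumed selection rule: take the first (most preferred) resolver
        # whose estimated cost fits in the remaining budget.
        choice = next((r for r in resolvers if r.cost(n) <= budget), None)
        if choice is None:
            continue  # nothing affordable: leave this feature unresolved
        choice.fix(rows, feature)
        budget -= choice.cost(n)
        log.append((feature, choice.name, choice.cost(n)))
    return log


# Illustrative inputs only; the weights, costs, and budget are made up.
rows = [
    {"income": None, "age": 30.0},
    {"income": 50.0, "age": None},
    {"income": 70.0, "age": 40.0},
    {"income": None, "age": None},
]
priority = {"age": 0.2, "income": 0.9}
resolvers = [
    Resolver("mean", fill_mean, lambda n: 3.0 * n),  # preferred, costlier
    Resolver("zero", fill_zero, lambda n: 1.0 * n),  # cheap fallback
]
log = prioritized_clean(rows, priority, resolvers, budget=8.0)
print(log)
```

In this toy run the high-priority `income` feature is repaired first with the preferred mean fill, which uses most of the budget, so the lower-priority `age` feature falls back to the cheaper zero fill.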

Potential applications of this technology:

  • Data cleaning and data quality improvement in industries such as finance, healthcare, and e-commerce.
  • Streamlining data processing and analysis workflows by automating the prioritization and resolution of data issues.

Problems solved by this technology:

  • Prioritizing data cleaning efforts based on the importance of different features in the dataset.
  • Optimizing the resolution of data issues by considering the computing costs of different data resolution algorithms.
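
One way to picture a per-algorithm cost model is as a line fitted to a few timing measurements. The sketch below is an assumption for illustration only: the application does not say that cost models are linear or fitted this way. The function names, the algorithm labels `algo_a` and `algo_b`, and the sample timings are all hypothetical.

```python
from typing import Dict, List, Tuple

CostModel = Tuple[float, float]  # (intercept, slope)


def fit_linear_cost(samples: List[Tuple[int, float]]) -> CostModel:
    """Least-squares fit of cost = intercept + slope * n_issues."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("need samples with at least two distinct issue counts")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict(model: CostModel, n_issues: int) -> float:
    intercept, slope = model
    return intercept + slope * n_issues


def cheapest(models: Dict[str, CostModel], n_issues: int) -> str:
    """Name of the algorithm with the lowest predicted cost."""
    return min(models, key=lambda name: predict(models[name], n_issues))


# Made-up (issue count, seconds) measurements for two hypothetical algorithms.
models = {
    "algo_a": fit_linear_cost([(100, 0.5), (200, 0.9), (400, 1.7)]),
    "algo_b": fit_linear_cost([(100, 0.6), (200, 0.7), (400, 0.9)]),
}
print(cheapest(models, 50), cheapest(models, 1000))
```

With these made-up numbers, `algo_a` has the lower fixed overhead and is predicted to be cheaper for small issue counts, while `algo_b` scales better and is predicted to be cheaper for large ones. A comparison like this is the kind of information a cost model could feed into the choice of algorithm.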

Benefits of this technology:

  • A more efficient data cleaning process, since data issues are resolved in order of feature priority.
  • Lower computing costs, since the cost models guide the choice of data resolution algorithm.
  • Improved data quality and reliability for better decision-making and analysis.


Original Abstract Submitted

Methods, systems, and computer program products for prioritized data cleaning are provided herein. A computer-implemented method includes obtaining a dataset comprising a plurality of data issues; determining a priority of one or more features of the dataset; generating a respective model for each of a plurality of data resolution algorithms, wherein each model indicates computing costs of the corresponding data resolution algorithm for resolving at least a portion of the plurality of data issues in an order of the priority of the features; and applying one or more of the plurality of data resolution algorithms to resolve at least a portion of the data issues in the order of the priority of the features based at least in part on the generated models.