17893367. SELECTING A HIGH COVERAGE DATASET simplified abstract (International Business Machines Corporation)

From WikiPatents

SELECTING A HIGH COVERAGE DATASET

Organization Name

International Business Machines Corporation

Inventor(s)

Shaikh Shahriar Quader of Scarborough (CA)

Aindrila Basak of Edmonton (CA)

Adrian Mahjour of Toronto (CA)

Petr Novotny of Mount Kisco NY (US)

Carlo Appugliese of Seminole FL (US)

Berthold Reinwald of San Jose CA (US)

Dheeraj Arremsetty of Austin TX (US)

SELECTING A HIGH COVERAGE DATASET - A simplified explanation of the abstract

This abstract first appeared for US patent application 17893367 titled 'SELECTING A HIGH COVERAGE DATASET'.

Simplified Explanation

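The patent application describes a method for selecting a representative subset of an initial dataset. Data points are chosen according to their density, a machine learning model is trained on them iteratively, and each iteration is scored with an evaluation metric received as an input parameter.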

  • Access a dataset associated with a machine learning model
  • Receive input parameters for the representative dataset selection, including an evaluation metric
  • Determine the density of the data points in the dataset
  • Train a first iteration of the machine learning model using a data point selected according to that density
  • Determine the value of the evaluation metric for that iteration of the model
  • Generate a representative subset based on the evaluation metric value
  • Provide the representative dataset and a final machine learning model trained using it
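
The steps above can be sketched in Python as follows. This is only an illustrative reading of the abstract, not the method claimed in the application. The abstract does not say how density is measured, which model is trained, or how the metric value decides what enters the subset, so every concrete choice here is an assumption: density is a neighbour count within a fixed radius, the model is a nearest-centroid classifier, the metric is accuracy, a data point is kept when it does not lower the metric, and selection stops once a target value is reached. All function and parameter names (neighbour_density, train_centroids, select_representative, radius, target) are hypothetical.

```python
import math


def neighbour_density(points, radius):
    """Assumed density proxy: number of other points within `radius`."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= radius)
        for i, p in enumerate(points)
    ]


def train_centroids(points, labels):
    """Assumed model: nearest-centroid classifier (mean point per class)."""
    groups = {}
    for p, y in zip(points, labels):
        groups.setdefault(y, []).append(p)
    return {y: tuple(sum(c) / len(ps) for c in zip(*ps)) for y, ps in groups.items()}


def accuracy(model, points, labels):
    """Assumed evaluation metric: fraction of points assigned to the right class."""
    hits = sum(
        1 for p, y in zip(points, labels)
        if min(model, key=lambda c: math.dist(p, model[c])) == y
    )
    return hits / len(points)


def select_representative(points, labels, val_points, val_labels,
                          metric=accuracy, target=0.95, radius=1.0):
    """Greedily build a subset, visiting data points from densest to sparsest."""
    dens = neighbour_density(points, radius)
    order = sorted(range(len(points)), key=lambda i: dens[i], reverse=True)
    chosen, best, model = [], 0.0, None
    for i in order:
        trial = chosen + [i]
        trial_model = train_centroids([points[k] for k in trial],
                                      [labels[k] for k in trial])
        score = metric(trial_model, val_points, val_labels)
        if score >= best:      # assumed rule: keep points that do not hurt the metric
            chosen, best, model = trial, score, trial_model
        if best >= target:     # assumed stopping rule: target metric value reached
            break
    return chosen, model, best


# Toy usage: two well-separated clusters, validated on the full dataset.
pts = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1)]
lbl = [0, 0, 0, 1, 1, 1]
subset, final_model, final_score = select_representative(
    pts, lbl, pts, lbl, target=1.0, radius=0.5)
print(subset, final_score)
```

The returned model is the one trained on the selected subset, which mirrors the last step in the list. Swapping in a different density estimate, model, or metric only requires replacing the corresponding helper function.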

Potential Applications

  • Data preprocessing for machine learning models
  • Instance (data point) selection and dataset reduction
  • Improving model performance and efficiency

Problems Solved

  • Selecting a representative subset from a large dataset
  • Enhancing model training and evaluation process
  • Streamlining data processing for machine learning tasks

Benefits

  • Improved model accuracy and generalization
  • Reduced computational resources and time for training
  • Enhanced interpretability of machine learning models


Original Abstract Submitted

Providing a representative dataset from an initial dataset by accessing a dataset associated with a machine learning model, receiving input parameters associated with the representative dataset selection, the input parameters including an evaluation metric, determining a density of a plurality of datapoints associated with the dataset, training a first iteration of a machine learning model using a first data point selected according to the density, determining a first value of the evaluation metric for the first iteration of the machine learning model, generating a representative subset based on the first value of the evaluation metric value, and providing the representative dataset and a final machine learning model trained using the representative dataset.