SELECTING A HIGH COVERAGE DATASET

Organization Name

International Business Machines Corporation

Inventor(s)

Shaikh Shahriar Quader of Scarborough (CA)

Aindrila Basak of Edmonton (CA)

Adrian Mahjour of Toronto (CA)

Petr Novotny of Mount Kisco NY (US)

Carlo Appugliese of Seminole FL (US)

Berthold Reinwald of San Jose CA (US)

Dheeraj Arremsetty of Austin TX (US)

SELECTING A HIGH COVERAGE DATASET - A simplified explanation of the abstract

This abstract first appeared for US patent application 17893367, titled 'SELECTING A HIGH COVERAGE DATASET'.

Simplified Explanation

The patent application describes a method for selecting a representative subset of an initial dataset by training and evaluating a machine learning model on datapoints chosen according to their density.

Access a dataset associated with a machine learning model
Receive input parameters for representative dataset selection, including an evaluation metric
Determine the density of the datapoints in the dataset
Train a first iteration of a machine learning model using a data point selected according to the density
Determine a value of the evaluation metric for that iteration of the model
Generate a representative subset based on that evaluation metric value
Provide the representative dataset and a final machine learning model trained on it
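
The steps above can be illustrated with a minimal Python sketch. The abstract does not specify a density estimator, a model type, or a stopping rule, so everything here is an assumption made for illustration: density is approximated by the inverse mean distance to the k nearest neighbours, the model is a simple nearest-centroid classifier, the evaluation metric is accuracy on the full dataset, and selection stops once a hypothetical `target` value is reached. All function and parameter names are invented for this example and do not come from the patent application.

```python
import numpy as np

def knn_density(X, k=5):
    """Assumed density proxy: inverse mean distance to the k nearest neighbours."""
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    # After sorting each row, column 0 is the zero self-distance, so skip it.
    nearest = np.sort(dists, axis=1)[:, 1:k + 1]
    return 1.0 / (nearest.mean(axis=1) + 1e-12)

def fit_centroids(X, y):
    """Stand-in model: one centroid per class seen so far."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(model, X):
    classes, centroids = model
    sq_dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[sq_dists.argmin(axis=1)]

def accuracy(model, X, y):
    """Evaluation metric supplied as an input parameter (assumed: accuracy)."""
    return float((predict(model, X) == y).mean())

def select_representative(X, y, metric=accuracy, target=0.95, k=5):
    """Add datapoints in order of decreasing density, retraining and
    re-evaluating after each addition, until the metric reaches `target`."""
    order = np.argsort(-knn_density(X, k))  # densest datapoints first
    chosen, model, score = [], None, 0.0
    for idx in order:
        chosen.append(int(idx))
        model = fit_centroids(X[chosen], y[chosen])
        score = metric(model, X, y)
        if score >= target:
            break
    # Return the representative subset, the final model, and its metric value.
    return np.array(chosen), model, score

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(10, 1, (50, 2))])
    y = np.array([0] * 50 + [1] * 50)
    subset, final_model, value = select_representative(X, y)
    print(len(subset), "of", len(X), "datapoints selected; metric =", value)
```

Choosing the densest points first is one plausible reading of "selected according to the density"; an implementation could equally prioritise sparse regions to widen coverage, and the sketch would only need a different sort order.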

Potential Applications

Data preprocessing for machine learning models
Feature selection and dataset reduction
Improving model performance and efficiency

Problems Solved

Selecting a representative subset from a large dataset
Enhancing model training and evaluation process
Streamlining data processing for machine learning tasks

Benefits

Improved model accuracy and generalization
Reduced computational resources and time for training
Enhanced interpretability of machine learning models

Original Abstract Submitted

Providing a representative dataset from an initial dataset by accessing a dataset associated with a machine learning model, receiving input parameters associated with the representative dataset selection, the input parameters including an evaluation metric, determining a density of a plurality of datapoints associated with the dataset, training a first iteration of a machine learning model using a first data point selected according to the density, determining a first value of the evaluation metric for the first iteration of the machine learning model, generating a representative subset based on the first value of the evaluation metric value, and providing the representative dataset and a final machine learning model trained using the representative dataset.

