17893367. SELECTING A HIGH COVERAGE DATASET simplified abstract (International Business Machines Corporation)
SELECTING A HIGH COVERAGE DATASET
Organization Name
International Business Machines Corporation
Inventor(s)
Shaikh Shahriar Quader of Scarborough (CA)
Aindrila Basak of Edmonton (CA)
Adrian Mahjour of Toronto (CA)
Petr Novotny of Mount Kisco NY (US)
Carlo Appugliese of Seminole FL (US)
Berthold Reinwald of San Jose CA (US)
Dheeraj Arremsetty of Austin TX (US)
SELECTING A HIGH COVERAGE DATASET - A simplified explanation of the abstract
This abstract first appeared for US patent application 17893367 titled 'SELECTING A HIGH COVERAGE DATASET'.
Simplified Explanation
The patent application describes a method for selecting a representative dataset from an initial dataset by training and evaluating a machine learning model on datapoints chosen according to their density.
- Access a dataset associated with a machine learning model
- Receive input parameters for the representative dataset selection, including an evaluation metric
- Determine the density of the datapoints in the dataset
- Train a first iteration of the machine learning model using a datapoint selected according to the density
- Determine the value of the evaluation metric for that iteration of the model
- Generate a representative subset based on the evaluation metric value
- Provide the representative dataset and a final machine learning model trained using it
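The steps above can be illustrated with a minimal Python sketch. It is not the patented method: the abstract does not say how density is estimated, which model is trained, in what order datapoints are chosen, or when selection stops. This sketch assumes a k-nearest-neighbour density estimate, a toy nearest-centroid classifier, accuracy on a held-out set as the evaluation metric, densest-first selection, and a hypothetical `target` score as the stopping rule. All function and parameter names are made up for illustration.

```python
import math
import random


def knn_density(points, k=3):
    """Assumed density estimate: inverse mean distance to the k nearest
    neighbours. Requires at least two points."""
    densities = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:k]
        densities.append(1.0 / (sum(nearest) / len(nearest) + 1e-12))
    return densities


def fit_centroids(X, y):
    """Toy stand-in for 'the machine learning model': one centroid per label."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for d, v in enumerate(x):
            acc[d] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda lab: math.dist(centroids[lab], x))


def accuracy(centroids, X, y):
    """Evaluation metric supplied as an input parameter."""
    return sum(predict(centroids, x) == lab for x, lab in zip(X, y)) / len(X)


def select_representative(X, y, X_val, y_val, metric=accuracy,
                          target=0.95, max_size=None, k=3):
    """Add datapoints densest-first, retraining and re-evaluating after each
    addition, until the metric reaches `target` or `max_size` is hit."""
    max_size = max_size or len(X)
    density = knn_density(X, k)
    order = sorted(range(len(X)), key=lambda i: density[i], reverse=True)
    subset, model, score = [], None, 0.0
    for i in order:
        subset.append(i)
        model = fit_centroids([X[j] for j in subset], [y[j] for j in subset])
        score = metric(model, X_val, y_val)
        if score >= target or len(subset) >= max_size:
            break
    return subset, model, score


def make_blobs(n_per_class, rng):
    """Synthetic data: two well-separated 2-D Gaussian blobs."""
    X, y = [], []
    for label, centre in enumerate([(0.0, 0.0), (10.0, 10.0)]):
        for _ in range(n_per_class):
            X.append((rng.gauss(centre[0], 1.0), rng.gauss(centre[1], 1.0)))
            y.append(label)
    return X, y


rng = random.Random(0)
X, y = make_blobs(50, rng)
X_val, y_val = make_blobs(25, rng)

subset, final_model, score = select_representative(X, y, X_val, y_val)
print(f"kept {len(subset)} of {len(X)} datapoints, validation accuracy {score:.2f}")
```

The function returns both the indices of the representative subset and the model trained on it, mirroring the final step in the list. Swapping in a different density estimator, model, or metric only requires changing the corresponding helper.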
Potential Applications
- Data preprocessing for machine learning models
- Instance selection and dataset reduction
- Improving model performance and efficiency
Problems Solved
- Selecting a representative subset from a large dataset
- Enhancing the model training and evaluation process
- Streamlining data processing for machine learning tasks
Benefits
- Improved model accuracy and generalization
- Reduced computational resources and time for training
- Enhanced interpretability of machine learning models
Original Abstract Submitted
Providing a representative dataset from an initial dataset by accessing a dataset associated with a machine learning model, receiving input parameters associated with the representative dataset selection, the input parameters including an evaluation metric, determining a density of a plurality of datapoints associated with the dataset, training a first iteration of a machine learning model using a first data point selected according to the density, determining a first value of the evaluation metric for the first iteration of the machine learning model, generating a representative subset based on the first value of the evaluation metric value, and providing the representative dataset and a final machine learning model trained using the representative dataset.
- G06N20/00
- G06K9/62