International Business Machines Corporation (20240104423). Capturing Data Properties to Recommend Machine Learning Models for Datasets simplified abstract

From WikiPatents

Capturing Data Properties to Recommend Machine Learning Models for Datasets

Organization Name

International Business Machines Corporation

Inventor(s)

Manjit Singh Sodhi of Bangalore (IN)

Suja Mohandas of Palakkad (IN)

Nitin Gupta of Saharanpur (IN)

Kalapriya Kannan of Bangalore (IN)

Prerna Agarwal of New Delhi (IN)

Capturing Data Properties to Recommend Machine Learning Models for Datasets - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240104423, titled 'Capturing Data Properties to Recommend Machine Learning Models for Datasets'.

Simplified Explanation

The patent application describes a method for recommending a machine learning model for a new dataset. Similarity scores are calculated between the new dataset and each model in a model catalog, based on stored metadata describing the dataset each model was trained on. The closest-match model is identified, and if its similarity score exceeds a threshold, that model is used to generate predictions for the new dataset.

  • Training machine learning models, each on a unique dataset
  • Extracting metadata that describes the properties of each training dataset
  • Storing the models and their metadata in a model catalog
  • Calculating similarity scores between a new dataset and the models in the catalog
  • Identifying the closest-match model for the new dataset
  • Generating predictions with the closest-match model if its similarity score exceeds a threshold
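The steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the method from the application: the `CatalogEntry` record, the `similarity` score (an average min/max ratio over shared non-negative numeric properties), and the function names are all hypothetical, and a plain callable stands in for a trained model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence

Row = Sequence[float]


@dataclass
class CatalogEntry:
    """One model-catalog record: a trained model plus its dataset metadata (hypothetical)."""
    name: str
    predict: Callable[[Row], float]   # stands in for a trained model
    metadata: Dict[str, float]        # assumed: non-negative numeric properties


def similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Assumed score: average min/max ratio over the properties both share (1.0 = identical)."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    total = 0.0
    for key in shared:
        larger = max(a[key], b[key])
        total += 1.0 if larger == 0 else min(a[key], b[key]) / larger
    return total / len(shared)


def recommend(catalog: List[CatalogEntry], new_metadata: Dict[str, float],
              threshold: float) -> Optional[CatalogEntry]:
    """Return the closest-match entry only if its score exceeds the threshold."""
    if not catalog:
        return None
    best = max(catalog, key=lambda e: similarity(e.metadata, new_metadata))
    if similarity(best.metadata, new_metadata) > threshold:
        return best
    return None


def predict_new_dataset(catalog: List[CatalogEntry], new_metadata: Dict[str, float],
                        rows: List[Row], threshold: float) -> Optional[List[float]]:
    """Generate predictions only when a sufficiently similar model exists."""
    best = recommend(catalog, new_metadata, threshold)
    if best is None:
        return None  # the abstract does not say what happens in this case
    return [best.predict(row) for row in rows]


catalog = [
    CatalogEntry("churn_model", lambda r: sum(r), {"rows": 1000, "columns": 10}),
    CatalogEntry("tiny_model", lambda r: 0.0, {"rows": 50, "columns": 3}),
]
print(recommend(catalog, {"rows": 900, "columns": 10}, threshold=0.8).name)  # churn_model
```

Here the new dataset scores 0.95 against `churn_model` (row ratio 0.9, column ratio 1.0) and about 0.18 against `tiny_model`, so `churn_model` is the closest match and clears the 0.8 threshold. Returning `None` below the threshold is a placeholder choice; the abstract only describes the case where the threshold is exceeded.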

Potential Applications

This technology could be applied in various fields such as healthcare, finance, and e-commerce for recommending the most suitable machine learning models for specific datasets.

Problems Solved

This technology automates the selection of an appropriate machine learning model for a given dataset, saving time and resources for data scientists and researchers.

Benefits

The method could make model selection more efficient and, by matching new data to models trained on similar data, may lead to better predictions and insights.

Potential Commercial Applications

"Optimizing Machine Learning Model Selection for Improved Predictions"

Possible Prior Art

A possible example of prior art would be a similar method that recommends machine learning models based on dataset properties but uses different algorithms or techniques.

What are the specific properties used to calculate similarity scores between datasets and machine learning models?

The similarity scores are calculated from the metadata stored with each model, which includes properties of the dataset that the model was trained on. The abstract does not list these properties; plausible examples include the dataset's size, structure, and features.
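One hypothetical way to turn "size, structure, and features" into a single score is sketched below. Everything here is an assumption for illustration: the metadata keys (`n_rows`, `features`, `column_types`), Jaccard overlap for feature names, cosine similarity for column-type counts, a log-scale comparison for row counts, and the blending weights. None of these choices come from the application.

```python
import math
from typing import Any, Dict, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two feature-name sets: size of intersection / size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cosine(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Cosine similarity of two column-type count vectors, e.g. {"numeric": 8}."""
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in a.keys() | b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def size_similarity(n_a: int, n_b: int) -> float:
    """Compare row counts on a log scale; equal sizes score 1.0."""
    if n_a <= 0 or n_b <= 0:
        return 1.0 if n_a == n_b else 0.0
    return 1.0 / (1.0 + abs(math.log(n_a) - math.log(n_b)))


def dataset_similarity(meta_a: Dict[str, Any], meta_b: Dict[str, Any],
                       weights: Tuple[float, float, float] = (0.6, 0.2, 0.2)) -> float:
    """Weighted blend of feature, structure, and size similarity (assumed weights)."""
    w_features, w_structure, w_size = weights
    return (w_features * jaccard(set(meta_a["features"]), set(meta_b["features"]))
            + w_structure * cosine(meta_a["column_types"], meta_b["column_types"])
            + w_size * size_similarity(meta_a["n_rows"], meta_b["n_rows"]))
```

Comparing sizes on a log scale treats 1,000 vs 2,000 rows as being as close as 10,000 vs 20,000 rows. The weights (0.6, 0.2, 0.2) are placeholders and would need tuning on real catalog data.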

How is the similarity threshold determined for identifying the closest match machine learning model?

The abstract does not specify how the threshold is set. It represents the minimum similarity a new dataset must reach for the closest-match model to be considered a good fit, and it could be chosen through empirical testing and validation across different datasets and models.
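As one illustration of choosing the threshold empirically, the sketch below assumes a hypothetical validation history of (similarity score, whether the closest-match model performed adequately) pairs and picks the threshold that maximizes agreement between "score exceeds threshold" and "model was adequate". This agreement-maximizing rule is a swapped-in technique chosen for simplicity, not one described in the application; `calibrate_threshold` is a hypothetical name.

```python
from typing import List, Tuple


def calibrate_threshold(history: List[Tuple[float, bool]]) -> float:
    """Choose the threshold that best separates good fits from poor ones.

    `history` holds (similarity score, model_was_adequate) pairs gathered by
    running the closest-match model on validation datasets. A model is
    accepted when its score is strictly greater than the threshold.
    """
    if not history:
        raise ValueError("need at least one validation result")

    def agreement(threshold: float) -> int:
        # Count validation cases where "accept" matches "model was adequate".
        return sum((score > threshold) == adequate for score, adequate in history)

    # Only observed scores (plus "accept everything") can change the outcome.
    candidates = [float("-inf")] + sorted({score for score, _ in history})
    # On ties, prefer the higher threshold, i.e. be conservative.
    return max(candidates, key=lambda t: (agreement(t), t))


validation = [(0.95, True), (0.90, True), (0.85, True), (0.70, False), (0.60, False)]
print(calibrate_threshold(validation))  # 0.7
```

With this history, a threshold of 0.7 accepts exactly the three adequate cases and rejects the two inadequate ones. A real calibration would likely weigh false accepts more heavily than false rejects, since a poorly matched model produces bad predictions.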


Original Abstract Submitted

A method for recommending machine learning models is provided. The method comprises training machine learning models, wherein each machine learning model is trained with a unique respective dataset. Metadata associated with each machine learning model is extracted, wherein the metadata includes properties of the respective dataset used to train the machine learning model. The machine learning models and metadata are stored in a model catalog. Upon receiving a new dataset, similarity scores are calculated between the new dataset and the machine learning models in the model catalog according to the properties of the datasets in the metadata of the machine learning models. A closest match machine learning model is identified from the model catalog for the new dataset according to similarity score. Responsive to a determination that the closest match machine learning model exceeds a similarity threshold, predictions for the new dataset are generated with the closest match machine learning model.
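The abstract says only that the extracted metadata "includes properties of the respective dataset". A minimal sketch of what such extraction might look like for a tabular dataset held as a list of row dictionaries follows; the function name `extract_metadata` and the property set (`n_rows`, `features`, `column_types`, `missing_fraction`) are hypothetical and not taken from the application.

```python
from typing import Any, Dict, List


def extract_metadata(rows: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Summarize a tabular dataset into catalog metadata (hypothetical schema)."""
    # Union of column names across rows; rows may omit columns.
    features = sorted({col for row in rows for col in row})
    column_types: Dict[str, int] = {}
    missing = 0
    for col in features:
        values = [row.get(col) for row in rows]      # absent keys count as None
        present = [v for v in values if v is not None]
        missing += len(values) - len(present)
        # A column is "numeric" only if every present value is an int/float
        # (bools excluded); empty or mixed columns are treated as categorical.
        numeric = bool(present) and all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
        )
        kind = "numeric" if numeric else "categorical"
        column_types[kind] = column_types.get(kind, 0) + 1
    cells = len(rows) * len(features)
    return {
        "n_rows": len(rows),
        "features": features,
        "column_types": column_types,
        "missing_fraction": missing / cells if cells else 0.0,
    }
```

In a full pipeline, this summary would be stored in the model catalog next to the model trained on the same rows, so that a new dataset can later be summarized the same way and compared against it.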