International Business Machines Corporation (20240185575). GENERATING BALANCED TRAIN-TEST SPLITS FOR MACHINE LEARNING simplified abstract

From WikiPatents
GENERATING BALANCED TRAIN-TEST SPLITS FOR MACHINE LEARNING

Organization Name

International Business Machines Corporation

Inventor(s)

Simona Rabinovici-Cohen of Haifa (IL)

Ella Barkan

Tal Tlusty Shapiro of Zichron Yaacov (IL)

GENERATING BALANCED TRAIN-TEST SPLITS FOR MACHINE LEARNING - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240185575 titled 'GENERATING BALANCED TRAIN-TEST SPLITS FOR MACHINE LEARNING'.

Simplified Explanation

The abstract describes a method for generating balanced train-test splits for machine learning analysis. The method extracts features from the received datasets, determines which features are impactful, selects subsets of those features, clusters the datasets according to each subset, generates candidate train-test split versions from the clusters, and selects the highest-scoring version. Each step is performed automatically:

  • Automatically extract low-level and high-level features from datasets
  • Automatically determine impactful features for each dataset
  • Automatically select subsets of impactful features
  • Automatically cluster datasets to generate clusters corresponding to selected feature subsets
  • Automatically generate train-test split versions using datasets from each cluster
  • Automatically score train-test split versions and select the highest-scoring version
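The last two steps can be illustrated with a minimal Python sketch. It assumes cluster assignments are already available, draws the test share from every cluster to build several candidate splits, and keeps the best-scoring one. The function names and the scoring rule (the gap in positive-label rate between train and test) are hypothetical illustrations; the abstract does not specify how splits are scored.

```python
import random
from collections import defaultdict


def split_by_cluster(cluster_ids, test_fraction, rng):
    """Build one split version, taking a test share from every cluster."""
    by_cluster = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        by_cluster[cid].append(idx)

    train, test = [], []
    for members in by_cluster.values():
        members = members[:]  # copy so the shuffle does not touch by_cluster
        rng.shuffle(members)
        # Singleton clusters stay in train; larger ones give at least one item to test.
        n_test = max(1, round(len(members) * test_fraction)) if len(members) > 1 else 0
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return sorted(train), sorted(test)


def balance_score(train, test, labels):
    """Assumed score: 1 minus the gap in positive-label rate between train and test."""
    if not train or not test:
        return 0.0

    def positive_rate(indices):
        return sum(labels[i] for i in indices) / len(indices)

    return 1.0 - abs(positive_rate(train) - positive_rate(test))


def best_split(cluster_ids, labels, test_fraction=0.25, n_versions=20, seed=0):
    """Generate several split versions and return the highest-scoring one."""
    rng = random.Random(seed)
    versions = [split_by_cluster(cluster_ids, test_fraction, rng)
                for _ in range(n_versions)]
    return max(versions, key=lambda v: balance_score(v[0], v[1], labels))


# Toy input: 16 items in two clusters of 8, with alternating binary labels.
cluster_ids = [0] * 8 + [1] * 8
labels = [0, 1] * 8
train, test = best_split(cluster_ids, labels)
```

Because every cluster contributes to both sides of the split, the test set cannot miss an entire cluster, which a purely random split may do on small data.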

Potential Applications

This technology can be applied in various fields such as healthcare, finance, marketing, and more for improving machine learning model training and testing processes.

Problems Solved

This technology addresses the problem of imbalanced train-test splits in machine learning analysis. Balancing the splits is intended to make model evaluation more accurate and reliable.

Benefits

The potential benefits of this technology include improved model performance, reduced bias in training and testing data, and increased efficiency in the machine learning workflow.

Potential Commercial Applications

Potential commercial applications of this technology include data analytics platforms, predictive modeling software, and machine learning services for industries requiring accurate and balanced model training and testing.

Possible Prior Art

Possible prior art includes the use of clustering algorithms in machine learning for data preprocessing and feature selection.

What are the limitations of this technology in real-world applications?

The limitations of this technology in real-world applications may include the complexity of feature extraction and selection processes, the computational resources required for clustering large datasets, and the potential bias introduced by the automatic selection of impactful features.

How does this technology compare to existing methods for generating train-test splits in machine learning?

This technology differs from existing methods by incorporating automatic feature extraction, impactful feature selection, and clustering techniques to generate balanced train-test splits, leading to more accurate model evaluation and improved performance.


Original Abstract Submitted

An embodiment for generating balanced train-test splits for machine learning analysis. The embodiment may automatically extract low-level features and high-level features from a series of received datasets. The embodiment may automatically determine a series of impactful features for each of the received datasets correlating to a corresponding label. The embodiment may automatically select subsets of impactful features. The embodiment may automatically cluster the received datasets to generate series of clusters, each of the generated series of clusters corresponding to one of the selected subsets of impactful features. The embodiment may automatically generate train-test split versions using datasets from each cluster in each of the generated series of clusters. The embodiment may automatically score each of the generated train-test split versions and select a highest-scoring train-test split version.
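The abstract's steps of determining impactful features "correlating to a corresponding label" and selecting subsets of them can be sketched as follows. This is an illustration under stated assumptions, not the method claimed in the application: it assumes impact is measured by absolute Pearson correlation with a numeric label, that a fixed threshold decides which features count as impactful, and that subsets are enumerated up to a small size. The feature names, the threshold, and the helper functions are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def impactful_features(features, labels, threshold=0.5):
    """Names of features whose |correlation| with the label meets the
    (assumed) threshold, ordered from strongest to weakest."""
    scores = {name: abs(pearson(values, labels))
              for name, values in features.items()}
    kept = [name for name, score in scores.items() if score >= threshold]
    return sorted(kept, key=lambda name: -scores[name])


def feature_subsets(names, max_size=2):
    """All non-empty subsets of the given names up to max_size elements."""
    return [subset
            for size in range(1, max_size + 1)
            for subset in combinations(names, size)]


# Toy data: two features track the label, one is noise, one is constant.
features = {
    "f_size": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "f_intensity": [0.1, 0.3, 0.2, 0.8, 0.7, 0.9],
    "f_age": [50, 61, 47, 49, 62, 48],
    "f_constant": [1, 1, 1, 1, 1, 1],
}
labels = [0, 0, 0, 1, 1, 1]

impactful = impactful_features(features, labels)
subsets = feature_subsets(impactful)
```

Each subset returned here would then drive one clustering of the datasets, so the number of subsets determines how many series of clusters are generated.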