18530960. DIVERSITY-AWARE WEIGHTED MAJORITY VOTE CLASSIFIER FOR IMBALANCED DATASETS simplified abstract (NEC Corporation)



Organization Name

NEC Corporation

Inventor(s)

Anil Goyal of Heidelberg (DE)

Jihed Khiari of Heidelberg (DE)

DIVERSITY-AWARE WEIGHTED MAJORITY VOTE CLASSIFIER FOR IMBALANCED DATASETS - A simplified explanation of the abstract

This abstract first appeared for US patent application 18530960 titled 'DIVERSITY-AWARE WEIGHTED MAJORITY VOTE CLASSIFIER FOR IMBALANCED DATASETS'.

Simplified Explanation

The abstract describes an ensemble learning method for binary classification on an imbalanced dataset. The method comprises three steps (a rough code sketch follows the list):

  • Generatively oversampling the imbalanced dataset to create a generated dataset.
  • Learning base classifiers on subsamples of the generated dataset.
  • Combining outputs of the base classifiers to create a weighted majority vote classifier.
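
A rough end-to-end sketch of these three steps, assuming SMOTE-style generative oversampling (via imbalanced-learn) and decision-tree base learners; the abstract fixes neither choice, and the uniform weights here are a placeholder for the diversity-aware weighting discussed further below.

  import numpy as np
  from imblearn.over_sampling import SMOTE            # stand-in for the generative oversampler
  from sklearn.tree import DecisionTreeClassifier     # stand-in for the base learner

  def train_ensemble(X, y, n_classifiers=10, subsample_frac=0.8, seed=0):
      """X, y: numpy arrays with binary labels in {0, 1} (1 = minority/positive class)."""
      rng = np.random.default_rng(seed)

      # Step 1: generatively oversample the minority (positive) class.
      X_gen, y_gen = SMOTE(random_state=seed).fit_resample(X, y)

      # Step 2: learn one base classifier on each random subsample.
      classifiers = []
      n = len(X_gen)
      for _ in range(n_classifiers):
          idx = rng.choice(n, size=int(subsample_frac * n), replace=False)
          clf = DecisionTreeClassifier(max_depth=3, random_state=seed)
          clf.fit(X_gen[idx], y_gen[idx])
          classifiers.append(clf)

      # Step 3: combine the base classifiers into a weighted majority vote
      # (uniform weights as a placeholder for the diversity-aware weights).
      weights = np.full(n_classifiers, 1.0 / n_classifiers)
      return classifiers, weights

  def predict(classifiers, weights, X):
      # Weighted majority vote over {-1, +1} votes.
      votes = np.array([2 * clf.predict(X) - 1 for clf in classifiers])
      return (weights @ votes >= 0).astype(int)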

Potential Applications

This technology can be applied in various fields such as fraud detection, medical diagnosis, and anomaly detection where imbalanced datasets are common.

Problems Solved

This technology addresses the issue of imbalanced datasets where the minority class is underrepresented, leading to biased classification results.

Benefits

The method improves the classification performance on imbalanced datasets by generating synthetic examples of the minority class and combining the predictions of multiple base classifiers.

Potential Commercial Applications

Potential commercial applications include financial fraud detection systems, medical diagnostic tools, and cybersecurity systems that require accurate classification of imbalanced data.

Possible Prior Art

One possible piece of prior art in this field is SMOTE (Synthetic Minority Over-sampling Technique), which also addresses imbalanced datasets by generating synthetic examples of the minority class.

What are the specific techniques used for generatively oversampling the imbalanced dataset in this method?

The method balances the dataset by synthetically generating additional minority class examples; the abstract does not name a specific generative technique.
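
As an illustration only, here is a hypothetical interpolation-based generator in the spirit of SMOTE; since the abstract does not disclose which generative model is actually used, every name and parameter below is an assumption.

  import numpy as np

  def oversample_minority(X_min, n_new, k=5, seed=0):
      """Create n_new synthetic minority samples by interpolating between each
      minority sample and one of its k nearest minority neighbours (SMOTE-like)."""
      rng = np.random.default_rng(seed)
      # Pairwise distances within the minority class.
      d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
      np.fill_diagonal(d, np.inf)
      neighbours = np.argsort(d, axis=1)[:, :k]

      synthetic = []
      for _ in range(n_new):
          i = rng.integers(len(X_min))            # pick a minority sample
          j = neighbours[i, rng.integers(k)]      # pick one of its k nearest neighbours
          lam = rng.random()                      # interpolation factor in [0, 1)
          synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
      return np.vstack(synthetic)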

How are the weights assigned to the base classifiers in order to minimize diversity on the positive samples?

Each base classifier is assigned a weight chosen so that the diversity (disagreement) between the base classifiers on the positive samples is minimized, which gives more influence to classifiers that agree on, and perform well on, the minority class.
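
One plausible, and entirely hypothetical, reading of this weighting scheme is to measure the pairwise disagreement of the base classifiers on the positive samples and pick simplex weights that minimize the weighted disagreement; the abstract does not disclose the exact diversity measure, objective, or solver, and the small ridge term below is added purely for illustration.

  import numpy as np
  from scipy.optimize import minimize

  def diversity_aware_weights(classifiers, X_pos, reg=0.1):
      """X_pos: the positive (minority-class) samples only."""
      m = len(classifiers)
      preds = np.array([clf.predict(X_pos) for clf in classifiers])    # shape (m, n_pos)

      # D[i, j] = fraction of positive samples on which classifiers i and j disagree.
      D = np.array([[np.mean(preds[i] != preds[j]) for j in range(m)]
                    for i in range(m)])

      # Minimize w^T D w over the probability simplex; the ridge term reg * ||w||^2
      # keeps the weights from collapsing onto a single classifier (illustrative choice).
      objective = lambda w: w @ D @ w + reg * (w @ w)
      constraints = [{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}]
      bounds = [(0.0, 1.0)] * m
      w0 = np.full(m, 1.0 / m)
      return minimize(objective, w0, bounds=bounds, constraints=constraints,
                      method="SLSQP").x

The resulting weights would then replace the uniform placeholder weights in the weighted majority vote sketched earlier.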


Original Abstract Submitted

An ensemble learning based method is for a binary classification on an imbalanced dataset. The imbalanced dataset has a minority class comprising positive samples and a majority class comprising negative samples. The method includes: generatively oversampling the imbalanced dataset by synthetically generating minority class examples, thereby generating a generated dataset; using the generated dataset to generate subsamples, and learning a base classifier on each of the subsamples to determine a plurality of base classifiers; and learning a weighted majority vote classifier by combining outputs of the base classifiers. Each of the base classifiers is assigned a weight in such a way that a diversity between the base classifiers on the positive samples is minimized.