US Patent Application 17739716. SYSTEM AND METHOD FOR EFFICIENT TRANSFORMATION PREDICTION IN A DATA ANALYTICS PREDICTION MODEL PIPELINE simplified abstract

From WikiPatents

SYSTEM AND METHOD FOR EFFICIENT TRANSFORMATION PREDICTION IN A DATA ANALYTICS PREDICTION MODEL PIPELINE

Organization Name

INTERNATIONAL BUSINESS MACHINES CORPORATION


Inventor(s)

Dong Hai Yu of Xi'an (CN)

Jun Wang of Xi'an (CN)

Bo Song of Xi'an (CN)

Yao Dong Liu of Xi'an (CN)

Jiang Bo Kang of Xi'an (CN)

Lei Tian of Xi'an (CN)

Xing Wei of Xi'an (CN)

SYSTEM AND METHOD FOR EFFICIENT TRANSFORMATION PREDICTION IN A DATA ANALYTICS PREDICTION MODEL PIPELINE - A simplified explanation of the abstract

This abstract first appeared for US patent application 17739716, titled 'SYSTEM AND METHOD FOR EFFICIENT TRANSFORMATION PREDICTION IN A DATA ANALYTICS PREDICTION MODEL PIPELINE'.

Simplified Explanation

The patent application describes a computer system or program that improves the selection of transformations in an ensemble machine learning model.

  • The system takes all the base machine learning (ML) models that make up the ensemble model.
  • It identifies every derived field in these models and performs a run prediction analysis on each one.
  • For each derived field, it computes two importance weights: the Derived Field Importance Weight for Field (DFIW4F) and the Derived Field Importance Weight for Model (DFIW4M).
  • It clusters the derived fields based on these two weights.
  • It sorts the clusters by the same two weights so that the best cluster comes first.
  • Finally, it runs the base ML models based on the derived fields in the best cluster, continuing until enough base models have been run.
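
The clustering and sorting steps above can be illustrated with a minimal sketch. The abstract does not say how DFIW4F and DFIW4M are calculated, which clustering algorithm is used, or how clusters are ranked, so everything here is an assumption made for illustration: the weights are taken as given numbers in the range 0 to 1, a simple grid over the two weights stands in for the clustering, and clusters are ranked by the mean of the two weights summed. The field names and the `DerivedField` class are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class DerivedField:
    name: str
    dfiw4f: float  # assumed: field-level importance weight in [0, 1]
    dfiw4m: float  # assumed: model-level importance weight in [0, 1]


def cluster_fields(fields, bins=2):
    """Group fields by the (dfiw4f, dfiw4m) grid cell they fall into.

    The grid is an assumed stand-in for the clustering step; the
    application may use a different algorithm.
    """
    clusters = defaultdict(list)
    for f in fields:
        cell = (
            min(int(f.dfiw4f * bins), bins - 1),
            min(int(f.dfiw4m * bins), bins - 1),
        )
        clusters[cell].append(f)
    return list(clusters.values())


def sort_clusters(clusters):
    """Return clusters best-first, scored by mean (dfiw4f + dfiw4m).

    The scoring rule is an assumption, not taken from the application.
    """
    return sorted(
        clusters,
        key=lambda c: mean(f.dfiw4f + f.dfiw4m for f in c),
        reverse=True,
    )


# Hypothetical derived fields with made-up weights.
fields = [
    DerivedField("log_income", 0.9, 0.8),
    DerivedField("age_bucket", 0.7, 0.9),
    DerivedField("zip_prefix", 0.2, 0.3),
    DerivedField("name_length", 0.1, 0.6),
]

best = sort_clusters(cluster_fields(fields))[0]
best_names = sorted(f.name for f in best)
print(best_names)  # ['age_bucket', 'log_income']
```

With these made-up weights, the two fields that score high on both weights land in the same grid cell, and that cell ranks first.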


Original Abstract Submitted

A computer-implemented system, platform, programing product, and/or method for improving transformation selection in an ensemble machine learning (ML) model that includes: providing all base ML models of the ensemble ML model; identifying all of a plurality of Derived Fields in all the base ML models; performing a Derived Field run prediction analysis for all the Derived Fields; computing the Derived Field Importance Weight for Field (DFIW4F) and the Derived Field Importance Weight for Model (DFIW4M) for all the Derived Fields; clustering all the Derived Fields into a plurality of Derived Field clusters, wherein each Derived Field cluster is based upon the DFIW4M and the DFIW4F for the Derived Field; sorting all the Derived Field clusters by best cluster based upon DFIW4M and DFIW4F; and running the base ML models based upon the Derived Fields in the best Derived Field cluster until sufficient base ML models have been run.
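
The final step of the abstract, running base ML models "until sufficient base ML models have been run", could be sketched as follows. The abstract does not define "sufficient" or say how a derived field maps to a base model, so this sketch assumes both: each base model is described by the set of derived fields it consumes, and "sufficient" is read as a fixed minimum count of distinct models. The function `select_models_to_run`, the model names, and the field names are hypothetical.

```python
def select_models_to_run(sorted_clusters, model_fields, min_models):
    """Walk clusters best-first and schedule models that use their fields.

    sorted_clusters: lists of derived-field names, best cluster first.
    model_fields: maps a base-model name to the set of derived fields
        it consumes (an assumed structure).
    min_models: assumed reading of "sufficient"; scheduling stops once
        this many distinct base models have been selected.
    """
    scheduled = []
    for cluster in sorted_clusters:
        wanted = set(cluster)
        for model, used in model_fields.items():
            # Schedule a model once, if it uses any field in this cluster.
            if model not in scheduled and used & wanted:
                scheduled.append(model)
                if len(scheduled) >= min_models:
                    return scheduled
    return scheduled


# Hypothetical clusters (already sorted) and base models.
sorted_clusters = [["log_income", "age_bucket"], ["name_length"], ["zip_prefix"]]
model_fields = {
    "tree_a": {"log_income"},
    "linear_b": {"zip_prefix"},
    "tree_c": {"age_bucket", "name_length"},
    "knn_d": {"name_length"},
}

order = select_models_to_run(sorted_clusters, model_fields, min_models=3)
print(order)  # ['tree_a', 'tree_c', 'knn_d']
```

Under these assumptions, models that depend only on fields in low-ranked clusters (here `linear_b`) are never run once the threshold is reached, which is one plausible source of the efficiency the title refers to.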