SYSTEM AND METHOD FOR IMPROVING MACHINE LEARNING MODELS BY DETECTING AND REMOVING INACCURATE TRAINING DATA

Organization Name

Microsoft Technology Licensing, LLC

Inventor(s)

Oren Elisha of Hertzeliya (IL)

Ami Luttwak of Binyamina (IL)

Hila Yehuda of Tel Aviv (IL)

Adar Kahana of Natanya (IL)

Maya Bechler-Speicher of Tel Aviv (IL)

SYSTEM AND METHOD FOR IMPROVING MACHINE LEARNING MODELS BY DETECTING AND REMOVING INACCURATE TRAINING DATA - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240202591, titled 'SYSTEM AND METHOD FOR IMPROVING MACHINE LEARNING MODELS BY DETECTING AND REMOVING INACCURATE TRAINING DATA'.

The patent application describes methods, systems, and computer program products to enhance machine learning model-based classification by identifying and removing inaccurate training data.

Identification and removal of inaccurate training samples based on excessive vector-space variance between a sample and the mean of its category's training samples, and on discrepancies between assigned and predicted categories.
Selective removal of suspect or erroneous samples using criteria such as vector space variance and prediction confidence level.
Improvement of ML model accuracy by training on a more accurate revised training set.
Enhancement of ML model accuracy by identifying and removing suspect categories with excessive vector space variance.
User control over prediction confidence level and coverage to manage accuracy.
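
The two sample-level criteria listed above can be sketched as follows. This is a minimal illustration, not the method claimed in the application: the function names, the z-score cutoff, the nearest-centroid predictor, and the softmax-over-distances confidence are all assumptions chosen to keep the example self-contained.

```python
import numpy as np

def centroid_distances(X, y):
    """Return (classes, dists), where dists[i, k] is the Euclidean
    distance from sample i to the mean of category classes[k]."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes, dists

def flag_suspects(X, y, z_thresh=3.0, conf_thresh=0.9):
    """Hypothetical helper: return two boolean masks, (far, mislabeled)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes, dists = centroid_distances(X, y)
    # Distance from each sample to the mean of its *assigned* category.
    own = dists[np.arange(len(y)), np.searchsorted(classes, y)]

    # Criterion 1: distance to the category mean is an outlier within
    # that category (z-score cutoff is an assumed rule; it cannot
    # trigger for very small categories).
    far = np.zeros(len(y), dtype=bool)
    for c in classes:
        members = y == c
        spread = own[members].std()
        if spread > 0:
            far[members] = (own[members] - own[members].mean()) / spread > z_thresh

    # Criterion 2: a confident prediction disagrees with the assigned
    # category. A nearest-centroid predictor stands in for the trained
    # model; confidence is a softmax over negative distances (assumed).
    logits = -dists
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    predicted = classes[probs.argmax(axis=1)]
    confidence = probs.max(axis=1)
    mislabeled = (predicted != y) & (confidence >= conf_thresh)
    return far, mislabeled
```

A revised training set would then keep only the unflagged rows, for example `keep = ~(far | mislabeled)` followed by `X[keep], y[keep]`. In practice the stand-in predictor would be replaced by the actual classifier and its own confidence scores.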

Potential Applications:

- Enhancing the accuracy of machine learning models in various industries such as finance, healthcare, and e-commerce.
- Improving the efficiency of data classification processes in large datasets.
- Enhancing the performance of recommendation systems and predictive analytics tools.

Problems Solved:

- Addressing the issue of inaccurate training data impacting the accuracy of machine learning models.
- Providing a systematic approach to identify and remove suspect or erroneous training samples.
- Improving the overall reliability and precision of machine learning algorithms.

Benefits:

- Increased accuracy and reliability of machine learning models.
- Enhanced decision-making capabilities based on more precise data classification.
- Improved performance of AI-driven systems in various applications.

Commercial Applications:

Title: "Enhancing Machine Learning Model Accuracy through Data Cleaning"

This technology can be utilized in industries such as finance for fraud detection, healthcare for disease diagnosis, and e-commerce for personalized recommendations. The market implications include improved customer satisfaction, reduced errors, and increased operational efficiency.

Questions about the Technology:

1. How does this technology compare to traditional data cleaning methods?

  - This technology offers a more automated and systematic approach to identifying and removing inaccurate training data, leading to improved machine learning model accuracy.

2. What are the potential challenges in implementing this technology in real-world applications?

  - Some challenges may include the need for robust data preprocessing pipelines and ensuring the scalability of the system for large datasets.

Original Abstract Submitted

Methods, systems and computer program products are described to improve machine learning (ML) model-based classification of data items by identifying and removing inaccurate training data. Inaccurate training samples may be identified, for example, based on excessive variance in vector space between a training sample and a mean of category training samples, and based on a variance between an assigned category and a predicted category for a training sample. Suspect or erroneous samples may be selectively removed based on, for example, vector space variance and/or prediction confidence level. As a result, ML model accuracy may be improved by training on a more accurate revised training set. ML model accuracy may (e.g., also) be improved, for example, by identifying and removing suspect categories with excessive (e.g., weighted) vector space variance. Suspect categories may be retained or revised. Users may (e.g., also) specify a prediction confidence level and/or coverage (e.g., to control accuracy).
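
The category-level check and the confidence/coverage control mentioned in the abstract can be illustrated with a short sketch. The abstract does not specify the weighting scheme or how coverage is computed, so everything here is an assumption: a category is treated as suspect when its variance exceeds a multiple of the size-weighted mean variance, and coverage is taken to be the fraction of predictions at or above a user-chosen confidence threshold. The function names are hypothetical.

```python
import numpy as np

def suspect_categories(X, y, ratio=2.0):
    """Return categories whose variance (mean squared distance to the
    category mean) exceeds `ratio` times the size-weighted mean variance
    over all categories. The baseline includes the category under test,
    so `ratio` should stay well below the number of categories."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    variances = np.array([
        ((X[y == c] - X[y == c].mean(axis=0)) ** 2).sum(axis=1).mean()
        for c in classes
    ])
    baseline = np.average(variances, weights=counts)
    return classes[variances > ratio * baseline]

def coverage_at_confidence(confidence, correct, threshold):
    """Return (coverage, accuracy) over predictions whose confidence is
    at least `threshold`. Accuracy is NaN when nothing is covered."""
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    kept = confidence >= threshold
    coverage = kept.mean()
    accuracy = correct[kept].mean() if kept.any() else float("nan")
    return coverage, accuracy
```

Raising the threshold typically trades coverage for accuracy: fewer items receive a prediction, but those that do tend to be predicted more reliably. Sweeping `threshold` over a held-out set is one way a user could pick the operating point.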
