Intel Corporation (20240135209). DATA PRIVACY PRESERVATION IN MACHINE LEARNING TRAINING simplified abstract

DATA PRIVACY PRESERVATION IN MACHINE LEARNING TRAINING

Organization Name

Intel Corporation

Inventor(s)

Priyanka Mudgal of Portland, OR (US)

Rita H. Wouhaybi of Portland, OR (US)

DATA PRIVACY PRESERVATION IN MACHINE LEARNING TRAINING - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240135209 titled 'DATA PRIVACY PRESERVATION IN MACHINE LEARNING TRAINING'.

Simplified Explanation

The patent application describes a two-party arrangement: a first computing system analyzes a sensitive dataset to generate feature description data, and a second computing system uses that description to create a synthetic dataset on which it trains a machine learning model (a minimal sketch of this flow appears after the list below).

  • The first computing system has a data store with a sensitive dataset.
  • The first computing system uses a feature extraction tool to analyze the dataset and generate feature description data.
  • The second computing system, which does not have access to the dataset, uses a data synthesizer to create a synthetic dataset based on the feature description data.
  • The second computing system trains a machine learning model with the synthetic dataset.
  • The trained machine learning model is provided back to the first computing system for use with the original dataset.
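
A minimal sketch of this flow, in Python, is shown below. The class names, the choice of per-feature summary statistics (means, standard deviations, class priors), and the use of Gaussian sampling with a logistic regression model are illustrative assumptions; the patent application does not specify a particular feature extraction tool, data synthesizer, or model.

  # Illustrative sketch only: the statistics, the sampling scheme, and the model are
  # assumptions; the application does not prescribe a particular implementation.
  import numpy as np
  from sklearn.linear_model import LogisticRegression

  class FirstComputingSystem:
      """Holds the sensitive dataset and only releases feature description data."""
      def __init__(self, X, y):
          self._X = X  # sensitive records, never shared
          self._y = y

      def extract_feature_description(self):
          # "Feature extraction tool": statistical analysis of the dataset that
          # yields feature description data rather than the records themselves.
          return {
              "n_features": self._X.shape[1],
              "means": self._X.mean(axis=0),
              "stds": self._X.std(axis=0),
              "class_priors": np.bincount(self._y) / len(self._y),
          }

      def apply(self, trained_model):
          # Use the returned model with data from the local data store as input.
          return trained_model.predict(self._X)

  class SecondComputingSystem:
      """Has no access to the dataset; works only from the feature description."""
      def synthesize(self, desc, n_samples=1000, seed=0):
          # "Data synthesizer": sample a synthetic dataset that models the described
          # features. This naive version matches only per-feature marginals; a real
          # synthesizer would also need to capture correlations between features.
          rng = np.random.default_rng(seed)
          X = rng.normal(desc["means"], desc["stds"],
                         size=(n_samples, desc["n_features"]))
          y = rng.choice(len(desc["class_priors"]), size=n_samples,
                         p=desc["class_priors"])
          return X, y

      def train(self, X_syn, y_syn):
          return LogisticRegression(max_iter=1000).fit(X_syn, y_syn)

  # End-to-end flow: only the description and the trained model cross the boundary.
  rng = np.random.default_rng(42)
  X_real = rng.normal(size=(500, 4))
  y_real = (X_real[:, 0] > 0).astype(int)
  system_a = FirstComputingSystem(X_real, y_real)
  system_b = SecondComputingSystem()
  description = system_a.extract_feature_description()
  model = system_b.train(*system_b.synthesize(description))
  predictions = system_a.apply(model)

In this sketch the raw records stay inside FirstComputingSystem; only the small description dictionary travels to the second system, and only the fitted model travels back, which mirrors the division of roles claimed in the application.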

Potential Applications

This technology can be applied in fields that routinely handle sensitive records, such as healthcare, finance, and marketing, to support data analysis and model training without exposing the underlying data.

Problems Solved

This technology addresses the tension between protecting sensitive data and training machine learning models that depend on it: the model is trained on a synthetic stand-in, so the raw records never have to be shared with the training system.

Benefits

The system provides a way to leverage sensitive datasets for machine learning model training without compromising data privacy.

Potential Commercial Applications

One potential commercial application of this technology is in the healthcare industry for analyzing patient data while maintaining privacy and compliance with regulations.

Possible Prior Art

Possible prior art includes existing systems that train machine learning models on synthetic data in order to protect sensitive information.

Unanswered Questions

How does the system ensure the privacy and security of the sensitive dataset during the feature extraction process?

The abstract does not specify this. Presumably the feature extraction runs entirely within the first computing system, so only the derived feature description data, and never the records themselves, leaves the data store; what additional safeguards protect that description is left open.
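
One common approach in this setting, offered here as an assumption rather than a claim of the patent, is to perturb the released statistics with calibrated noise, in the spirit of differential privacy, before they leave the first computing system. A hedged sketch, reusing the description format from the earlier example (the epsilon parameter and Laplace mechanism are not mentioned in the application):

  # Hypothetical hardening step, not described in the patent: add Laplace noise to
  # the numeric summaries so individual records are harder to infer from them.
  import numpy as np

  def noisy_feature_description(desc, epsilon=1.0, sensitivity=1.0, seed=0):
      rng = np.random.default_rng(seed)
      scale = sensitivity / epsilon  # more noise as the privacy budget shrinks
      noisy = dict(desc)
      noisy["means"] = desc["means"] + rng.laplace(0.0, scale, size=desc["means"].shape)
      noisy["stds"] = np.abs(desc["stds"] + rng.laplace(0.0, scale, size=desc["stds"].shape))
      return noisy

How much noise can be added before the synthetic dataset stops modeling the real one is exactly the kind of trade-off the application leaves open.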

What are the limitations of using synthetic data for training machine learning models compared to real data?

Synthetic data generated from summary feature descriptions may not capture the correlations and rare cases present in the real data, so a model trained on it can be less accurate than one trained on the original records.
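
One way to make this limitation concrete is to measure the gap directly. The self-contained sketch below is an assumed demonstration, not from the patent: it uses a deliberately naive synthesizer that matches only per-feature means, standard deviations, and the class prior, so the feature-label relationship is lost and the synthetic-trained model underperforms on held-out real data.

  # Assumed demonstration: compare a model trained on real data with one trained on
  # marginal-only synthetic data, both evaluated on held-out real records.
  import numpy as np
  from sklearn.linear_model import LogisticRegression
  from sklearn.metrics import accuracy_score

  rng = np.random.default_rng(1)
  X = rng.normal(size=(2000, 4))
  y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # the predictive signal lives in X
  X_train, X_test, y_train, y_test = X[:1500], X[1500:], y[:1500], y[1500:]

  # Marginal-only synthesis: match means, stds, and class prior; ignore correlations.
  X_syn = rng.normal(X_train.mean(axis=0), X_train.std(axis=0), size=X_train.shape)
  y_syn = rng.choice(2, size=len(X_train), p=np.bincount(y_train) / len(y_train))

  real_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
  syn_model = LogisticRegression(max_iter=1000).fit(X_syn, y_syn)
  print("trained on real data:     ", accuracy_score(y_test, real_model.predict(X_test)))
  print("trained on synthetic data:", accuracy_score(y_test, syn_model.predict(X_test)))

A richer synthesizer, one that also models joint structure among the described features, would narrow this gap, which is presumably what the claimed data synthesizer aims to do.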


Original Abstract Submitted

A first computing system includes a data store with a sensitive dataset. The first computing system uses a feature extraction tool to perform a statistical analysis of the dataset to generate feature description data to describe a set of features within the dataset. A second computing system is coupled to the first computing system and does not have access to the dataset. The second computing system uses a data synthesizer to receive the feature description data and generate a synthetic dataset that models the dataset and includes the set of features. The second computing system trains a machine learning model with the synthetic data set and provides the trained machine learning model to the first computing system for use with data from the data store as an input.