DATA PRIVACY PRESERVATION IN MACHINE LEARNING TRAINING

Organization Name

Inventor(s)

DATA PRIVACY PRESERVATION IN MACHINE LEARNING TRAINING - A simplified explanation of the abstract

This abstract first appeared for US patent application 18400632, titled 'DATA PRIVACY PRESERVATION IN MACHINE LEARNING TRAINING'.

Simplified Explanation

The abstract describes a system in which a first computing system uses a feature extraction tool to statistically analyze a sensitive dataset and generate feature description data. A second computing system, which has no access to the original dataset, uses that description to create a synthetic dataset with the same features. The second system trains a machine learning model on the synthetic data and returns the trained model to the first system, which then applies the model to data from the original dataset.

The first computing system analyzes a sensitive dataset using a feature extraction tool to generate feature description data.
The second computing system, without access to the original dataset, uses a data synthesizer to create a synthetic dataset based on the feature description data.
The second system trains a machine learning model with the synthetic dataset and provides the trained model to the first system, which uses it with data from the original dataset.
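
The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method claimed in the application: the function names (extract_feature_description, synthesize, train_outlier_model), the use of per-feature mean and standard deviation as the feature description, the independent Gaussian sampling, and the simple outlier-flagging model are all hypothetical stand-ins chosen for brevity.

```python
import random
import statistics


def extract_feature_description(rows):
    """First system: summarize each column of the sensitive data.

    Assumption: the description is just a mean and standard deviation
    per feature. The application does not specify the statistics used.
    """
    columns = list(zip(*rows))
    return [
        {"mean": statistics.fmean(col), "stdev": statistics.pstdev(col)}
        for col in columns
    ]


def synthesize(description, n_rows, seed=0):
    """Second system: draw synthetic rows from the description alone."""
    rng = random.Random(seed)
    return [
        [rng.gauss(f["mean"], f["stdev"]) for f in description]
        for _ in range(n_rows)
    ]


def train_outlier_model(synthetic_rows, threshold=3.0):
    """Second system: fit a toy model on synthetic data only.

    The returned function flags a row when any feature lies more than
    `threshold` standard deviations from the synthetic mean.
    """
    columns = list(zip(*synthetic_rows))
    params = [(statistics.fmean(c), statistics.pstdev(c)) for c in columns]

    def predict(row):
        return any(abs(x - m) > threshold * s for x, (m, s) in zip(row, params))

    return predict


# First system: the sensitive rows never leave this scope.
sensitive = [[52.0, 120.0], [47.0, 135.0], [61.0, 128.0], [58.0, 142.0]]
description = extract_feature_description(sensitive)

# Second system: sees only `description`.
synthetic = synthesize(description, n_rows=500)
model = train_outlier_model(synthetic)

# First system: applies the returned model to its own data.
flags = [model(row) for row in sensitive]
```

In this sketch only the description crosses from the first system to the second, and only the trained model comes back.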

Potential Applications

This technology could be applied in various fields such as healthcare, finance, and marketing for analyzing sensitive data without compromising privacy.

Problems Solved

This technology addresses the challenge of analyzing sensitive datasets while maintaining data privacy and security.

Benefits

The system allows a model to be trained for analysis of sensitive data without exposing the actual dataset to the system that performs the training, which works only from the feature description data.

Potential Commercial Applications

Potential commercial applications include data analysis services for industries that deal with sensitive information, such as healthcare and finance.

Possible Prior Art

Possible prior art includes the use of synthetic data generation techniques in machine learning to address privacy concerns in data analysis.

What are the potential limitations of this technology in real-world applications?

One potential limitation is how accurately the synthetic dataset represents the original dataset. If the synthetic data misses properties of the original, the performance of the machine learning model trained on it could suffer.
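
A small experiment can make this limitation concrete. The sketch below assumes a hypothetical synthesizer that knows only each feature's mean and standard deviation and samples the features independently; the application does not state that its data synthesizer works this way. Under that assumption, the marginal statistics are reproduced, but the relationship between the two features is lost.

```python
import math
import random


def mean_std(values):
    """Return the mean and population standard deviation of a list."""
    n = len(values)
    m = sum(values) / n
    return m, math.sqrt(sum((v - m) ** 2 for v in values) / n)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


rng = random.Random(0)

# Stand-in "original" data: the second feature closely tracks the first.
x = [rng.gauss(50.0, 10.0) for _ in range(2000)]
y = [2.0 * v + rng.gauss(0.0, 1.0) for v in x]

# Hypothetical synthesizer: each feature sampled independently
# from its own mean and standard deviation.
sx = [rng.gauss(*mean_std(x)) for _ in range(2000)]
sy = [rng.gauss(*mean_std(y)) for _ in range(2000)]

original_corr = pearson(x, y)     # close to 1
synthetic_corr = pearson(sx, sy)  # close to 0
```

A model that depends on the link between the two features would learn nothing about it from this synthetic data, so a richer feature description would be needed in such a case.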

How scalable is this technology for large datasets and complex features?

The scalability of this technology for large datasets and complex features may depend on the efficiency of the feature extraction tool and data synthesizer used in the system. Further research and development may be needed to optimize the scalability of the technology.
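
One way the feature extraction step might be kept efficient on large datasets is to compute statistics in a single pass, without holding the whole dataset in memory. The sketch below uses Welford's online algorithm for the running mean and variance. The RunningStats class is a hypothetical helper for illustration; the application does not say how its feature extraction tool is implemented.

```python
class RunningStats:
    """Single-pass mean and population variance (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new value into the running statistics."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Population variance of the values seen so far."""
        return self._m2 / self.n if self.n else 0.0


# Values can arrive one at a time, for example from a file or a database cursor.
stats = RunningStats()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(value)
```

Memory use stays constant per feature regardless of the number of rows, although richer feature descriptions may not admit such a simple streaming form.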

Original Abstract Submitted

A first computing system includes a data store with a sensitive dataset. The first computing system uses a feature extraction tool to perform a statistical analysis of the dataset to generate feature description data to describe a set of features within the dataset. A second computing system is coupled to the first computing system and does not have access to the dataset. The second computing system uses a data synthesizer to receive the feature description data and generate a synthetic dataset that models the dataset and includes the set of features. The second computing system trains a machine learning model with the synthetic data set and provides the trained machine learning model to the first computing system for use with data from the data store as an input.

18400632. DATA PRIVACY PRESERVATION IN MACHINE LEARNING TRAINING simplified abstract (Intel Corporation)
