US Patent Application 17728045. DATA SENSITIVITY ESTIMATION simplified abstract

From WikiPatents

DATA SENSITIVITY ESTIMATION

Organization Name

Microsoft Technology Licensing, LLC


Inventor(s)

David Trigano of Or Akiva (IL)


Andrey Karpovsky of Kiryat Motzkin (IL)


Basel Shaheen of Haifa (IL)


DATA SENSITIVITY ESTIMATION - A simplified explanation of the abstract

  • This abstract appeared for US patent application number 17728045, titled 'DATA SENSITIVITY ESTIMATION'

Simplified Explanation

The disclosed technology concerns data classification, specifically identifying sensitive data within a dataset. The system receives training data together with a ground truth that marks which of that data is sensitive. Natural language processing is applied to the training data to learn features, including a naming feature derived from the names of data resources. Supervised learning on the training data and ground truth then produces a model, which may be a heuristic model, a machine learning model, or both. When information about input data is received, the model computes a data resource sensitivity estimator (DRSE) value for each portion of the input data, based on the combination of features for that data. Portions whose DRSE values indicate likely sensitivity are flagged as potentially sensitive.
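
The flow described above can be illustrated with a minimal Python sketch. It is not the method claimed in the application: the abstract does not say how the naming feature is computed, what form the model takes, or how DRSE values are turned into flags. The sketch assumes that the naming feature is the set of word tokens in a resource name, that the model is a smoothed per-token log-odds score (a naive Bayes style stand-in) squashed to the range 0 to 1, and that a portion is flagged at a threshold of 0.5. All function names, sample resource names, and the threshold are hypothetical.

```python
import math
import re
from collections import Counter


def name_tokens(resource_name):
    """Split a data resource name into lowercase word tokens (assumed naming feature)."""
    # Insert a space at camelCase boundaries, then split on non-alphanumerics.
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", resource_name)
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", spaced) if t]


def train(names, labels):
    """Supervised step: learn per-token log-odds of sensitivity from labeled names."""
    pos, neg = Counter(), Counter()
    for name, is_sensitive in zip(names, labels):
        (pos if is_sensitive else neg).update(set(name_tokens(name)))
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    weights = {}
    for tok in set(pos) | set(neg):
        # Laplace smoothing keeps the ratio finite for tokens seen in one class only.
        p = (pos[tok] + 1) / (n_pos + 2)
        q = (neg[tok] + 1) / (n_neg + 2)
        weights[tok] = math.log(p / q)
    bias = math.log((n_pos + 1) / (n_neg + 1))
    return weights, bias


def drse(resource_name, model):
    """Return an illustrative DRSE value in (0, 1) for one resource name."""
    weights, bias = model
    score = bias + sum(weights.get(t, 0.0) for t in set(name_tokens(resource_name)))
    return 1.0 / (1.0 + math.exp(-score))


def flag(resource_names, model, threshold=0.5):
    """Flag names whose DRSE value meets the (assumed) threshold."""
    return [n for n in resource_names if drse(n, model) >= threshold]


# Hypothetical training data and ground truth (True means sensitive).
train_names = ["customer_ssn", "userPassword", "patient_ssn_backup",
               "build_logs", "public_docs", "temp_cache"]
ground_truth = [True, True, True, False, False, False]

model = train(train_names, ground_truth)
flagged = flag(["employee_ssn", "nightly_build_logs"], model)
print(flagged)  # ['employee_ssn']
```

In this toy run, "employee_ssn" is flagged because the token "ssn" appears only in sensitive training names, while "nightly_build_logs" is not because "build" and "logs" appear only in non-sensitive ones. A real system would presumably combine the naming feature with other learned features, as the abstract indicates.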


Original Abstract Submitted

The disclosed technology is generally directed to data classification. In one example of the technology, training data and a ground truth that indicates sensitive data within the training data is received. Based at least on the training data, natural language processing is used to learn features. The features include a naming feature that is associated with names of data resources in the training data. Based at least on the training data and the ground truth, using supervised learning, a model that is a heuristic model and/or a machine learning model is created. Input data information that is associated with input data is received. The model is used to determine a data resource sensitivity estimator (DRSE) value for each portion of the input data. The determination is based on the combination of features for the input data. Potentially sensitive data within the input data is flagged based on the DRSE values.