US Patent Application 17804055. SYSTEM AND METHOD FOR INTEGRATING MACHINE LEARNING IN DATA LEAKAGE DETECTION SOLUTION THROUGH KEYWORD POLICY PREDICTION simplified abstract

From WikiPatents

SYSTEM AND METHOD FOR INTEGRATING MACHINE LEARNING IN DATA LEAKAGE DETECTION SOLUTION THROUGH KEYWORD POLICY PREDICTION

Organization Name

SAUDI ARABIAN OIL COMPANY

Inventor(s)

Ahmad F. Sirhani of Dammam (SA)

Abdullah K. Madani of Dhahran (SA)

Abdulrahman M. Alomar of Al Hasa (SA)

SYSTEM AND METHOD FOR INTEGRATING MACHINE LEARNING IN DATA LEAKAGE DETECTION SOLUTION THROUGH KEYWORD POLICY PREDICTION - A simplified explanation of the abstract

This abstract first appeared for US patent application 17804055, titled 'SYSTEM AND METHOD FOR INTEGRATING MACHINE LEARNING IN DATA LEAKAGE DETECTION SOLUTION THROUGH KEYWORD POLICY PREDICTION'.

Simplified Explanation

The patent application describes a method for improving a data leakage prevention system by suggesting keywords for policy creation. Here are the key points:

  • The method involves receiving a collection of labeled documents that have been filtered according to various criteria.
  • The documents are then parsed and converted into vectors, which represent the documents in a numerical format.
  • A machine-learned model is trained using a portion of the vectorized documents, enabling the system to learn patterns and make predictions.
  • The trained model is used to extract word importances, indicating which words are most relevant for identifying sensitive information.
  • Words that meet a certain criterion are retained as suggested keywords for the data leakage prevention system.
  • These suggested keywords are then incorporated into the policy of the system, helping to improve its ability to detect and prevent data leaks.
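
The steps above can be illustrated with a minimal sketch. The abstract does not name a vectorizer, a model type, an importance measure, or a retention criterion, so everything specific here is an assumption made for illustration: bag-of-words counts as the vectors, a multinomial Naive Bayes classifier as the machine-learned model, per-word log-odds between classes as the "importance", and a fixed threshold as the criterion. The function names, the labels "sensitive" and "public", the toy corpus, and the `policy` dictionary are all hypothetical. For brevity the sketch trains on the whole corpus rather than on a portion of it.

```python
import math
import re
from collections import Counter


def parse(text):
    """Lowercase a document and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def vectorize(docs):
    """Represent each document as a bag-of-words count vector."""
    return [Counter(parse(d)) for d in docs]


def train(vectors, labels):
    """Fit a multinomial Naive Bayes model with add-one smoothing.

    Returns, for each label, the log-probability of every vocabulary word.
    """
    vocab = sorted({w for v in vectors for w in v})
    model = {}
    for label in set(labels):
        counts = Counter()
        for vec, y in zip(vectors, labels):
            if y == label:
                counts.update(vec)
        total = sum(counts.values()) + len(vocab)
        model[label] = {w: math.log((counts[w] + 1) / total) for w in vocab}
    return model


def word_importances(model, positive="sensitive", negative="public"):
    """Assumed importance measure: log-odds of a word between the two classes."""
    return {w: model[positive][w] - model[negative][w] for w in model[positive]}


def suggest_keywords(importances, threshold=1.0):
    """Retain words whose importance meets the (hypothetical) threshold."""
    kept = [w for w, score in importances.items() if score >= threshold]
    return sorted(kept, key=lambda w: -importances[w])


# Toy labeled corpus (invented for illustration only).
docs = [
    "confidential drilling report for reservoir alpha",
    "confidential reservoir pressure estimates",
    "cafeteria menu for monday",
    "monday parking notice for visitors",
]
labels = ["sensitive", "sensitive", "public", "public"]

model = train(vectorize(docs), labels)
keywords = suggest_keywords(word_importances(model))

# Hypothetical policy object: a real data leakage prevention product
# would expose its own interface for updating keyword policies.
policy = {"name": "example-policy", "keywords": []}
policy["keywords"].extend(keywords)
print(policy["keywords"])
```

In this toy corpus, words that occur repeatedly in the "sensitive" documents and never in the "public" ones score highest, while a word such as "for" that appears in both classes scores low and is dropped. A tree-based or linear model with its own importance scores could be substituted without changing the overall flow.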


Original Abstract Submitted

A method which includes receiving a corpus of labelled documents according to a plurality of filters and parsing, by a computer processor, the corpus. The method further includes vectorizing, by the computer processor, the parsed corpus to obtain vectorized documents; and training, by the computer processor, a machine-learned model using, at least a portion, of the vectorized documents. The method further includes extracting word importances from the trained machine-learned model and retaining the words with associated importances that satisfy a criterion, wherein the retained words are suggested keywords. The method further includes incorporating the suggested keywords in a policy of a data leakage prevention system.