17849439. CONTEXT-BASED PATTERN MATCHING FOR SENSITIVE DATA DETECTION simplified abstract (Capital One Services, LLC)

From WikiPatents

CONTEXT-BASED PATTERN MATCHING FOR SENSITIVE DATA DETECTION

Organization Name

Capital One Services, LLC

Inventor(s)

Anh Truong of Champaign IL (US)

Jeremy Goodsitt of Champaign IL (US)

Austin Walters of Savoy IL (US)

CONTEXT-BASED PATTERN MATCHING FOR SENSITIVE DATA DETECTION - A simplified explanation of the abstract

This abstract first appeared for US patent application 17849439, titled 'CONTEXT-BASED PATTERN MATCHING FOR SENSITIVE DATA DETECTION'.

Simplified Explanation

The patent application describes a method for associating data labels with tokens in a text sequence by matching patterns.

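  • The method generates a first set of patterns that indicate a data label, then associates a candidate token from the text sequence with that label.
  • Before a candidate is chosen, tokens that match a token of a second set of patterns are removed from the text sequence.
  • The candidate token is selected from the remaining tokens based on a match with a token of the second set of patterns.
  • A token sequence collection is updated to include the candidate token together with a context token.
  • The second set of patterns is updated with new patterns that match the candidate token and the context token, and a pattern is removed when it is found to match a token sequence associated with test tokens.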
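
The pairing of a context token with a candidate token can be illustrated with a minimal Python sketch. It is not the patented implementation. The ContextPattern class, the find_labeled_tokens helper, the "SSN" label, and the sample text are all hypothetical, and the sketch assumes that the context token is simply the token immediately before the candidate.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextPattern:
    """Hypothetical pattern: a data label, a context token, and a candidate shape."""
    label: str    # data label the pattern indicates, e.g. "SSN"
    context: str  # lower-cased token expected immediately before the candidate
    shape: str    # regex that the candidate token must fully match


def find_labeled_tokens(tokens, patterns):
    """Return (context_token, candidate_token, label) for every pattern hit."""
    hits = []
    # Walk adjacent token pairs: (previous token, current token).
    for prev, cur in zip(tokens, tokens[1:]):
        for p in patterns:
            if prev.lower() == p.context and re.fullmatch(p.shape, cur):
                hits.append((prev, cur, p.label))
    return hits


patterns = [ContextPattern("SSN", "ssn:", r"\d{3}-\d{2}-\d{4}")]
tokens = "Customer SSN: 123-45-6789 called on 555-12-0000".split()

# Assumed representation: the collection stores (context token, candidate token) pairs.
token_sequence_collection = []
for context_token, candidate, label in find_labeled_tokens(tokens, patterns):
    token_sequence_collection.append((context_token, candidate))

print(token_sequence_collection)
```

In this sketch the second number has the same shape as the first but is not preceded by the expected context token, so only the first is labeled. That is the intuition behind matching on context rather than on token shape alone.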

Potential applications of this technology:

  • Text classification and labeling in natural language processing tasks.
  • Information extraction from unstructured text data.
  • Sentiment analysis and opinion mining in social media or customer reviews.

Problems solved by this technology:

  • Efficient and automated labeling of tokens in a text sequence.
  • Handling large volumes of unstructured text data for analysis.
  • Improving accuracy and consistency in text classification tasks.

Benefits of this technology:

  • Streamlined and automated process for labeling tokens in text data.
  • Increased efficiency and productivity in text analysis tasks.
  • Improved accuracy and consistency in data labeling and classification.


Original Abstract Submitted

A method includes generating first patterns indicating a data label and associating a candidate token of a text sequence with the data label by removing first tokens from the text sequence based on a match of the first tokens with a token of second patterns and selecting the candidate token from other tokens of the text sequence based on a match between the candidate token and a token of the second patterns. The method also includes updating a token sequence collection to comprise the candidate token and a context token, updating the second patterns with new patterns that match the candidate token and the context token, and removing a first pattern from the second patterns based on a determination that the first pattern matches with a token sequence associated with the test tokens.
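
The abstract is terse, so the following Python sketch shows only one possible reading of two of its steps: removing tokens before a candidate is selected, and removing a pattern that matches a token sequence associated with test tokens. The function names, the regex-based patterns, and the sample data are assumptions made for illustration, as is the choice to treat the test sequences as known non-sensitive text.

```python
import re


def select_candidates(tokens, ignore_patterns, candidate_patterns):
    """Drop tokens matching any ignore pattern, then keep tokens matching a candidate pattern."""
    remaining = [
        t for t in tokens
        if not any(re.fullmatch(p, t) for p in ignore_patterns)
    ]
    return [
        t for t in remaining
        if any(re.fullmatch(p, t) for p in candidate_patterns)
    ]


def prune_patterns(patterns, test_sequences):
    """Remove every pattern that fully matches a token in any test sequence."""
    return [
        p for p in patterns
        if not any(re.fullmatch(p, tok) for seq in test_sequences for tok in seq)
    ]


tokens = ["order", "2023-01-15", "card", "4111111111111111"]
ignore_patterns = [r"\d{4}-\d{2}-\d{2}"]   # hypothetical: ISO-style dates are not sensitive
candidate_patterns = [r"[\d-]{10,}"]       # hypothetical: deliberately broad numeric pattern

candidates = select_candidates(tokens, ignore_patterns, candidate_patterns)

# Hypothetical test sequence assumed to contain no sensitive data.
test_sequences = [["shipped", "2023-01-15"]]
kept = prune_patterns([r"[\d-]{10,}", r"\d{16}"], test_sequences)

print(candidates)
print(kept)
```

Here the broad pattern would also match the date, so the date is removed before selection, and the same broad pattern is later pruned because it matches a token in the test sequence.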