17849439. CONTEXT-BASED PATTERN MATCHING FOR SENSITIVE DATA DETECTION simplified abstract (Capital One Services, LLC)
Organization Name: Capital One Services, LLC
Inventor(s)
Anh Truong of Champaign IL (US)
Jeremy Goodsitt of Champaign IL (US)
Austin Walters of Savoy IL (US)
CONTEXT-BASED PATTERN MATCHING FOR SENSITIVE DATA DETECTION - A simplified explanation of the abstract
This abstract first appeared for US patent application 17849439 titled 'CONTEXT-BASED PATTERN MATCHING FOR SENSITIVE DATA DETECTION'.
Simplified Explanation
The patent application describes a method for associating data labels (for example, a type of sensitive data) with tokens in a text sequence by matching context-based patterns.
- The method generates a first set of patterns that indicate a data label and associates a candidate token of the text sequence with that label.
- Tokens that match a token of a second set of patterns are removed from the text sequence, and the candidate token is selected from the remaining tokens based on its own match with a token of the second patterns.
- A token sequence collection is then updated to include the candidate token and a context token.
- The second patterns are extended with new patterns that match the candidate token and the context token, and a pattern is removed when it matches a token sequence associated with test tokens.
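The candidate-selection step above can be illustrated with a small sketch. It rests on several assumptions that the source does not state. A pattern is modeled as a hypothetical `ContextPattern` made up of a label, a tuple of context tokens, and a regular expression describing the shape of the value. The helper `label_tokens` is also invented for this example. Context tokens that match are dropped, and the token that follows them is kept as the candidate only if it fits the value shape. This is one possible reading of the described method, not the applicant's implementation.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ContextPattern:
    label: str                # data label the pattern indicates (hypothetical)
    context: Tuple[str, ...]  # tokens expected immediately before the value
    value_regex: str          # assumed shape the candidate token must have


def label_tokens(tokens: List[str], patterns: List[ContextPattern]) -> List[Tuple[str, str]]:
    """Return (label, candidate) pairs found in a tokenized text sequence."""
    results = []
    for pat in patterns:
        n = len(pat.context)
        wanted = [c.lower() for c in pat.context]
        # Stop early enough that a token always follows the context window.
        for i in range(len(tokens) - n):
            window = [t.lower() for t in tokens[i:i + n]]
            if window != wanted:
                continue
            # Drop the matched context tokens (and everything before them);
            # the candidate is the next remaining token.
            remaining = tokens[i + n:]
            candidate = remaining[0]
            if re.fullmatch(pat.value_regex, candidate):
                results.append((pat.label, candidate))
    return results
```

Suppose the pattern is `ContextPattern("SSN", ("ssn", "is"), r"\d{3}-\d{2}-\d{4}")`. Applied to `"My SSN is 123-45-6789".split()`, it returns `[("SSN", "123-45-6789")]`. Applied to `"ssn is unknown".split()`, it returns nothing, because the candidate does not fit the value shape.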
Potential applications of this technology:
- Text classification and labeling in natural language processing tasks.
- Information extraction from unstructured text data.
- Sentiment analysis and opinion mining in social media or customer reviews.
Problems solved by this technology:
- Efficient and automated labeling of tokens in a text sequence.
- Handling large volumes of unstructured text data for analysis.
- Improving accuracy and consistency in text classification tasks.
Benefits of this technology:
- Streamlined and automated process for labeling tokens in text data.
- Increased efficiency and productivity in text analysis tasks.
- Improved accuracy and consistency in data labeling and classification.
Original Abstract Submitted
A method includes generating first patterns indicating a data label and associating a candidate token of a text sequence with the data label by removing first tokens from the text sequence based on a match of the first tokens with a token of second patterns and selecting the candidate token from other tokens of the text sequence based on a match between the candidate token and a token of the second patterns. The method also includes updating a token sequence collection to comprise the candidate token and a context token, updating the second patterns with new patterns that match the candidate token and the context token, and removing a first pattern from the second patterns based on a determination that the first pattern matches with a token sequence associated with the test tokens.
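The update steps in the abstract can be sketched as a generic bootstrapping loop. The loop fills a token sequence collection, derives new patterns from it, and prunes any pattern that fires on test tokens. Everything concrete here is a hypothetical stand-in: the `SLOT` marker, the `slot_positions` and `bootstrap` helpers, and the tuple representation of patterns. Several choices are further assumptions the abstract does not spell out. The context is taken to be the single token just before the candidate. New patterns come from other contexts in which already-labeled values appear. The "test tokens" are treated as sequences known not to contain the label.

```python
from typing import List, Set, Tuple

SLOT = "<VALUE>"           # hypothetical marker for the labeled position
Pattern = Tuple[str, ...]  # e.g. ("ssn", "is", SLOT); exactly one SLOT assumed


def slot_positions(tokens: List[str], pattern: Pattern) -> List[int]:
    """Indices of tokens sitting in the SLOT wherever the context tokens match."""
    n, slot = len(pattern), pattern.index(SLOT)
    return [
        i + slot
        for i in range(len(tokens) - n + 1)
        if all(p == SLOT or p == t for p, t in zip(pattern, tokens[i:i + n]))
    ]


def bootstrap(
    corpus: List[List[str]],
    seed_patterns: Set[Pattern],
    test_tokens: List[List[str]],
    rounds: int = 2,
) -> Tuple[Set[Pattern], Set[Tuple[str, str]]]:
    patterns = set(seed_patterns)
    collection: Set[Tuple[str, str]] = set()  # (context token, candidate token)
    for _ in range(rounds):
        # 1. Label candidates and record the token just before each as context.
        for tokens in corpus:
            for p in patterns:
                for idx in slot_positions(tokens, p):
                    if idx > 0:
                        collection.add((tokens[idx - 1], tokens[idx]))
        # 2. Add a pattern for every context in which a known candidate appears.
        known = {cand for _, cand in collection}
        for tokens in corpus:
            for i in range(1, len(tokens)):
                if tokens[i] in known:
                    patterns.add((tokens[i - 1], SLOT))
        # 3. Prune patterns that fire on test sequences (assumed not to hold the label).
        patterns = {p for p in patterns if not any(slot_positions(t, p) for t in test_tokens)}
    return patterns, collection
```

Consider a seed of `("ssn", "is", SLOT)` over the corpus `[["ssn", "is", "123-45-6789"], ["id", "123-45-6789", "on", "file"], ["id", "555-12-3456"]]`. The loop learns `("id", SLOT)` from a second context of the same value. It drops the overly generic `("is", SLOT)`, because that pattern also fires on the test sequence `["the", "sky", "is", "blue"]`.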