18071294. SYSTEMS AND METHODS OF JOINING DATA RECORDS AND DETECTING STRING SIMILARITY simplified abstract (MASTERCARD INTERNATIONAL INCORPORATED)
SYSTEMS AND METHODS OF JOINING DATA RECORDS AND DETECTING STRING SIMILARITY
Organization Name
MASTERCARD INTERNATIONAL INCORPORATED
Inventor(s)
Tadeu Augusto Ferreira of Toronto (CA)
Ravi Santosh Arvapally of Telangana (IN)
SYSTEMS AND METHODS OF JOINING DATA RECORDS AND DETECTING STRING SIMILARITY - A simplified explanation of the abstract
This abstract first appeared for US patent application 18071294, titled 'SYSTEMS AND METHODS OF JOINING DATA RECORDS AND DETECTING STRING SIMILARITY'.
Simplified Explanation
The disclosure pertains to methods and systems for joining data structures based on a composite similarity score (CSS). Multiple similarity models each generate a similarity sub-score, a metric indicating the confidence that a data value in one record is similar to a data value in another record. The CSS is calculated from these sub-scores and indicates the confidence that the two records being compared are similar, which allows similar records to be detected across different data structures. Additionally, a string similarity model is described that detects similarity among strings regardless of word order and tolerates errors or omissions in one or both strings.
- The patent application introduces a method for joining data structures using a composite similarity score.
- Multiple similarity models are utilized to calculate individual similarity scores between data values.
- A composite similarity score is generated based on these sub-scores to determine overall record similarity.
- A string similarity model is included that detects similar strings regardless of word order and tolerates errors or omissions in either string.
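The points above can be illustrated with a minimal sketch. The abstract does not say which similarity models are used or how sub-scores are combined, so everything here is an assumption made for illustration: the field names, the weights, the weighted-average combination, the 0.8 threshold, and the function names (`text_similarity`, `exact_similarity`, `composite_similarity`, `join_records`) are all hypothetical. Python's standard `difflib.SequenceMatcher` stands in for a text similarity model.

```python
from difflib import SequenceMatcher


def text_similarity(a, b):
    """Stand-in similarity model: case-insensitive character similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def exact_similarity(a, b):
    """Stand-in similarity model: 1.0 for identical values, otherwise 0.0."""
    return 1.0 if a == b else 0.0


# Hypothetical configuration: field name -> (similarity model, weight).
# The abstract does not specify fields, models, or weights.
FIELD_MODELS = {
    "name": (text_similarity, 0.6),
    "city": (text_similarity, 0.1),
    "postal_code": (exact_similarity, 0.3),
}


def composite_similarity(rec_a, rec_b):
    """Combine per-field sub-scores into one score (assumed: weighted average)."""
    total_weight = sum(weight for _, weight in FIELD_MODELS.values())
    weighted_sum = sum(
        weight * model(rec_a[field], rec_b[field])
        for field, (model, weight) in FIELD_MODELS.items()
    )
    return weighted_sum / total_weight


def join_records(left, right, threshold=0.8):
    """Pair records from two collections whose composite score meets the threshold."""
    matches = []
    for a in left:
        for b in right:
            css = composite_similarity(a, b)
            if css >= threshold:
                matches.append((a, b, css))
    return matches
```

Calling `join_records(left, right)` on two lists of dictionaries that share the configured fields returns the matched pairs together with their composite scores. The nested loop compares every pair, which is fine for a sketch but would need a blocking or indexing step for large data sets.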
Potential Applications
The technology described in this patent application could be applied in various fields such as:
- Data integration and data matching in databases
- Information retrieval systems
- Record linkage and deduplication processes
Problems Solved
The innovation addresses the following issues:
- Efficient comparison of data records across different structures
- Detection of similarities in strings with errors or omissions
- Improved accuracy in determining record similarity
Benefits
The benefits of this technology include:
- Enhanced data quality and consistency
- Streamlined data processing and analysis
- Increased efficiency in data matching and deduplication tasks
Potential Commercial Applications
The technology could find commercial applications in:
- Database management software
- Data analytics platforms
- Information retrieval tools
Possible Prior Art
Possible prior art in this field includes the use of similarity metrics in data matching algorithms to compare and link records from different datasets.
Unanswered Questions
How does the string similarity model handle errors and omissions in the strings?
The string similarity model in the patent application is designed to tolerate errors and omissions in the strings being compared. However, the specific techniques or algorithms used to achieve this are not detailed in the abstract.
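Because the abstract leaves the technique unspecified, the following is only a sketch of one common way to get both properties, and it should not be read as the method in the application. It assumes whitespace tokenization, greedy best-match pairing of words, and `difflib.SequenceMatcher` as the per-word fuzzy comparison; the function names are hypothetical. Word order is ignored because each word is matched against any remaining word in the other string, typos are tolerated by the fuzzy word comparison, and omissions are tolerated by averaging only over the words of the shorter string.

```python
from difflib import SequenceMatcher


def tokenize(text):
    """Split a string into lowercase words (assumed: whitespace tokenization)."""
    return text.lower().split()


def token_similarity(a, b):
    """Fuzzy similarity of two words in [0, 1]; tolerates small spelling errors."""
    return SequenceMatcher(None, a, b).ratio()


def order_insensitive_similarity(s1, s2):
    """Score two strings in [0, 1], ignoring word order and extra words."""
    tokens1, tokens2 = tokenize(s1), tokenize(s2)
    if not tokens1 or not tokens2:
        return 0.0
    # Match each word of the shorter string to its best remaining counterpart.
    shorter, longer = sorted((tokens1, tokens2), key=len)
    remaining = list(longer)
    total = 0.0
    for token in shorter:
        best = max(remaining, key=lambda cand: token_similarity(token, cand))
        total += token_similarity(token, best)
        remaining.remove(best)  # each word can be matched only once
    # Averaging over the shorter string means omitted words are not penalized.
    return total / len(shorter)
```

With this sketch, "Acme Coffee Shop" and "shop coffee acme" score as identical, and a misspelling such as "Cofee" lowers the score only slightly. Greedy pairing is a simplification; an optimal assignment between words could give different scores in edge cases.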
What types of similarity models are used to generate the individual similarity scores?
The abstract mentions the use of multiple similarity models to calculate the individual similarity scores. It would be helpful to know the specific types of models or algorithms employed in this process.
Original Abstract Submitted
The disclosure relates to methods and systems of joining data structures based on a composite similarity score (CSS). For example, a computer system may use a plurality of similarity models to generate respective similarity scores. Each similarity score may be a metric that indicates a confidence that a first data value of a first data record is similar to a second data value of a second data record. The computer system may generate the CSS based on the plurality of similarity sub-scores. The CSS may indicate a confidence that the records being compared are similar. Thus, the CSS may be used to detect similar data records across different data structures. The disclosure also relates to a string similarity model that detects similarity among strings without respect to an order of words in each string and in a way that tolerates errors or omissions in one or both strings.