18071294. SYSTEMS AND METHODS OF JOINING DATA RECORDS AND DETECTING STRING SIMILARITY simplified abstract (MASTERCARD INTERNATIONAL INCORPORATED)


SYSTEMS AND METHODS OF JOINING DATA RECORDS AND DETECTING STRING SIMILARITY

Organization Name

MASTERCARD INTERNATIONAL INCORPORATED

Inventor(s)

Tadeu Augusto Ferreira of Toronto (CA)

Ravi Santosh Arvapally of Telangana (IN)

SYSTEMS AND METHODS OF JOINING DATA RECORDS AND DETECTING STRING SIMILARITY - A simplified explanation of the abstract

This abstract first appeared for US patent application 18071294, titled 'SYSTEMS AND METHODS OF JOINING DATA RECORDS AND DETECTING STRING SIMILARITY'.

Simplified Explanation

The disclosure pertains to methods and systems for joining data structures based on a composite similarity score (CSS). A computer system uses multiple similarity models, each of which generates an individual similarity score (a sub-score): a metric indicating the confidence that a data value in one record is similar to a data value in another record. The CSS is then calculated from these sub-scores and indicates the confidence that the two records being compared are similar, which allows similar records to be detected across different data structures. The disclosure also describes a string similarity model that detects similarity among strings regardless of word order and tolerates errors or omissions in one or both strings.

  • The patent application introduces a method for joining data structures using a composite similarity score.
  • Multiple similarity models are utilized to calculate individual similarity scores between data values.
  • A composite similarity score is generated based on these sub-scores to determine overall record similarity.
  • A string similarity model is included to detect similarity among strings regardless of word order, while tolerating errors or omissions.
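
The abstract does not say which similarity models are used or how the sub-scores are combined, so the following Python sketch is only one plausible reading of the points above. It assumes two stand-in models (a character-level text comparison and a relative numeric comparison), a weighted average as the combining rule, and a fixed threshold for the join. The field names, weights, threshold, and function names are all hypothetical.

```python
from difflib import SequenceMatcher


def text_similarity(a: str, b: str) -> float:
    """Stand-in text model: case-insensitive character similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def numeric_similarity(a: float, b: float) -> float:
    """Stand-in numeric model: 1.0 when equal, lower as the relative gap grows."""
    if a == b:
        return 1.0
    return max(0.0, 1.0 - abs(a - b) / max(abs(a), abs(b)))


# Hypothetical configuration: field name -> (similarity model, weight).
MODELS = {
    "name": (text_similarity, 0.7),
    "amount": (numeric_similarity, 0.3),
}


def composite_similarity(rec_a: dict, rec_b: dict) -> float:
    """Combine per-field sub-scores into one score (assumed: weighted average)."""
    total_weight = sum(weight for _, weight in MODELS.values())
    weighted = sum(
        weight * model(rec_a[field], rec_b[field])
        for field, (model, weight) in MODELS.items()
    )
    return weighted / total_weight


def join_records(left: list, right: list, threshold: float = 0.8) -> list:
    """Pair every left record with every right record whose CSS meets the threshold."""
    matches = []
    for a in left:
        for b in right:
            css = composite_similarity(a, b)
            if css >= threshold:
                matches.append((a, b, css))
    return matches


left = [{"name": "Acme Corp", "amount": 100.0}]
right = [
    {"name": "ACME Corp", "amount": 100.0},
    {"name": "Zebra Ltd", "amount": 5.0},
]
for a, b, css in join_records(left, right):
    print(a["name"], "<->", b["name"], round(css, 3))
```

The nested loop compares every pair of records, which is adequate for illustration only; a practical system would typically narrow the candidate pairs before scoring them.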

Potential Applications

The technology described in this patent application could be applied in various fields such as:

  • Data integration and data matching in databases
  • Information retrieval systems
  • Record linkage and deduplication processes

Problems Solved

The innovation addresses the following issues:

  • Efficient comparison of data records across different data structures
  • Detection of similarities in strings with errors or omissions
  • Improved accuracy in determining record similarity

Benefits

The benefits of this technology include:

  • Enhanced data quality and consistency
  • Streamlined data processing and analysis
  • Increased efficiency in data matching and deduplication tasks

Potential Commercial Applications

The technology could find commercial applications in:

  • Database management software
  • Data analytics platforms
  • Information retrieval tools

Possible Prior Art

Possible prior art in this field includes the use of similarity metrics in data matching algorithms to compare and link records from different datasets.

Unanswered Questions

How does the string similarity model handle errors and omissions in the strings?

The string similarity model in the patent application is designed to tolerate errors and omissions in the strings being compared. However, the specific techniques or algorithms used to achieve this are not detailed in the abstract.
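
Because the abstract leaves the technique open, the Python sketch below illustrates one common way to get these properties and should not be read as the method claimed in the application. It assumes whitespace tokenization, greedy one-to-one matching of words using `difflib.SequenceMatcher` for character-level tolerance, and normalization by the longer word count so that omitted words lower the score without zeroing it. The function name `order_insensitive_similarity` is hypothetical.

```python
from difflib import SequenceMatcher


def order_insensitive_similarity(a: str, b: str) -> float:
    """Score two strings in [0, 1], ignoring word order and tolerating typos."""
    tokens_a = a.lower().split()
    tokens_b = b.lower().split()
    if not tokens_a or not tokens_b:
        # Two empty strings are treated as identical; one empty string matches nothing.
        return 1.0 if tokens_a == tokens_b else 0.0

    # Match each word of the shorter string to its best unused word in the longer one.
    shorter, longer = sorted((tokens_a, tokens_b), key=len)
    remaining = list(longer)
    total = 0.0
    for token in shorter:
        best = max(
            remaining,
            key=lambda other: SequenceMatcher(None, token, other).ratio(),
        )
        total += SequenceMatcher(None, token, best).ratio()
        remaining.remove(best)

    # Dividing by the longer length penalizes omitted words proportionally.
    return total / len(longer)


print(order_insensitive_similarity("John Smith", "Smith John"))
print(order_insensitive_similarity("Jon Smith", "Smith John"))
print(order_insensitive_similarity("International Business Machines", "Business Machines"))
```

Greedy matching is simple but not guaranteed to find the best overall pairing of words; an optimal assignment would be a reasonable refinement if accuracy matters more than simplicity.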

What types of similarity models are used to generate the individual similarity scores?

The abstract mentions the use of multiple similarity models to calculate the individual similarity scores. It would be helpful to know the specific types of models or algorithms employed in this process.


Original Abstract Submitted

The disclosure relates to methods and systems of joining data structures based on a composite similarity score (CSS). For example, a computer system may use a plurality of similarity models to generate respective similarity scores. Each similarity score may be a metric that indicates a confidence that a first data value of a first data record is similar to a second data value of a second data record. The computer system may generate the CSS based on the plurality of similarity sub-scores. The CSS may indicate a confidence that the records being compared are similar. Thus, the CSS may be used to detect similar data records across different data structures. The disclosure also relates to a string similarity model that detects similarity among strings without respect to an order of words in each string and in a way that tolerates errors or omissions in one or both strings.