18070598. ENHANCED MODEL EXPLANATIONS USING DYNAMIC TOKENIZATION FOR ENTITY MATCHING MODELS simplified abstract (SAP SE)

ENHANCED MODEL EXPLANATIONS USING DYNAMIC TOKENIZATION FOR ENTITY MATCHING MODELS

Organization Name

SAP SE

Inventor(s)

Sundeep Gullapudi of Singapore (SG)

Rajesh Vellore Arumugam of Singapore (SG)

Abhinandan Padhi of Singapore (SG)

ENHANCED MODEL EXPLANATIONS USING DYNAMIC TOKENIZATION FOR ENTITY MATCHING MODELS - A simplified explanation of the abstract

This abstract first appeared for US patent application 18070598, titled 'ENHANCED MODEL EXPLANATIONS USING DYNAMIC TOKENIZATION FOR ENTITY MATCHING MODELS'.

Simplified Explanation

The patent application describes methods, systems, and computer-readable storage media that process query data and target data with an attention-based ML model and a sub-word-level tokenizer, and that output explanations of predicted entity matches derived from the model's attention matrices (a minimal illustrative sketch follows the list below).

  • Query data and target data are received, representing query entities and target entities.
  • An attention ML model determines a set of character-level embeddings.
  • A sub-word-level tokenizer provides a set of sub-word-level tokens, each including a string of multiple characters.
  • The attention ML model generates sub-word-level embeddings based on the tokens.
  • The attention ML model provides at least one attention matrix whose attention scores represent the relative importance of each sub-word-level token in a predicted match between a query entity and a target entity.
  • Explanations are outputted based on the attention matrix.
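
The workflow above can be made concrete with a small, self-contained sketch. Everything in it is illustrative rather than taken from the application: fixed-length chunking stands in for the sub-word-level tokenizer, deterministic random vectors stand in for learned character-level and sub-word-level embeddings, and a single cross-attention head stands in for the attention ML model.

```python
# Minimal, self-contained sketch of the flow summarized above (not the patent's
# implementation). Tokenizer, embeddings, and the single attention head are
# illustrative stand-ins for the components described in the application.
import numpy as np

EMB_DIM = 16  # illustrative embedding size


def subword_tokenize(text: str, max_len: int = 4) -> list[str]:
    """Toy sub-word tokenizer: split on whitespace, then chunk each word
    into pieces of up to `max_len` characters (stand-in for BPE/WordPiece)."""
    tokens = []
    for word in text.lower().split():
        tokens.extend(word[i:i + max_len] for i in range(0, len(word), max_len))
    return tokens


def char_embeddings(token: str) -> np.ndarray:
    """Deterministic toy character-level embeddings for one token."""
    return np.stack([
        np.random.default_rng(ord(c)).standard_normal(EMB_DIM) for c in token
    ])


def subword_embedding(token: str) -> np.ndarray:
    """Sub-word-level embedding formed by pooling the token's character-level
    embeddings (an illustrative stand-in for the model's learned embeddings)."""
    return char_embeddings(token).mean(axis=0)


def attention_matrix(query_tokens: list[str], target_tokens: list[str]) -> np.ndarray:
    """Single-head cross-attention between query and target sub-word tokens."""
    Q = np.stack([subword_embedding(t) for t in query_tokens])   # (m, d)
    K = np.stack([subword_embedding(t) for t in target_tokens])  # (n, d)
    scores = Q @ K.T / np.sqrt(EMB_DIM)                          # (m, n)
    scores -= scores.max(axis=1, keepdims=True)                  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)          # rows sum to 1


def explain(query: str, target: str, top_k: int = 3) -> list[tuple[str, str, float]]:
    """Rank query/target sub-word token pairs by attention score as a simple
    attention-based explanation of a predicted match."""
    q_toks, t_toks = subword_tokenize(query), subword_tokenize(target)
    A = attention_matrix(q_toks, t_toks)
    pairs = [(q_toks[i], t_toks[j], float(A[i, j]))
             for i in range(len(q_toks)) for j in range(len(t_toks))]
    return sorted(pairs, key=lambda p: p[2], reverse=True)[:top_k]


if __name__ == "__main__":
    for q_tok, t_tok, score in explain("Acme Corporation Ltd", "ACME Corp Limited"):
        print(f"query token {q_tok!r} attends to target token {t_tok!r} ({score:.2f})")
```

Ranking query/target token pairs by attention score yields a token-level explanation of why two entity strings were matched, which mirrors the final step of outputting an explanation based on the attention matrix.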

Potential Applications

  • Natural language processing
  • Information retrieval systems
  • Machine learning models

Problems Solved

  • Improving accuracy in matching query entities with target entities
  • Enhancing interpretability of machine learning models

Benefits

  • Increased precision in matching entities
  • Enhanced understanding of model predictions
  • Improved user trust in AI systems

Potential Commercial Applications

Enhanced Entity Matching in Information Retrieval Systems

Possible Prior Art

No prior art is known at this time.

Unanswered Questions

How does the attention ML model handle noisy or ambiguous data in the query and target entities?

The patent application does not specifically address how the system deals with noisy or ambiguous data that may affect the accuracy of entity matching.

What computational resources are required to implement this system effectively?

The patent application does not provide details on the computational resources needed to deploy and operate the described methods and systems.


Original Abstract Submitted

Methods, systems, and computer-readable storage media for receiving query data representative of query entities and target data representative of target entities, determining, by an attention ML model, a set of character-level embeddings, providing, by a sub-word-level tokenizer, a set of sub-word-level tokens, each sub-word-level token including a string of multiple characters, generating, by the attention ML model, a set of sub-word-level embeddings based on the set of sub-word-level tokens, providing, by the attention ML model, at least one attention matrix including attention scores, each attention score representative of a relative importance of a respective sub-word-level token in a predicted match, the predicted match including a match between a query entity and a target entity, and outputting an explanation based on the at least one attention matrix.
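
To show how attention matrices of the kind described in the abstract are typically exposed in practice, the sketch below pulls per-token attention scores from an off-the-shelf pretrained BERT via the Hugging Face transformers library. This is only a stand-in, not the patented model: the generic model uses a WordPiece sub-word tokenizer but no character-level embeddings, and the model name, head averaging, and last-layer choice are assumptions rather than anything stated in the application.

```python
# Sketch: extracting an attention matrix from an off-the-shelf transformer and
# reading it as a crude token-level match explanation. NOT the patented model;
# a generic pretrained BERT stands in for the attention ML model.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

query, target = "Acme Corporation Ltd", "ACME Corp Limited"
inputs = tokenizer(query, target, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs, output_attentions=True)

# outputs.attentions holds one (batch, heads, seq, seq) tensor per layer.
# Average the heads of the last layer to get a single attention matrix.
attn = outputs.attentions[-1].mean(dim=1)[0]

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
segments = inputs["token_type_ids"][0]  # 0 = query side, 1 = target side

# For each query-side sub-word token, report the target-side token it attends
# to most strongly -- a simple attention-based "explanation" of the pairing.
target_idx = [i for i, s in enumerate(segments) if s == 1 and tokens[i] != "[SEP]"]
for i, (tok, seg) in enumerate(zip(tokens, segments)):
    if seg == 0 and tok not in ("[CLS]", "[SEP]"):
        j = max(target_idx, key=lambda j: attn[i, j].item())
        print(f"{tok!r} -> {tokens[j]!r} (attention {attn[i, j].item():.2f})")
```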