20250181650. Structured Output Duplic (SAS Institute .)
STRUCTURED OUTPUT OF DUPLICATE OR NEAR-DUPLICATE TEXT DOCUMENTS IDENTIFIED USING AUTOMATED NEAR-DUPLICATE DETECTION FOR TEXT DOCUMENTS
Abstract: Techniques described herein provide for generation of structured output for documents identified using automated near-duplicate detection. In one example, a system can receive a set of documents including at least one pair of similar documents determined to be similar to one another based on similarity scores generated using a predefined similarity scoring technique. The system can generate document groups by merging together pairs of documents that share at least one document. The system can, for each of the document groups, identify a representative document for the document group. The system can generate an output for display including a section for each document group, in which each section includes the representative document for the document group and, for each document in the document group, the similarity score relative to the representative document for the document group.
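The grouping and output steps in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation. It assumes the input is a list of (document, document, score) pairs, merges pairs that share a document using a union-find structure, and builds one section per group. The abstract does not say how the representative document is chosen, so the rule used here (the member with the highest total similarity to the other members) is an assumption, as are the function names `group_near_duplicates` and `build_report`, the self-score of 1.0, and the use of `None` when no score against the representative is available.

```python
from collections import defaultdict


def group_near_duplicates(pairs):
    """Merge scored pairs that share a document into groups (union-find).

    pairs: iterable of (doc_a, doc_b, score) tuples (assumed input shape).
    Returns (list of sorted member lists, dict of pair scores).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    scores = {}
    for a, b, s in pairs:
        scores[frozenset((a, b))] = s
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # pairs sharing a document end up together

    groups = defaultdict(list)
    for doc in list(parent):
        groups[find(doc)].append(doc)
    return [sorted(members) for members in groups.values()], scores


def build_report(pairs):
    """Build one section per group: a representative plus per-document scores."""
    groups, scores = group_near_duplicates(pairs)
    report = []
    for members in groups:
        def total(doc):
            # Assumed selection rule: sum of known scores to the other members.
            return sum(
                scores.get(frozenset((doc, other)), 0.0)
                for other in members
                if other != doc
            )

        rep = max(members, key=total)
        rows = [
            # Assumption: the representative scores 1.0 against itself;
            # None marks a member with no known score against it.
            (doc, 1.0 if doc == rep else scores.get(frozenset((doc, rep))))
            for doc in members
        ]
        report.append({"representative": rep, "members": rows})
    return report


if __name__ == "__main__":
    sample = [("a", "b", 0.9), ("b", "c", 0.8), ("d", "e", 0.95)]
    for section in build_report(sample):
        print(section)
```

Because groups are formed transitively, a member may never have been scored directly against the representative. A fuller version would compute that score with the same similarity technique instead of reporting `None`.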
Inventor(s): Fan WANG, Teresa S. JADE, Xu YANG
CPC Classification: G06F16/906 (Details of database functions independent of the retrieved data types)