20250181650. Structured Output Duplic (SAS Institute .)

STRUCTURED OUTPUT OF DUPLICATE OR NEAR-DUPLICATE TEXT DOCUMENTS IDENTIFIED USING AUTOMATED NEAR-DUPLICATE DETECTION FOR TEXT DOCUMENTS

Abstract: Techniques described herein provide for generation of structured output for documents identified using automated near-duplicate detection. In one example, a system can receive a set of documents including at least one pair of similar documents determined to be similar to one another based on similarity scores generated using a predefined similarity scoring technique. The system can generate document groups by merging together pairs of documents that share at least one document. The system can, for each of the document groups, identify a representative document for the document group. The system can generate an output for display including a section for each document group, in which each section includes the representative document for the document group and, for each document in the document group, the similarity score relative to the representative document for the document group.
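The grouping and output steps in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: the abstract does not say how pairs are merged, how the representative is chosen, or what the output looks like. The sketch assumes a union-find merge, picks as representative the document with the highest summed similarity to the other group members, reports a self-score of 1.0, and reports `None` where no pairwise score was supplied. All function names (`group_documents`, `pick_representative`, `build_report`) are hypothetical.

```python
from collections import defaultdict


def group_documents(pairs):
    """Merge similar pairs that share a document into groups (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(list)
    for doc in list(parent):
        groups[find(doc)].append(doc)
    return [sorted(members) for members in groups.values()]


def score(pairs, a, b):
    """Look up a pairwise score in either order; None if the pair was not scored."""
    if a == b:
        return 1.0  # assumption: a document is fully similar to itself
    return pairs.get((a, b), pairs.get((b, a)))


def pick_representative(group, pairs):
    """Assumed heuristic: highest total similarity to the rest of the group."""
    def total(doc):
        return sum(score(pairs, doc, other) or 0.0
                   for other in group if other != doc)
    return max(group, key=total)  # ties go to the first document in sorted order


def build_report(pairs):
    """One section per group: the representative plus each member's score against it."""
    report = []
    for group in group_documents(pairs):
        rep = pick_representative(group, pairs)
        members = [(doc, score(pairs, doc, rep)) for doc in group]
        report.append({"representative": rep, "members": members})
    return report


if __name__ == "__main__":
    # Hypothetical similarity scores for illustration only.
    example_pairs = {("A", "B"): 0.9, ("B", "C"): 0.8, ("D", "E"): 0.95}
    for section in build_report(example_pairs):
        print("Representative:", section["representative"])
        for doc, s in section["members"]:
            print(f"  {doc}: {s}")
```

In this sketch, the pairs A-B and B-C share document B and therefore land in one group, while D-E forms a second group. A real system would also need a policy for members that were never scored directly against the representative.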

Inventor(s): Fan WANG, Teresa S. JADE, Xu YANG

CPC Classification: G06F16/906 (Details of database functions independent of the retrieved data types)
