17548651. AUTOMATICALLY ASSIGN TERM TO TEXT DOCUMENTS simplified abstract (INTERNATIONAL BUSINESS MACHINES CORPORATION)

From WikiPatents

AUTOMATICALLY ASSIGN TERM TO TEXT DOCUMENTS

Organization Name

INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor(s)

Yannick Saillet of Stuttgart (DE)

Alexander Lang of Stuttgart (DE)

Robert Kern of Karlsruhe (DE)

Gudrun Kaufmann of Falkensee (DE)

AUTOMATICALLY ASSIGN TERM TO TEXT DOCUMENTS - A simplified explanation of the abstract

This abstract first appeared for US patent application 17548651 titled 'AUTOMATICALLY ASSIGN TERM TO TEXT DOCUMENTS'.

Simplified Explanation

The patent application describes a method for processing unstructured text documents and extracting unrecognized tokens from them. The extracted tokens are then matched with related structured data elements from a predefined set of data sources. The label associated with each identified structured data element is then assigned to the unstructured text document.

  • The processor receives an unstructured text document.
  • Unrecognized tokens are extracted from the text document.
  • Structured data elements related to the extracted unrecognized tokens are identified in a predefined set of data sources.
  • Labels associated with the structured data elements are assigned to the text document.
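The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not IBM's implementation: the abstract does not say how a token is judged "unrecognized" or how a data element is judged "related", so the sketch assumes a small known-word vocabulary, exact value matching, and an in-memory list standing in for the predefined data sources. The names KNOWN_WORDS, StructuredDataElement, DATA_SOURCES, extract_unrecognized_tokens, identify_related_elements and assign_labels are hypothetical.

```python
import re
from dataclasses import dataclass

# ASSUMPTION: a token counts as "unrecognized" if it is not in this small
# vocabulary of ordinary words (real systems might use a dictionary or NLP model).
KNOWN_WORDS = {"please", "ship", "order", "to", "customer", "the", "for"}


@dataclass(frozen=True)
class StructuredDataElement:
    """Hypothetical stand-in for a value found in a structured data source."""
    source: str   # e.g. a database table
    column: str   # column the value was found in
    value: str    # the stored value itself
    label: str    # term/label already associated with this column


# ASSUMPTION: an in-memory list plays the role of the predefined data sources.
DATA_SOURCES = [
    StructuredDataElement("orders", "order_id", "ORD-1042", "Order Number"),
    StructuredDataElement("customers", "customer_id", "C-77", "Customer ID"),
    StructuredDataElement("products", "sku", "SKU-9", "Product SKU"),
]


def extract_unrecognized_tokens(text: str) -> list[str]:
    """Split the text into word-like tokens and keep those not in KNOWN_WORDS."""
    tokens = re.findall(r"[A-Za-z0-9-]+", text)
    return [t for t in tokens if t.lower() not in KNOWN_WORDS]


def identify_related_elements(tokens, data_sources):
    """ASSUMPTION: 'related' means the element's value equals a token exactly."""
    wanted = set(tokens)
    return [e for e in data_sources if e.value in wanted]


def assign_labels(text: str, data_sources) -> list[str]:
    """Return the sorted, de-duplicated labels to attach to the document."""
    tokens = extract_unrecognized_tokens(text)
    elements = identify_related_elements(tokens, data_sources)
    return sorted({e.label for e in elements})
```

With the sample data, assign_labels("Please ship order ORD-1042 to customer C-77.", DATA_SOURCES) returns ["Customer ID", "Order Number"], because only "ORD-1042" and "C-77" fall outside the vocabulary and both match stored values.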

Potential Applications

  • Information extraction from unstructured text documents.
  • Data integration and enrichment.
  • Automated document classification and labeling.

Problems Solved

  • Difficulty in extracting meaningful information from unstructured text.
  • Manual effort required for matching unstructured text with structured data.
  • Lack of efficient methods for labeling and associating structured data with unstructured text.

Benefits

  • Improved accuracy and efficiency in processing unstructured text, by replacing manual matching with automated lookup against structured data.
  • Enhanced data integration and enrichment capabilities.
  • Automation of document classification and labeling tasks.


Original Abstract Submitted

In an approach, a processor receives an unstructured text document. A processor extracts at least one unrecognized token from the unstructured text document. A processor identifies at least one structured data element in a predefined set of data sources, where the at least one structured data element is related to the at least one extracted unrecognized token from the unstructured text document. A processor relates a label associated with the identified at least one structured data element to the unstructured text document.
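The abstract leaves open what "related to" means. One possible reading, shown here only as a hypothetical sketch, is approximate string matching: each unrecognized token is compared against values drawn from the data sources using Python's difflib.get_close_matches, and the label of the closest value above a similarity cutoff is related to the document. The VALUE_TO_LABEL index, the related_labels function and the 0.8 cutoff are illustrative assumptions, not part of the application.

```python
import difflib

# ASSUMPTION: a hypothetical index built from the predefined data sources,
# mapping each known value to the label (term) of the column it came from.
VALUE_TO_LABEL = {
    "ORD-1042": "Order Number",
    "C-77": "Customer ID",
    "SKU-9": "Product SKU",
}


def related_labels(tokens, value_to_label, cutoff=0.8):
    """Relate each token to its most similar known value (if similar enough)
    and return the set of labels to assign to the document."""
    values = list(value_to_label)
    labels = set()
    for token in tokens:
        # get_close_matches returns candidates whose similarity ratio is
        # >= cutoff, best match first; n=1 keeps only the closest one.
        matches = difflib.get_close_matches(token, values, n=1, cutoff=cutoff)
        if matches:
            labels.add(value_to_label[matches[0]])
    return labels
```

For example, related_labels(["ORD1042", "C77", "XYZ"], VALUE_TO_LABEL) relates the two slightly different identifiers to "Order Number" and "Customer ID" and ignores "XYZ". A lower cutoff matches more loosely at the cost of more spurious labels.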