17809321. INFORMATION EXTRACTION FROM DOCUMENTS CONTAINING HANDWRITTEN TEXT simplified abstract (INTERNATIONAL BUSINESS MACHINES CORPORATION)

From WikiPatents

INFORMATION EXTRACTION FROM DOCUMENTS CONTAINING HANDWRITTEN TEXT

Organization Name

INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor(s)

Saurabh Goyal of Jersey City NJ (US)

Catherine Finegan-Dollak of Mohegan Lake NY (US)

Ashish Verma of Nanuet NY (US)

INFORMATION EXTRACTION FROM DOCUMENTS CONTAINING HANDWRITTEN TEXT - A simplified explanation of the abstract

This abstract first appeared for US patent application 17809321 titled 'INFORMATION EXTRACTION FROM DOCUMENTS CONTAINING HANDWRITTEN TEXT'.

Simplified Explanation

The present invention is a method, computer system, and computer program product for information extraction. A handwriting detection model, part of an integrated system, receives a mixed-text document that contains both typed and handwritten text and includes at least one key-value pair. The model also receives location information for at least one key in the document and uses that location information to detect handwritten text in the document.

  • The invention is a system that can extract information from mixed-text documents containing both typed and handwritten text.
  • It includes a handwriting detection model that can identify handwritten text within the document.
  • The handwriting detection model also receives the location of specific keys within the document.
  • By using the key locations to guide handwriting detection, the system aims to find the handwritten text associated with each key-value pair.
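The abstract does not say how the key locations guide detection. As one hedged illustration, the sketch below assumes a separate detector has already produced bounding boxes for handwritten regions, and that the value for a key is written to the key's right on the same line or in a short band beneath it. It is a simple geometric heuristic swapped in for the learned model, not the patented method; the names Box, search_window, match_keys_to_handwriting, and the below parameter are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in page coordinates (origin top-left, y grows downward)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def intersects(self, other: "Box") -> bool:
        # Two boxes overlap unless one lies entirely left, right, above, or below the other.
        return not (other.x0 >= self.x1 or other.x1 <= self.x0
                    or other.y0 >= self.y1 or other.y1 <= self.y0)


def search_window(key: Box, page_width: float, below: float) -> Box:
    # Hypothetical heuristic: the value sits to the right of the key on the same
    # line, or in a band of height `below` directly beneath the key.
    return Box(key.x0, key.y0, page_width, key.y1 + below)


def match_keys_to_handwriting(
    keys: Dict[str, Box],
    handwritten: List[Box],
    page_width: float,
    below: float = 30.0,
) -> Dict[str, Optional[Box]]:
    """Map each key to the nearest handwritten region in its search window, or None."""
    result: Dict[str, Optional[Box]] = {}
    for name, kbox in keys.items():
        window = search_window(kbox, page_width, below)
        # Candidates overlap the window but not the printed key itself.
        candidates = [h for h in handwritten
                      if window.intersects(h) and not kbox.intersects(h)]
        # Rank by squared distance from the key's top-right corner
        # to the region's top-left corner; None if nothing qualifies.
        result[name] = min(
            candidates,
            key=lambda h: (h.x0 - kbox.x1) ** 2 + (h.y0 - kbox.y0) ** 2,
            default=None,
        )
    return result
```

For example, with a key "Name" at Box(10, 10, 60, 25) and a handwritten region at Box(70, 8, 200, 28), the region falls inside the key's window and is returned as its value, while a key with no nearby handwriting maps to None. A learned model could replace the fixed window; the heuristic simply makes the role of the key location explicit.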

Potential Applications

  • This technology can be used in data entry tasks where information needs to be extracted from mixed-text documents.
  • It can be applied in industries such as finance, healthcare, and legal services, where handwritten forms or documents are still commonly used.
  • The system can automate the process of extracting information from mixed-text documents, saving time and reducing errors.

Problems Solved

  • Manual extraction of information from mixed-text documents can be time-consuming and error-prone.
  • Handwritten text is often difficult to interpret, leading to inaccuracies in data extraction.
  • This technology aims to address these problems by automating the extraction process and using key locations to guide the detection of handwritten text.

Benefits

  • The system improves efficiency by automating the extraction of information from mixed-text documents.
  • It can reduce errors by detecting handwritten text in the context of the key it belongs to.
  • The technology can be integrated into existing systems, enhancing their capabilities in handling mixed-text documents.


Original Abstract Submitted

A method, computer system, and a computer program product for information extraction is provided. The present invention may include receiving, by a handwriting detection model of an integrated system, a mixed-text document including a combination of typed text and handwritten text, where the received mixed-text document includes at least one key-value pair. The present invention may also include receiving, by the handwriting detection model of the integrated system, a first location information of at least one key from the at least one key-value pair in the received mixed-text document. The present invention may further include detecting, by the handwriting detection model of the integrated system, at least one handwritten text in the received mixed-text document based on the received first location information of the at least one key.
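The abstract states that the handwriting detection model receives the first location information of at least one key, but not where that information comes from. One plausible source, assumed here, is OCR output for the typed text: known key names are matched against the recognized words, and the union of the matched words' boxes becomes the key's location. Token, locate_keys, normalize, and the matching rules are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


@dataclass(frozen=True)
class Token:
    """One OCR-recognized word and its bounding box (hypothetical OCR output)."""
    text: str
    box: BBox


def normalize(s: str) -> str:
    # Case-insensitive comparison; a trailing colon ("Name:") is ignored.
    return s.strip().lower().rstrip(":")


def locate_keys(tokens: List[Token], key_names: List[str]) -> Dict[str, BBox]:
    """Find each (possibly multi-word) key in reading order; return its union box."""
    found: Dict[str, BBox] = {}
    for key in key_names:
        words = [normalize(w) for w in key.split()]
        n = len(words)
        if n == 0:
            continue  # skip empty key names
        for i in range(len(tokens) - n + 1):
            window = tokens[i:i + n]
            if [normalize(t.text) for t in window] == words:
                # Union of the matched words' boxes is the key location.
                xs0, ys0, xs1, ys1 = zip(*(t.box for t in window))
                found[key] = (min(xs0), min(ys0), max(xs1), max(ys1))
                break  # keep the first occurrence only
    return found
```

Keys that are not found are simply absent from the result, so a downstream detector would only receive locations for keys actually printed on the form.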