18184014. DOCUMENT INFORMATION EXTRACTION simplified abstract (International Business Machines Corporation)

From WikiPatents

DOCUMENT INFORMATION EXTRACTION

Organization Name

International Business Machines Corporation

Inventor(s)

Fei Wang of Dalian (CN)

Zhong Fang Yuan of Xi'an (CN)

Tong Liu of Xi'an (CN)

Han Qiao Yu of Shaanxi Province (CN)

Xiang Yu Yang of Xi'an (CN)

DOCUMENT INFORMATION EXTRACTION - A simplified explanation of the abstract

This abstract first appeared for US patent application 18184014 titled 'DOCUMENT INFORMATION EXTRACTION'.

Simplified Explanation

This patent application describes a method for extracting information from documents using knowledge graphs and prompt-based learning.

  • Optical character recognition (OCR) is used to extract text lines and bounding boxes from a document.
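  • Each text line is encoded into a semantic vector and each bounding box into a position vector; fusion vectors derived from these are used to build a knowledge graph.
  • A query containing a key value is used to identify candidate nodes: the most similar nodes positioned near the node associated with that key value.
  • A prompt template is generated to determine how close each candidate node is to the key value, and a confidence level is calculated for each candidate.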
  • Extraction information is output based on the candidate node with the highest confidence level.
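The encoding and graph-building steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the hashed bag-of-words `semantic_vector` stands in for a real sentence encoder, the fusion vector is assumed to be a plain concatenation, and graph edges are assumed to link each text line to its `k` spatially nearest neighbors. All names (`OcrLine`, `build_graph`, and so on) are hypothetical.

```python
import math
import zlib
from dataclasses import dataclass

DIM = 16  # hypothetical size of the toy semantic vector


@dataclass
class OcrLine:
    text: str
    box: tuple  # (x0, y0, x1, y1) in page pixels


def semantic_vector(text: str, dim: int = DIM) -> list:
    """Toy stand-in for a sentence encoder: hashed bag of words, L2-normalized."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[zlib.crc32(token.encode()) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def position_vector(box: tuple, page_w: float, page_h: float) -> list:
    """Bounding-box corners scaled to [0, 1] by page width and height."""
    x0, y0, x1, y1 = box
    return [x0 / page_w, y0 / page_h, x1 / page_w, y1 / page_h]


def fusion_vector(line: OcrLine, page_w: float, page_h: float) -> list:
    # Assumption: the fusion vector is a plain concatenation of the two parts.
    return semantic_vector(line.text) + position_vector(line.box, page_w, page_h)


def center(box: tuple) -> tuple:
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2)


def build_graph(lines: list, page_w: float, page_h: float, k: int = 2):
    """Nodes hold fusion vectors; each node links to its k nearest lines by box center."""
    nodes = [fusion_vector(line, page_w, page_h) for line in lines]
    edges = set()
    for i, a in enumerate(lines):
        dists = sorted(
            (math.dist(center(a.box), center(b.box)), j)
            for j, b in enumerate(lines)
            if j != i
        )
        for _, j in dists[:k]:
            edges.add((min(i, j), max(i, j)))  # undirected edge
    return nodes, edges


lines = [
    OcrLine("Invoice Number", (50, 40, 200, 60)),
    OcrLine("INV-0042", (220, 40, 320, 60)),
    OcrLine("Total", (50, 500, 120, 520)),
    OcrLine("$1,250.00", (220, 500, 330, 520)),
]
nodes, edges = build_graph(lines, page_w=600, page_h=800, k=1)
print(sorted(edges))  # [(0, 1), (2, 3)]
```

Concatenation keeps the sketch simple; a learned projection or weighted sum would be equally plausible readings of "fusion vectors", and the source does not say which is used.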

Key Features and Innovation

  • Uses OCR output (text lines and bounding boxes) as the basis for building a knowledge graph.
  • Text lines are encoded as semantic vectors and bounding boxes as position vectors, which are combined into fusion vectors.
  • Prompt-based learning is employed to determine the relevance of candidate nodes to a query.
  • Outputs extraction information based on confidence levels of candidate nodes.
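The prompt-based scoring step can be illustrated with a minimal Python sketch. The template wording, the `toy_score` function, and the use of a softmax to turn raw scores into confidence levels are all assumptions for illustration; in practice `score_fn` would presumably wrap a pretrained language model, which this sketch does not include.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical cloze-style template; the application's actual prompt wording is not given.
TEMPLATE = 'In this document, the value of "{key}" is "{candidate}". Is that right? [MASK]'


def build_prompts(key: str, candidates: List[str]) -> List[str]:
    """Fill the template once per candidate node."""
    return [TEMPLATE.format(key=key, candidate=c) for c in candidates]


def softmax(scores: List[float]) -> List[float]:
    """Turn raw scores into confidences that sum to 1 (max-shifted for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rank_candidates(
    key: str, candidates: List[str], score_fn: Callable[[str], float]
) -> Tuple[str, List[float]]:
    """Score every filled prompt, normalize, and return the top candidate and all confidences."""
    confidences = softmax([score_fn(p) for p in build_prompts(key, candidates)])
    best = max(range(len(candidates)), key=lambda i: confidences[i])
    return candidates[best], confidences


def toy_score(prompt: str) -> float:
    # Hypothetical stand-in for a language model's score for "yes" at [MASK]:
    # it simply favors candidates that contain digits. Purely illustrative.
    candidate = prompt.split('"')[3]
    return float(sum(ch.isdigit() for ch in candidate))


best, conf = rank_candidates("Invoice Number", ["INV-0042", "Date", "Page 1"], toy_score)
print(best)  # INV-0042
```

Keeping the scorer as a parameter separates the prompt construction and confidence ranking from whatever model produces the scores.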

Potential Applications

This technology can be applied in various fields such as information retrieval, data analysis, and document processing.

Problems Solved

  • Efficient extraction of information from documents.
  • Improved accuracy in identifying relevant information based on queries.
  • Streamlined document processing and analysis.

Benefits

  • Enhanced information retrieval capabilities.
  • Increased efficiency in data analysis tasks.
  • Improved accuracy in extracting relevant information from documents.

Commercial Applications

This technology can be utilized in industries such as legal, healthcare, and finance for document analysis, data extraction, and information retrieval tasks.

Prior Art

Researchers can explore prior art related to knowledge graphs, OCR technology, and prompt-based learning methods in document analysis and information retrieval.

Frequently Updated Research

Stay updated on advancements in OCR technology, knowledge graph applications, and prompt-based learning methods for document analysis and information extraction.

Questions about Document Extraction using Knowledge Graphs and Prompt-Based Learning

1. How does this technology improve the efficiency of information extraction from documents?
2. What are the potential challenges in implementing prompt-based learning for document analysis and information retrieval?


Original Abstract Submitted

An embodiment for a method of extracting information from documents using knowledge graphs and prompt-based learning. The embodiment may receive a document and perform optical character recognition (OCR) to obtain OCR text lines and associated bounding boxes. The embodiment may encode each of the obtained OCR text lines into semantic vectors and each of the associated bounding boxes into position vectors to generate a knowledge graph using fusion vectors derived therefrom. The embodiment may receive a query including a key value. The embodiment may identify a series of candidate nodes including a series of most similar nearby nodes positioned near a first node associated with the key value. The embodiment may generate a prompt template to determine closeness of the candidate nodes to the key value and calculate associated confidence levels. The embodiment may output extraction information associated with the candidate node having a highest calculated confidence level.
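The candidate-identification step in the abstract (finding the most similar nodes positioned near the node associated with the key value) might look like the following Python sketch. It is one possible reading, not the claimed method: the exact-text lookup of the key node, the distance `radius`, `top_k`, and cosine similarity over fusion vectors are hypothetical choices, and the small vectors below are made-up data.

```python
import math


def cosine(a: list, b: list) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def find_candidates(nodes: list, key: str, radius: float, top_k: int) -> list:
    """Return texts of the top_k nodes within `radius` of the key node, most similar first.

    Each node is a dict with 'text', 'center' (x, y) and 'vec' (its fusion vector).
    Assumptions: the key node is found by case-insensitive exact text match, and
    similarity is cosine similarity between fusion vectors.
    """
    anchor = next((n for n in nodes if n["text"].lower() == key.lower()), None)
    if anchor is None:
        return []
    # "Positioned near": keep only nodes whose centers lie within the radius.
    nearby = [
        n for n in nodes
        if n is not anchor and math.dist(n["center"], anchor["center"]) <= radius
    ]
    # "Most similar": rank the nearby nodes by similarity to the key node.
    nearby.sort(key=lambda n: cosine(n["vec"], anchor["vec"]), reverse=True)
    return [n["text"] for n in nearby[:top_k]]


nodes = [
    {"text": "Invoice Number", "center": (125, 50), "vec": [1.0, 0.0, 0.2]},
    {"text": "INV-0042", "center": (270, 50), "vec": [0.9, 0.1, 0.2]},
    {"text": "Date", "center": (125, 90), "vec": [0.1, 1.0, 0.2]},
    {"text": "$1,250.00", "center": (275, 510), "vec": [0.95, 0.05, 0.9]},
]
print(find_candidates(nodes, "invoice number", radius=200, top_k=2))
# ['INV-0042', 'Date']
```

Here "$1,250.00" is excluded by the radius filter despite a fairly similar vector, while the remaining nearby nodes are ordered by similarity; the resulting list is what the prompt-template step would then score.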