International Business Machines Corporation (20250131759). DYNAMIC DOCUMENT CLASSIFICATION
Organization Name
International Business Machines Corporation
Inventor(s)
This abstract first appeared for US patent application 20250131759, titled 'DYNAMIC DOCUMENT CLASSIFICATION'.
Original Abstract Submitted
In an approach, a processor performs document layout analysis on a document, generating a plurality of textual regions; extracts characteristics from each of the plurality of textual regions and associates the respective characteristics to the respective textual region as metadata; classifies each of the plurality of textual regions as an optical character recognition (OCR) region, non-OCR valuable region, or non-OCR non-valuable region using a classifier; performs OCR on each OCR region, generating an OCR output; identifies associated constant OCR data from a constant OCR data repository for each non-OCR valuable region; merges the associated constant OCR data with the OCR output, generating a complete OCR data for the received document; performs data extraction on the complete OCR data to identify data fields and key-value pairs, generating extracted data; and determines whether the extracted data is valid based on a set of rules.
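
The steps in the abstract can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and is not taken from the application: the names `Region`, `RegionClass`, `classify`, `run_ocr`, `extract_key_values`, and `process_document` are hypothetical, the metadata keys `decorative` and `template_key` are invented, the classifier is a toy rule-based stand-in, and the OCR step simply returns text already stored on the region. Layout analysis and characteristic extraction are assumed to have happened already, so each region arrives with its metadata attached.

```python
from dataclasses import dataclass, field
from enum import Enum


class RegionClass(Enum):
    OCR = "ocr"
    NON_OCR_VALUABLE = "non_ocr_valuable"
    NON_OCR_NON_VALUABLE = "non_ocr_non_valuable"


@dataclass
class Region:
    # Hypothetical stand-in for a textual region produced by layout analysis.
    content: str                                  # text an OCR engine would read
    metadata: dict = field(default_factory=dict)  # extracted characteristics


def classify(region, constant_repo):
    # Toy rule-based classifier (assumption; the abstract only says "a classifier").
    if region.metadata.get("decorative"):
        return RegionClass.NON_OCR_NON_VALUABLE
    if region.metadata.get("template_key") in constant_repo:
        return RegionClass.NON_OCR_VALUABLE
    return RegionClass.OCR


def run_ocr(region):
    # Placeholder for a real OCR engine: returns the text stored on the region.
    return region.content


def extract_key_values(text):
    # Treat every "key: value" line as one key-value pair.
    pairs = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() and value.strip():
            pairs[key.strip()] = value.strip()
    return pairs


def process_document(regions, constant_repo, rules):
    parts = []
    for region in regions:
        label = classify(region, constant_repo)
        if label is RegionClass.OCR:
            parts.append(run_ocr(region))
        elif label is RegionClass.NON_OCR_VALUABLE:
            # Reuse stored constant OCR data instead of running OCR again.
            parts.append(constant_repo[region.metadata["template_key"]])
        # Non-OCR non-valuable regions are skipped entirely.
    complete_ocr_data = "\n".join(parts)  # merge in region order
    extracted = extract_key_values(complete_ocr_data)
    is_valid = all(rule(extracted) for rule in rules)
    return complete_ocr_data, extracted, is_valid


# Example usage with made-up data.
constant_repo = {"invoice_header": "Vendor: Acme Corp"}
regions = [
    Region("(not read)", {"template_key": "invoice_header"}),
    Region("logo", {"decorative": True}),
    Region("Invoice Number: 1234\nTotal: 99.50"),
]
rules = [lambda d: "Total" in d, lambda d: float(d["Total"]) >= 0]
complete, extracted, is_valid = process_document(regions, constant_repo, rules)
print(extracted, is_valid)
```

In this sketch, only the third region reaches the OCR step. The header region is filled from the constant repository and the decorative region is dropped, which is one reading of the classification step: OCR work is limited to regions whose content varies between documents.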