International Business Machines Corporation (20250131759). DYNAMIC DOCUMENT CLASSIFICATION

From WikiPatents


DYNAMIC DOCUMENT CLASSIFICATION

Organization Name

International Business Machines Corporation

Inventor(s)

Jun Hong Zhao of ShangDi CN

Dong Rui Li of Beijing CN

Ang Yi of Beijing CN

Jing Zhang of Beijing CN

Hai Cheng Wang of Beijing CN

Yang Zhong Li of Beijing CN

This abstract first appeared for US patent application 20250131759 titled 'DYNAMIC DOCUMENT CLASSIFICATION'.

Original Abstract Submitted

In an approach, a processor performs document layout analysis on a document, generating a plurality of textual regions; extracts characteristics from each of the plurality of textual regions and associates the respective characteristics to the respective textual region as metadata; classifies each of the plurality of textual regions as an optical character recognition (OCR) region, non-OCR valuable region, or non-OCR non-valuable region using a classifier; performs OCR on each OCR region, generating an OCR output; identifies associated constant OCR data from a constant OCR data repository for each non-OCR valuable region; merges the associated constant OCR data with the OCR output, generating a complete OCR data for the received document; performs data extraction on the complete OCR data to identify data fields and key-value pairs, generating extracted data; and determines whether the extracted data is valid based on a set of rules.
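
The steps in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the method claimed in the application: the `Region` type, the rule-based `classify` function, the `fake_ocr` stub, the `CONSTANT_OCR_DATA` dictionary, the `template_key` and `decorative` metadata fields, and the validation `RULES` are all hypothetical names invented for this example. Layout analysis is assumed to have already produced the regions, and no real OCR engine is called.

```python
import re
from dataclasses import dataclass, field
from enum import Enum


class RegionClass(Enum):
    OCR = "ocr"
    NON_OCR_VALUABLE = "non_ocr_valuable"
    NON_OCR_NON_VALUABLE = "non_ocr_non_valuable"


@dataclass
class Region:
    region_id: str
    image_text: str  # stand-in for image data: the text an OCR engine would return
    metadata: dict = field(default_factory=dict)  # extracted characteristics


# Hypothetical constant OCR data repository: template key -> already-known text.
CONSTANT_OCR_DATA = {"header-v1": "Form: Invoice"}


def classify(region: Region) -> RegionClass:
    """Toy rule-based stand-in for the classifier (assumption, not the patent's model)."""
    if region.metadata.get("template_key") in CONSTANT_OCR_DATA:
        return RegionClass.NON_OCR_VALUABLE
    if region.metadata.get("decorative"):
        return RegionClass.NON_OCR_NON_VALUABLE
    return RegionClass.OCR


def fake_ocr(region: Region) -> str:
    """Placeholder for a real OCR call; simply returns the stored text."""
    return region.image_text


def build_complete_ocr_data(regions: list[Region]) -> list[str]:
    """Run OCR only where needed and merge in constant data for known regions."""
    lines = []
    for region in regions:
        label = classify(region)
        if label is RegionClass.OCR:
            lines.append(fake_ocr(region))
        elif label is RegionClass.NON_OCR_VALUABLE:
            lines.append(CONSTANT_OCR_DATA[region.metadata["template_key"]])
        # Non-OCR non-valuable regions are skipped entirely.
    return lines


def extract_key_values(lines: list[str]) -> dict[str, str]:
    """Extract 'Key: Value' pairs from the merged OCR text."""
    pairs = {}
    for line in lines:
        match = re.match(r"\s*([^:]+):\s*(.+)", line)
        if match:
            pairs[match.group(1).strip()] = match.group(2).strip()
    return pairs


# Hypothetical validation rules: every listed field must exist and pass its check.
RULES = {
    "Form": lambda v: v in {"Invoice", "Receipt"},
    "Total": lambda v: re.fullmatch(r"\d+\.\d{2}", v) is not None,
}


def is_valid(data: dict[str, str]) -> bool:
    return all(key in data and check(data[key]) for key, check in RULES.items())


regions = [
    Region("r1", "", {"template_key": "header-v1"}),  # known constant header
    Region("r2", "Total: 42.50"),                     # variable content, needs OCR
    Region("r3", "~~~~", {"decorative": True}),       # ignored
]
complete = build_complete_ocr_data(regions)
extracted = extract_key_values(complete)
print(extracted, is_valid(extracted))
```

In this sketch only one of the three regions reaches the OCR stub. The header text comes from the repository and the decorative region is dropped, which is where the approach described in the abstract would save OCR work.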