18210498. LAYOUT AWARE MULTI-MODAL NETWORKS FOR DOCUMENT UNDERSTANDING (Oracle International Corporation)

From WikiPatents
Revision as of 07:31, 19 December 2024 by Wikipatents

LAYOUT AWARE MULTI-MODAL NETWORKS FOR DOCUMENT UNDERSTANDING

Organization Name

Oracle International Corporation

Inventor(s)

Zheng Wang of Sammamish WA (US)

Tao Sheng of Bellevue WA (US)

Yazhe Hu of Bellevue WA (US)

Mengqing Guo of Redmond WA (US)

Liyu Gong of Lexington KY (US)

Jun Qian of Redwood Shores CA (US)

Katharine D'orazio of Brooklyn NY (US)

This abstract first appeared for US patent application 18210498, titled 'LAYOUT AWARE MULTI-MODAL NETWORKS FOR DOCUMENT UNDERSTANDING'.

Original Abstract Submitted

Techniques for layout-aware multi-modal networks for document understanding are provided. In one technique, word data representations that were generated based on words that were extracted from an image of a document are identified. Based on the image, table features of one or more tables in the document are determined. One or more table data representations that were generated based on the table features are identified. The word data representations and the one or more table data representations are input into a machine-learned model to generate a document data representation for the document. A task is performed based on the document data representation. In a related technique, instead of the one or more table data representations, one or more layout data representations that were generated based on a set of layout features, of the document, that was determined based on the image are identified and input into the machine-learned model.
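The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the method claimed in the application; every name in it (Word, Table, embed_word, embed_table, encode_document, score_document, DIM) is hypothetical. The hash-based word embedding, the hand-built table feature vector, the mean-pooling stand-in for the machine-learned model, and the linear scoring task are all assumptions chosen only to keep the example self-contained.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Sequence, Tuple

DIM = 8  # hypothetical width shared by every data representation

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels


@dataclass
class Word:
    text: str
    box: Box  # location reported by an assumed OCR step (unused below)


@dataclass
class Table:
    box: Box
    n_rows: int
    n_cols: int


def embed_word(word: Word) -> List[float]:
    """Stand-in word representation: hash the lowercased text into DIM floats in [0, 1]."""
    digest = hashlib.sha256(word.text.lower().encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:DIM]]


def embed_table(table: Table, page_w: float, page_h: float) -> List[float]:
    """Stand-in table representation: normalized box plus scaled row/column counts."""
    x0, y0, x1, y1 = table.box
    feats = [x0 / page_w, y0 / page_h, x1 / page_w, y1 / page_h,
             table.n_rows / 100.0, table.n_cols / 100.0]
    return feats + [0.0] * (DIM - len(feats))  # zero-pad to DIM


def encode_document(word_vecs: Sequence[Sequence[float]],
                    table_vecs: Sequence[Sequence[float]]) -> List[float]:
    """Assumed stand-in for the machine-learned model: mean-pool all inputs."""
    seq = list(word_vecs) + list(table_vecs)
    if not seq:
        return [0.0] * DIM
    return [sum(v[i] for v in seq) / len(seq) for i in range(DIM)]


def score_document(doc_vec: Sequence[float], weights: Sequence[float]) -> float:
    """Hypothetical downstream task: a linear score over the document representation."""
    return sum(w * x for w, x in zip(weights, doc_vec))


if __name__ == "__main__":
    words = [Word("Invoice", (10, 10, 80, 30)), Word("Total", (10, 300, 60, 320))]
    tables = [Table((10, 50, 500, 280), n_rows=6, n_cols=4)]
    doc = encode_document([embed_word(w) for w in words],
                          [embed_table(t, 612.0, 792.0) for t in tables])
    print(score_document(doc, [1.0] * DIM))
```

For the related technique, vectors derived from layout features of the document would be passed in place of the table vectors; the sketch's encode_document does not depend on which of the two is supplied.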