18210498. LAYOUT AWARE MULTI-MODAL NETWORKS FOR DOCUMENT UNDERSTANDING (Oracle International Corporation)
This abstract first appeared in US patent application 18210498, titled 'LAYOUT AWARE MULTI-MODAL NETWORKS FOR DOCUMENT UNDERSTANDING'.
Original Abstract Submitted
Techniques for layout-aware multi-modal networks for document understanding are provided. In one technique, word data representations that were generated based on words that were extracted from an image of a document are identified. Based on the image, table features of one or more tables in the document are determined. One or more table data representations that were generated based on the table features are identified. The word data representations and the one or more table data representations are input into a machine-learned model to generate a document data representation for the document. A task is performed based on the document data representation. In a related technique, instead of the one or more table data representations, one or more layout data representations that were generated based on a set of layout features, of the document, that was determined based on the image are identified and input into the machine-learned model.
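The abstract does not specify the encoders or the model architecture, so the following is only a minimal sketch of the data flow it describes, under loudly stated assumptions. All names (`DIM`, `word_representation`, `Region`, `region_representation`, `document_representation`, `cosine`) are hypothetical. A SHA-256 hash stands in for a learned word encoder, zero padding stands in for a learned projection of table or layout features, mean pooling stands in for the machine-learned model, and document similarity is just one example of a downstream task. The same `Region` type covers both variants: a table (with row and column counts) or a generic layout element (counts left at 0).

```python
import hashlib
import math
from dataclasses import dataclass
from typing import List

DIM = 8  # hypothetical width shared by every data representation
Vector = List[float]


def word_representation(word: str) -> Vector:
    """Toy word embedding: a deterministic vector derived from a SHA-256 hash.
    Assumption: a real system would use a learned text encoder instead."""
    digest = hashlib.sha256(word.lower().encode("utf-8")).digest()
    return [digest[i] / 255.0 * 2.0 - 1.0 for i in range(DIM)]


@dataclass
class Region:
    """A table or other layout element detected in the page image.
    Coordinates are normalized to [0, 1]; rows/cols stay 0 for non-tables."""
    x0: float
    y0: float
    x1: float
    y1: float
    rows: int = 0
    cols: int = 0


def region_representation(r: Region) -> Vector:
    """Map table/layout features to a vector as wide as the word vectors.
    Assumption: zero padding stands in for a learned projection."""
    raw = [r.x0, r.y0, r.x1, r.y1, math.log1p(r.rows), math.log1p(r.cols)]
    return (raw + [0.0] * DIM)[:DIM]


def document_representation(words: List[Vector], regions: List[Vector]) -> Vector:
    """Stand-in for the machine-learned model: mean-pool all word and
    region vectors into a single document vector."""
    tokens = words + regions
    if not tokens:
        return [0.0] * DIM
    return [sum(tok[i] for tok in tokens) / len(tokens) for i in range(DIM)]


def cosine(a: Vector, b: Vector) -> float:
    """Example downstream task: cosine similarity between two documents."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


if __name__ == "__main__":
    words = [word_representation(w) for w in ["Invoice", "Total", "Due"]]
    # Table variant: a detected 5x3 table.
    table = region_representation(Region(0.1, 0.4, 0.9, 0.8, rows=5, cols=3))
    table_doc = document_representation(words, [table])
    # Layout variant: a title block with no rows/columns.
    title = region_representation(Region(0.1, 0.05, 0.9, 0.12))
    layout_doc = document_representation(words, [title])
    print(len(table_doc), round(cosine(table_doc, table_doc), 6))  # prints: 8 1.0
    print(round(cosine(table_doc, layout_doc), 3))
```

Keeping word and region vectors at the same width is what lets them be pooled (or, in a real system, fed to one sequence model) together; swapping the table encoder for a layout encoder changes only which region vectors are supplied.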
Organization Name
- Oracle International Corporation
Inventor(s)
- Zheng Wang of Sammamish WA (US)
- Tao Sheng of Bellevue WA (US)
- Yazhe Hu of Bellevue WA (US)
- Mengqing Guo of Redmond WA (US)
- Liyu Gong of Lexington KY (US)
- Jun Qian of Redwood Shores CA (US)
- Katharine D'orazio of Brooklyn NY (US)
CPC Classification
- G06V30/413
- G06V30/19
- G06V30/412
- G06V30/416