17808293. Machine Learning Based Document Visual Element Extraction simplified abstract (GOOGLE LLC)

From WikiPatents

Machine Learning Based Document Visual Element Extraction

Organization Name

GOOGLE LLC

Inventor(s)

Nikolay Glushnev of Woodinville WA (US)

Qingze Wang of San Jose CA (US)

Emmanouil Koukoumidis of Kirkland WA (US)

Henry Wahyudi Setiawan of Bellevue WA (US)

Lauro Ivo Beltrao Colaco Costa of Kirkland WA (US)

Vincent Perot of Brooklyn NY (US)

Machine Learning Based Document Visual Element Extraction - A simplified explanation of the abstract

This abstract first appeared for US patent application 17808293, titled 'Machine Learning Based Document Visual Element Extraction'.

Simplified Explanation

Abstract Explanation

The patent application describes a method for processing a document that contains both textual fields and a visual element. For each textual field, the method determines an offset that indicates where the field sits relative to the other textual fields. A machine learning vision model detects the visual element, and the method determines a corresponding offset that locates the visual element relative to the textual fields. The method then assigns the visual element an anchor token and inserts that token among the textual fields in an order determined by the offsets. Finally, a text-based extraction model extracts structured entities representing both the textual fields and the visual element.

  • The method analyzes a document with both text and visual elements.
  • It determines an offset for each textual field, and uses a machine learning vision model to detect the visual element and determine its offset relative to the textual fields.
  • It assigns a visual element anchor token to the visual element.
  • It inserts the anchor token among the textual fields in an order based on the visual element offset and the textual offsets.
  • The method extracts structured entities representing the textual fields and the visual element using a text-based extraction model.
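
The anchor-insertion step in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method claimed in the application: the `Field` and `VisualElement` types, the `<VISUAL_0>` token format, and the use of a single integer reading-order offset are all hypothetical, since the abstract does not say how offsets or tokens are represented.

```python
from dataclasses import dataclass


@dataclass
class Field:
    text: str
    offset: int  # hypothetical: position of the field in reading order


@dataclass
class VisualElement:
    label: str   # e.g. "signature", as a vision model might report it
    offset: int  # hypothetical: position relative to the textual fields


def insert_anchor_tokens(fields, visuals):
    """Merge fields and anchor tokens into one sequence ordered by offset.

    Returns the merged sequence and a map from anchor token to visual element.
    """
    anchors = {}
    items = [(f.offset, 0, f.text) for f in fields]
    for i, v in enumerate(visuals):
        token = f"<VISUAL_{i}>"  # assumed token format
        anchors[token] = v
        # The tie-breaker 1 places an anchor after a field with the same offset.
        items.append((v.offset, 1, token))
    items.sort(key=lambda item: (item[0], item[1]))
    return [text for _, _, text in items], anchors


fields = [Field("Invoice #123", 0), Field("Total: $40", 2)]
visuals = [VisualElement("signature", 1)]
sequence, anchors = insert_anchor_tokens(fields, visuals)
print(sequence)  # ['Invoice #123', '<VISUAL_0>', 'Total: $40']
```

Because the visual element is reduced to an ordinary token in the text sequence, a model that only reads text can still see where the element sat in the document.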

Potential Applications

  • Document analysis and organization
  • Data extraction from documents with mixed text and visual elements
  • Content management systems
  • Information retrieval and indexing

Problems Solved

  • Efficiently analyzing documents with both text and visual elements
  • Locating textual fields and visual elements relative to one another within a document
  • Extracting structured entities from documents with mixed content

Benefits

  • Improved document analysis and organization
  • Enhanced data extraction capabilities
  • Streamlined content management processes
  • More efficient information retrieval and indexing


Original Abstract Submitted

A method includes obtaining a document with textual fields and a visual element. For each textual field, the method includes determining a textual offset for the textual field that indicates a location of the textual field relative to each other textual field in the document. The method includes detecting, using a machine learning vision model, the visual element and determining a visual element offset indicating a location of the visual element relative to each textual field in the document. The method includes assigning the visual element a visual element anchor token and inserting the visual element anchor token into the textual fields in an order based on the visual element offset and the respective textual offsets. The method also includes, after inserting the visual element anchor token, extracting, using a text-based extraction model, from the textual fields, structured entities representing the series of textual fields and the visual element.
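
One way to picture the offsets and the final extraction step is the sketch below. It is illustrative only: the abstract does not say how offsets are computed or what the extraction model looks like. Here the hypothetical `reading_order` function ranks items top-to-bottom and then left-to-right, and the hypothetical `extract_entities` function is a rule-based stand-in for the text-based extraction model. The `<VISUAL_0>` token, the `key: value` field format, and the coordinates are likewise assumptions.

```python
def reading_order(boxes):
    """Rank items top-to-bottom, then left-to-right.

    boxes maps an item name to its (top, left) corner in page coordinates.
    Returns a map from item name to its integer offset in reading order.
    """
    ranked = sorted(boxes, key=lambda name: boxes[name])
    return {name: rank for rank, name in enumerate(ranked)}


def extract_entities(sequence, anchors):
    """Stand-in for a text-based extraction model (not a learned model).

    Treats 'key: value' strings as text entities and resolves anchor
    tokens to the visual elements they were assigned to.
    """
    entities = []
    for item in sequence:
        if item in anchors:
            entities.append({"type": "visual", "value": anchors[item]})
        elif ": " in item:
            key, value = item.split(": ", 1)
            entities.append({"type": key.lower(), "value": value})
    return entities


# Hypothetical layout: two text fields with a detected signature between them.
boxes = {"Vendor: Acme": (10, 5), "<VISUAL_0>": (40, 5), "Total: $40": (70, 5)}
offsets = reading_order(boxes)
sequence = sorted(offsets, key=offsets.get)
entities = extract_entities(sequence, {"<VISUAL_0>": "signature"})
print(entities)
```

In this sketch the visual element comes out as one more structured entity alongside the text fields, which mirrors the abstract's statement that the extracted entities represent both the textual fields and the visual element.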