Google LLC (20250148025). WEB PAGE TRANSFORMER FOR STRUCTURE INFORMATION EXTRACTION
Organization Name
Google LLC
Inventor(s)
Dongfang Liu of Rochester NY US
This abstract first appeared for US patent application 20250148025 titled 'WEB PAGE TRANSFORMER FOR STRUCTURE INFORMATION EXTRACTION'.
Original Abstract Submitted
The technology provides a rich attention mechanism for structured information extraction from web pages and other electronic documents. An input layer of a model obtains information associated with a document, including field tokens representing respective fields to be extracted from the document, associated structured document type tokens, and text tokens from a text sequence in the document. An encoder connects the field tokens, the structured document type tokens, and the text tokens according to a set of different attention patterns. The encoder generates an overall token representation based on the set of different attention patterns. An output layer of the model extracts a final text span for each of the respective fields from the set of text tokens. The extracted final text span for each of the respective fields is stored in memory and can be produced in response to a search query, analytics evaluation, or other request.
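
The abstract does not say which attention patterns the encoder uses or how the output layer scores spans. The following is therefore only a minimal sketch of the general idea, under stated assumptions: the three token kinds share one sequence, each kind follows its own (hypothetical) attention rule, and a span is chosen from per-token start and end scores. The names `build_attention_mask`, `best_span`, `text_window`, and `max_len` are illustrative and do not come from the application.

```python
from typing import List, Tuple

FIELD, TYPE, TEXT = "field", "type", "text"


def build_attention_mask(kinds: List[str], text_window: int = 2) -> List[List[bool]]:
    """Return mask[i][j], True when token i may attend to token j.

    Assumed patterns (illustrative only, not taken from the abstract):
      - field tokens attend to every token;
      - type tokens attend to field tokens and to other type tokens;
      - text tokens attend to field and type tokens, and to text tokens
        at most `text_window` positions away in the sequence.
    """
    n = len(kinds)
    mask = [[False] * n for _ in range(n)]
    for i, query in enumerate(kinds):
        for j, key in enumerate(kinds):
            if query == FIELD:
                mask[i][j] = True
            elif query == TYPE:
                mask[i][j] = key in (FIELD, TYPE)
            else:  # text token
                mask[i][j] = key != TEXT or abs(i - j) <= text_window
    return mask


def best_span(start_scores: List[float], end_scores: List[float],
              max_len: int = 5) -> Tuple[int, int]:
    """Pick the inclusive (start, end) text-token span with the highest
    start + end score, limited to `max_len` tokens (an assumed decoder)."""
    best, best_score = (0, 0), float("-inf")
    for s, s_score in enumerate(start_scores):
        for e in range(s, min(s + max_len, len(end_scores))):
            score = s_score + end_scores[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best


# One field token, two type tokens, four text tokens.
kinds = [FIELD, TYPE, TYPE, TEXT, TEXT, TEXT, TEXT]
mask = build_attention_mask(kinds)

# Made-up per-text-token scores for a single field.
span = best_span([0.1, 2.0, 0.3, 0.0], [0.0, 0.2, 1.5, 0.1])
```

With these made-up scores, `span` is `(1, 2)`: the second and third text tokens form the extracted span for the field. In a real model the mask would gate a learned attention layer and the scores would come from the encoder's token representations. Here both are fixed by hand to keep the sketch self-contained.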