
Google LLC (20250148025). WEB PAGE TRANSFORMER FOR STRUCTURE INFORMATION EXTRACTION

From WikiPatents


WEB PAGE TRANSFORMER FOR STRUCTURE INFORMATION EXTRACTION

Organization Name

Google LLC

Inventor(s)

Qifan Wang of Sunnyvale CA US

Dongfang Liu of Rochester NY US


This abstract first appeared for US patent application 20250148025, titled 'WEB PAGE TRANSFORMER FOR STRUCTURE INFORMATION EXTRACTION'.

Original Abstract Submitted

The technology provides a rich attention mechanism for structured information extraction from web pages and other electronic documents. An input layer of a model obtains information associated with the document, including field tokens representing respective fields to be extracted from the document, associated structured document type tokens, and text tokens from a text sequence in the document. An encoder connects the field tokens, the structured document type tokens and the text tokens according to a set of different attention patterns. The encoder generates an overall token representation based on the set of different attention patterns. An output layer of the model extracts a final text span for each of the respective fields from the set of text tokens. The extracted final text span for each of the respective fields is stored in memory and can be produced in response to a search query, analytics evaluation or other request.
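
The abstract does not say which attention patterns the encoder uses or how the output layer scores spans. As a rough illustration only, the sketch below builds a boolean attention mask over the three token groups and then picks a text span from per-token start and end scores. The four patterns, the function names `build_attention_mask` and `best_span`, the parameters `sdt_edges`, `text_to_sdt`, `window` and `max_len`, and the start-plus-end scoring rule are all assumptions made for this example. They are not taken from the application.

```python
from typing import List, Sequence, Tuple


def build_attention_mask(
    n_field: int,
    n_sdt: int,
    sdt_edges: Sequence[Tuple[int, int]],
    text_to_sdt: Sequence[int],
    window: int = 2,
) -> List[List[bool]]:
    """Return mask[q][k] == True where query token q may attend to key token k.

    Token order is [field tokens | SDT tokens | text tokens].
    Every pattern below is an illustrative assumption, not the claimed design.
    """
    n_text = len(text_to_sdt)
    sdt0 = n_field            # index of the first SDT token
    text0 = n_field + n_sdt   # index of the first text token
    total = text0 + n_text
    mask = [[False] * total for _ in range(total)]

    # Every token attends to itself.
    for i in range(total):
        mask[i][i] = True

    # Pattern 1 (assumed): field tokens attend to every SDT and text token.
    for f in range(n_field):
        for k in range(sdt0, total):
            mask[f][k] = True

    # Pattern 2 (assumed): structurally linked SDT tokens attend to each other.
    for a, b in sdt_edges:
        mask[sdt0 + a][sdt0 + b] = True
        mask[sdt0 + b][sdt0 + a] = True

    # Pattern 3 (assumed): a text token and its enclosing SDT token
    # attend to each other.
    for t, s in enumerate(text_to_sdt):
        mask[text0 + t][sdt0 + s] = True
        mask[sdt0 + s][text0 + t] = True

    # Pattern 4 (assumed): text tokens attend to nearby text tokens
    # (within `window` positions) and to all field tokens.
    for t in range(n_text):
        lo, hi = max(0, t - window), min(n_text, t + window + 1)
        for u in range(lo, hi):
            mask[text0 + t][text0 + u] = True
        for f in range(n_field):
            mask[text0 + t][f] = True

    return mask


def best_span(
    start_scores: Sequence[float],
    end_scores: Sequence[float],
    max_len: int = 8,
) -> Tuple[int, int]:
    """Pick the inclusive (start, end) text-token span with the highest
    start + end score, limited to max_len tokens (assumed scoring rule)."""
    best = (0, 0)
    best_score = float("-inf")
    for i, s in enumerate(start_scores):
        for j in range(i, min(len(end_scores), i + max_len)):
            score = s + end_scores[j]
            if score > best_score:
                best_score = score
                best = (i, j)
    return best
```

In a real model the mask would gate the attention weights inside each encoder layer, and `best_span` would be run once per field token on that field's start and end scores. Both functions use plain Python lists so the sketch runs without any tensor library.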
