20250231983. System Met (L&T TECHNOLOGY SERVICES LIMITED)
SYSTEM AND METHOD FOR META-DATA EXTRACTION FROM DOCUMENTS
Abstract: A method of extracting meta-data from a document includes capturing style attributes from the document, identifying cell-wise location coordinates for text characters using page segmentation and border table extraction, and finding relationships between nearby cells using a surrounding embedding, built by determining the nearest text cell in each of the top, left, right, and bottom directions. The method further includes applying a graph convolution network with informative attention (GCN-IA), which gives more attention to informative nodes in order to generate a better representation of the surrounding embedding and to capture deep contextual meaning from text cells. A domain-specific language model is utilized and improved by a domain-aware tokenizer. The method also includes capturing the complex visual layout of the document using a domain-specific visual model, determining meta-data information, representing the linguistic and visual contexts of the document, and correcting the extracted output by applying advanced post-processing to the output of the advanced language-visual model.
Inventor(s): ANKIT MALVIYA, MRIDUL BALARAMAN, MADHUSUDAN SINGH
CPC Classification: G06F16/38 (Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually)
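Illustrative sketch (not taken from the application): the abstract describes finding, for each text cell, the nearest text cell in the top, left, right, and bottom directions. The Python below shows one plausible way to do that. The `Cell` class, the `nearest_neighbors` function, the use of cell centers, the Euclidean distance, and the dominant-axis rule for assigning a direction are all assumptions made for illustration; the application's actual procedure may differ.

```python
from dataclasses import dataclass
from math import hypot
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Cell:
    """Hypothetical text cell: its text plus a bounding box (y grows downward)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def center(self) -> Tuple[float, float]:
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)


def nearest_neighbors(target: Cell, cells: List[Cell]) -> Dict[str, Optional[Cell]]:
    """Return the closest cell to `target` in each of the four directions.

    Assumption: a cell's direction is decided by the dominant axis of the
    offset between centers, and closeness is Euclidean distance.
    """
    cx, cy = target.center
    best: Dict[str, Optional[Cell]] = {
        "top": None, "left": None, "right": None, "bottom": None
    }
    best_dist = {direction: float("inf") for direction in best}

    for cell in cells:
        if cell is target:
            continue
        ox, oy = cell.center
        dx, dy = ox - cx, oy - cy
        # Horizontal offset dominates: the cell lies to the left or right.
        if abs(dx) >= abs(dy):
            direction = "right" if dx > 0 else "left"
        else:
            direction = "bottom" if dy > 0 else "top"
        dist = hypot(dx, dy)
        if dist < best_dist[direction]:
            best_dist[direction] = dist
            best[direction] = cell
    return best
```

In a pipeline like the one the abstract outlines, the four neighbors found this way could serve as the edges of the graph that a graph convolution network later processes; that wiring is likewise an assumption here.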