18347983. OUT OF DISTRIBUTION ELEMENT DETECTION FOR INFORMATION EXTRACTION (ORACLE INTERNATIONAL CORPORATION)
Organization Name
ORACLE INTERNATIONAL CORPORATION
Inventor(s)
Srikant Panda of Bangalore (IN)
Amit Agarwal of Bangalore (IN)
Gouttham Nambirajan of Bangalore (IN)
Kulbhushan Pachauri of Bangalore (IN)
This abstract first appeared for US patent application 18347983 titled 'OUT OF DISTRIBUTION ELEMENT DETECTION FOR INFORMATION EXTRACTION'.
Original Abstract Submitted
Techniques for extracting information from unstructured documents that enable an ML model to be trained such that the model can accurately distinguish between in-distribution ("in-D") elements and out-of-distribution ("OO-D") elements within an unstructured document. Novel training techniques are used that train an ML model using a combination of a regular training dataset and an enhanced augmented training dataset. The regular training dataset is used to train an ML model to identify in-D elements, i.e., to classify an element extracted from a document as belonging to one of the in-D classes contained in the regular training dataset. The augmented training dataset, which is generated based upon the regular training dataset, may contain one or more augmented elements which are used to train the model to identify OO-D elements, i.e., to classify an augmented element extracted from a document as belonging to an OO-D class instead of to an in-D class.
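The training setup described in the abstract can be illustrated with a minimal Python sketch. It is not the method claimed in the application: the abstract does not say how augmented elements are generated or which model is used. The sketch assumes a toy regular dataset invented for illustration, a hypothetical augmentation (shuffling the characters of each element), a hypothetical label name `OOD_LABEL`, and a simple nearest-centroid classifier over character bigrams standing in for the ML model.

```python
import math
import random
from collections import Counter, defaultdict

OOD_LABEL = "OO-D"  # hypothetical name for the extra out-of-distribution class

# Toy "regular" training set of (element text, in-D class); invented for illustration.
regular = [
    ("Invoice No: 10234", "invoice_number"),
    ("Invoice No: 88710", "invoice_number"),
    ("Date: 2023-07-06", "date"),
    ("Date: 2022-11-30", "date"),
    ("Total: 149.99", "total"),
    ("Total: 20.00", "total"),
]

def augment(text, rng):
    """Hypothetical augmentation: shuffle the characters of an element."""
    chars = list(text)
    rng.shuffle(chars)
    return "".join(chars)

def build_training_set(regular_examples, seed=0):
    """Return the regular examples plus one OO-D example derived from each."""
    rng = random.Random(seed)
    augmented = [(augment(text, rng), OOD_LABEL) for text, _ in regular_examples]
    return regular_examples + augmented

def features(text):
    """Character-bigram counts of the lowercased text."""
    t = text.lower()
    return Counter(t[i:i + 2] for i in range(len(t) - 1))

def train(examples):
    """Sum bigram counts per class; cosine similarity ignores the scale."""
    centroids = defaultdict(Counter)
    for text, label in examples:
        centroids[label].update(features(text))
    return dict(centroids)

def cosine(a, b):
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def predict(model, text):
    """Return the class (in-D or OO-D) whose centroid is most similar."""
    f = features(text)
    return max(model, key=lambda label: cosine(f, model[label]))

combined = build_training_set(regular)  # regular + augmented datasets
model = train(combined)                 # in-D classes plus the OO-D class
print(predict(model, "Invoice No: 55555"))
```

Because the OO-D examples are trained as one more class alongside the in-D classes, the classifier can assign an unfamiliar element to `OOD_LABEL` rather than being forced to pick an in-D class. How well this works in practice would depend on the augmentation strategy and the model, neither of which the abstract specifies.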