18209024. EFFICIENT DATA DISTRIBUTION PRESERVING TRAINING PARADIGM (Oracle International Corporation)
EFFICIENT DATA DISTRIBUTION PRESERVING TRAINING PARADIGM
Organization Name
Oracle International Corporation
Inventor(s)
Renata Khasanova of Zurich (CH)
Felix Schmidt of Baden-Dattwil (CH)
This abstract first appeared for US patent application 18209024, titled 'EFFICIENT DATA DISTRIBUTION PRESERVING TRAINING PARADIGM'.
Original Abstract Submitted
A computer performs deduplication of an original training corpus for maintaining accuracy of accelerated training of a reconstructive or other machine learning (ML) model. Distinct multidimensional points are detected in the original training corpus that contains duplicates. Based on duplicates in the original training corpus, a respective observed frequency of each distinct multidimensional point is increased. In a reconstructive embodiment and based on a particular distinct multidimensional point as input, a reconstruction of the particular distinct multidimensional point is generated by a reconstructive ML model. Based on increasing the observed frequency of the particular distinct multidimensional point, a scaled error of the reconstruction of the particular distinct multidimensional point is increased. Based on the scaled error of the reconstruction of the particular distinct multidimensional point, accuracy of the reconstructive model is increased. In an embodiment, the reconstructive ML model is an artificial neural network that is a denoising autoencoder that detects anomalous database statements.
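The idea in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the abstract does not give the scaling formula, so multiplying each distinct point's error by its observed count is an assumption, and the names `deduplicate`, `toy_reconstruct`, and `frequency_scaled_loss` are hypothetical. The "model" here is a fixed toy function rather than a denoising autoencoder. The sketch only shows that a frequency-scaled loss over the distinct points can match the plain mean loss over the original corpus with duplicates, which is one way the original data distribution could be preserved while training on fewer points.

```python
from collections import Counter


def deduplicate(corpus):
    """Return the distinct points and the observed frequency of each one."""
    counts = Counter(tuple(point) for point in corpus)
    points = list(counts)                  # distinct multidimensional points
    freqs = [counts[p] for p in points]    # how often each occurred originally
    return points, freqs


def squared_error(point, reconstruction):
    """Sum of squared differences between a point and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(point, reconstruction))


def toy_reconstruct(point):
    # Hypothetical stand-in for a reconstructive model:
    # it simply shrinks every feature by 10%.
    return tuple(0.9 * x for x in point)


def frequency_scaled_loss(points, freqs, reconstruct):
    """Mean error over distinct points, each error scaled by its frequency.

    Assumption: scaling is a plain multiplication by the observed count.
    """
    total = sum(f * squared_error(p, reconstruct(p))
                for p, f in zip(points, freqs))
    return total / sum(freqs)


def plain_loss(corpus, reconstruct):
    """Mean error over the original corpus, duplicates included."""
    errors = [squared_error(tuple(p), reconstruct(tuple(p))) for p in corpus]
    return sum(errors) / len(errors)


# Original corpus with duplicates: six rows, three distinct points.
corpus = [(1.0, 2.0), (1.0, 2.0), (1.0, 2.0),
          (3.0, 4.0), (0.0, 1.0), (3.0, 4.0)]

points, freqs = deduplicate(corpus)
scaled = frequency_scaled_loss(points, freqs, toy_reconstruct)
full = plain_loss(corpus, toy_reconstruct)

print(len(corpus), "rows reduced to", len(points), "distinct points")
print("frequency-scaled loss:", scaled)
print("loss on original corpus:", full)
```

In this sketch the model is evaluated three times instead of six, yet the two loss values agree up to floating-point rounding, because each distinct point's error carries the weight of all its duplicates.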