
20250173578. Computationally Efficient Distillation (Google LLC)

From WikiPatents

COMPUTATIONALLY EFFICIENT DISTILLATION USING GENERATIVE NEURAL NETWORKS

Abstract: Methods, systems, and apparatus for training a student neural network having multiple student parameters to perform a machine learning task. In one aspect, a system comprises one or more computers configured to obtain a batch comprising one or more training inputs and to generate multiple modified training inputs from the batch. The one or more computers process each of the modified training inputs using both the student neural network and a teacher neural network to generate, for each modified training input, a respective student output and a respective teacher output for the machine learning task. The one or more computers update the student parameters by computing a gradient, with respect to the student parameters, of a loss function that includes a first term measuring, for each modified training input, a loss between the student output and the teacher output for that modified training input.
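The training loop the abstract describes can be sketched as follows. This is a minimal illustrative sketch, not the patented method: the linear "networks", the Gaussian-perturbation choice of input modification, the mean-squared distillation loss, and all names and shapes are assumptions introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny setup: teacher and student are single linear layers
# mapping 4-dim inputs to 3-dim task outputs (shapes are illustrative).
teacher_W = rng.normal(size=(4, 3))   # frozen teacher parameters
student_W = np.zeros((4, 3))          # student parameters to train

def modify(batch, n_copies=2):
    """Generate multiple modified training inputs per original input;
    small Gaussian perturbations are one possible modification."""
    return np.concatenate(
        [batch + 0.01 * rng.normal(size=batch.shape) for _ in range(n_copies)]
    )

def distill_step(batch, lr=0.1):
    """One update of the student parameters against the teacher."""
    global student_W
    x = modify(batch)                 # modified training inputs
    teacher_out = x @ teacher_W       # teacher output per modified input
    student_out = x @ student_W       # student output per modified input
    # First loss term: mean squared distance between the student output
    # and the teacher output over all modified training inputs.
    diff = student_out - teacher_out
    loss = np.mean(diff ** 2)
    # Gradient of the loss with respect to the student parameters,
    # followed by a plain gradient-descent update.
    grad = 2.0 * x.T @ diff / diff.size
    student_W -= lr * grad
    return loss

batch = rng.normal(size=(8, 4))       # batch of one or more training inputs
losses = [distill_step(batch) for _ in range(200)]
```

Running the loop drives the student output toward the teacher output on the modified inputs, so the distillation loss shrinks over the 200 updates.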

Inventor(s): Ankit Singh Rawat, Manzil Zaheer, Chong You, Seungyeon Kim, Andreas Veit, Himanshu Jain

CPC Classification: G06N3/096 (Transfer learning)


