US Patent Application 17722003. Latency-Aware Neural Network Pruning and Applications Thereof simplified abstract
Latency-Aware Neural Network Pruning and Applications Thereof
Organization Name
Microsoft Technology Licensing, LLC
Inventor(s)
Youkow Homma of Bellevue WA (US)
Ruofei Zhang of Mountain View CA (US)
Latency-Aware Neural Network Pruning and Applications Thereof - A simplified explanation of the abstract
- This abstract appeared for US patent application number 17722003, titled 'Latency-Aware Neural Network Pruning and Applications Thereof'.
Simplified Explanation
The abstract describes a neural architecture search (NAS) system that generates a machine-trained model meeting specified real-time latency (speed) requirements. It does so by selecting among a collection of candidate models whose sparsity is set layer by layer. In each round, the NAS system selects a parent model from the candidates, identifies a particular layer of that model, and decides how to mutate that layer, which yields a child model. The system then calculates a reward score for the child model based on its latency and accuracy, and uses reinforcement learning to update the trainable logic that performs the mutation. This process is repeated many times. An online application can use the model eventually produced to provide real-time responses to user queries.
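The loop described above can be illustrated with a minimal toy sketch in Python. This is not the method claimed in the application; the abstract does not give the reward formula, the parent-selection rule, or the form of the trainable logic. Everything named here is an assumption made for illustration: a model is a tuple of per-layer sparsity values, `latency` and `accuracy` are made-up proxies, parents are picked uniformly at random, and the trainable logic is a pair of softmax preference tables updated with a simple REINFORCE-style rule.

```python
import math
import random

# All constants below are hypothetical; the abstract specifies none of them.
SPARSITY_LEVELS = [0.0, 0.25, 0.5, 0.75]  # allowed sparsity per layer
NUM_LAYERS = 4                            # toy model depth
LATENCY_TARGET = 2.0                      # latency budget, arbitrary units


def softmax(prefs):
    """Turn preference scores into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    """Draw an index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def latency(model):
    # Toy proxy (assumption): a denser layer takes longer to run.
    return sum(1.0 - s for s in model)


def accuracy(model):
    # Toy proxy (assumption): accuracy drops with squared sparsity.
    return 1.0 - 0.1 * sum(s * s for s in model)


def reward(model):
    # Assumed form: accuracy minus a penalty for exceeding the latency budget.
    return accuracy(model) - max(0.0, latency(model) - LATENCY_TARGET)


def reinforce_update(prefs, chosen, advantage, lr):
    """REINFORCE-style update of softmax preferences, in place."""
    probs = softmax(prefs)
    for i in range(len(prefs)):
        grad = (1.0 if i == chosen else 0.0) - probs[i]
        prefs[i] += lr * advantage * grad


def search(steps=200, seed=0, lr=0.5):
    rng = random.Random(seed)
    layer_prefs = [0.0] * NUM_LAYERS            # trainable: which layer to mutate
    level_prefs = [0.0] * len(SPARSITY_LEVELS)  # trainable: how to mutate it
    candidates = [tuple([0.0] * NUM_LAYERS)]    # start from a fully dense model
    baseline = reward(candidates[0])
    for _ in range(steps):
        parent = rng.choice(candidates)             # select a parent model
        layer = sample(softmax(layer_prefs), rng)   # identify a layer
        level = sample(softmax(level_prefs), rng)   # decide the mutation
        child = list(parent)
        child[layer] = SPARSITY_LEVELS[level]
        child = tuple(child)
        r = reward(child)                           # score latency and accuracy
        advantage = r - baseline
        reinforce_update(layer_prefs, layer, advantage, lr)
        reinforce_update(level_prefs, level, advantage, lr)
        baseline = 0.9 * baseline + 0.1 * r         # running average of rewards
        candidates.append(child)
    return max(candidates, key=reward)


if __name__ == "__main__":
    best = search()
    print(best, latency(best), accuracy(best), reward(best))
```

In a real system the latency would presumably be measured on target hardware and the accuracy would come from evaluating a trained child model; the proxies here only keep the sketch self-contained.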
Original Abstract Submitted
A neural architecture search (NAS) system generates a machine-trained model that satisfies specified real-time latency objectives by selecting among a collection of layer-wise sparse candidate models. In operation, the NAS system selects a parent model from among the candidate models. The NAS system then identifies a particular layer of the parent model, and then determines how the layer is to be mutated, to yield a child model. The NAS system calculates a reward score for the child model based on its latency and accuracy. The NAS system then uses reinforcement learning to update the trainable logic used to perform the mutating based on the reward score. The NAS system repeats the above process a plurality of times. An online application system can use the machine-trained model eventually produced by the NAS system to deliver real-time responses to user queries.