US Patent Application 17722003. Latency-Aware Neural Network Pruning and Applications Thereof simplified abstract

From WikiPatents

Latency-Aware Neural Network Pruning and Applications Thereof

Organization Name

Microsoft Technology Licensing, LLC


Inventor(s)

Li Zhang of Beijing (CN)


Youkow Homma of Bellevue WA (US)


Yujing Wang of Beijing (CN)


Min Wu of Bothell WA (US)


Mao Yang of Beijing (CN)


Ruofei Zhang of Mountain View CA (US)


Ting Cao of Beijing (CN)


Wei Shen of Bellevue WA (US)


Latency-Aware Neural Network Pruning and Applications Thereof - A simplified explanation of the abstract

  • This abstract appeared for US patent application number 17722003, titled 'Latency-Aware Neural Network Pruning and Applications Thereof'

Simplified Explanation

The abstract describes a neural architecture search (NAS) system that generates a machine-trained model meeting specified real-time latency objectives. It does so by searching among a collection of candidate models that are sparse on a layer-by-layer basis. In each iteration, the NAS system selects a parent model from the candidates, identifies a particular layer of that model, and decides how to mutate the layer, which yields a child model. The system then calculates a reward score for the child model based on its latency and accuracy. Using reinforcement learning, it updates the trainable logic that performs the mutation based on that reward score. This process is repeated multiple times. An online application system can then use the resulting machine-trained model to provide real-time responses to user queries.
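
The loop described above can be illustrated with a small, self-contained sketch. It is not the method claimed in the application; the abstract does not specify the reward formula, the parent-selection rule, or the form of the trainable logic. Everything named here is an assumption made for illustration: models are tuples of per-layer sparsity levels, `latency_ms` and `accuracy` are toy stand-ins for real measurements, the reward is accuracy minus a penalty for exceeding a latency target, and the mutation policy is a pair of softmax distributions updated with a REINFORCE-style rule.

```python
import math
import random

# All constants below are hypothetical and chosen only for illustration.
SPARSITY_LEVELS = [0.0, 0.25, 0.5, 0.75]  # allowed sparsity per layer
LAYER_COST_MS = [4.0, 3.0, 2.0, 1.0]      # dense latency of each layer
TARGET_LATENCY_MS = 5.0                   # real-time latency objective


def latency_ms(model):
    """Toy latency model: a layer gets cheaper as it gets sparser."""
    return sum(c * (1.0 - s) for c, s in zip(LAYER_COST_MS, model))


def accuracy(model):
    """Toy accuracy proxy: higher sparsity costs some accuracy."""
    return 1.0 - 0.3 * sum(s * s for s in model) / len(model)


def reward(model):
    """Assumed reward: accuracy minus relative overshoot of the latency target."""
    overshoot = max(0.0, latency_ms(model) / TARGET_LATENCY_MS - 1.0)
    return accuracy(model) - overshoot


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def reinforce_update(logits, probs, chosen, advantage, lr):
    """Policy-gradient step for a softmax policy (grad log p = onehot - probs)."""
    for i in range(len(logits)):
        grad = (1.0 if i == chosen else 0.0) - probs[i]
        logits[i] += lr * advantage * grad


def search(steps=300, seed=0, lr=0.5, keep=8):
    rng = random.Random(seed)
    n_layers = len(LAYER_COST_MS)
    layer_logits = [0.0] * n_layers               # trainable: which layer to mutate
    level_logits = [0.0] * len(SPARSITY_LEVELS)   # trainable: how to mutate it
    dense = tuple([0.0] * n_layers)
    population = [dense]
    baseline = reward(dense)
    for _ in range(steps):
        parent = rng.choice(population)           # assumed: uniform parent selection
        layer_probs = softmax(layer_logits)
        level_probs = softmax(level_logits)
        layer = sample(layer_probs, rng)
        level = sample(level_probs, rng)
        child = list(parent)
        child[layer] = SPARSITY_LEVELS[level]     # mutate one layer of the parent
        child = tuple(child)
        r = reward(child)
        advantage = r - baseline
        reinforce_update(layer_logits, layer_probs, layer, advantage, lr)
        reinforce_update(level_logits, level_probs, level, advantage, lr)
        baseline = 0.9 * baseline + 0.1 * r       # moving-average baseline
        population.append(child)
        # Keep only the best few distinct candidates as future parents.
        population = sorted(set(population), key=reward, reverse=True)[:keep]
    return max(population, key=reward)


if __name__ == "__main__":
    best = search()
    print(best, round(latency_ms(best), 2), round(reward(best), 3))
```

In a real system the toy `latency_ms` and `accuracy` functions would be replaced by measurements of the child model on the target hardware and on a validation set; the structure of the loop (select parent, mutate one layer, score, update the policy, repeat) is the part that mirrors the abstract.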


Original Abstract Submitted

A neural architecture search (NAS) system generates a machine-trained model that satisfies specified real-time latency objectives by selecting among a collection of layer-wise sparse candidate models. In operation, the NAS system selects a parent model from among the candidate models. The NAS system then identifies a particular layer of the parent model, and then determines how the layer is to be mutated, to yield a child model. The NAS system calculates a reward score for the child model based on its latency and accuracy. The NAS system then uses reinforcement learning to update the trainable logic used to perform the mutating based on the reward score. The NAS system repeats the above process a plurality of times. An online application system can use the machine-trained model eventually produced by the NAS system to deliver real-time responses to user queries.