17643243. MODEL COMPRESSION SERVICE BASED ON RUNTIME PERFORMANCE simplified abstract (INTERNATIONAL BUSINESS MACHINES CORPORATION)

From WikiPatents

MODEL COMPRESSION SERVICE BASED ON RUNTIME PERFORMANCE

Organization Name

INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor(s)

Junsong Wang of Beijing (CN)

Qing Wang of Beijing (CN)

Tao Wang of Beijing (CN)

Chao Xue of Beijing (CN)

MODEL COMPRESSION SERVICE BASED ON RUNTIME PERFORMANCE - A simplified explanation of the abstract

This abstract first appeared for US patent application 17643243 titled 'MODEL COMPRESSION SERVICE BASED ON RUNTIME PERFORMANCE'.

Simplified Explanation

The patent application describes a method, computer system, and computer program product for a model compression service. The method involves compressing a deep neural network (DNN) to meet the performance requirements of a specific type of hardware.

  • The method starts by determining the initial DNN and compression algorithm available in a compression engine, as well as the type of target hardware and its performance requirement.
  • Multiple compressed models of the initial DNN, each defined by different configuration data, are then emulated on the target hardware.
  • Runtime performance data is collected for each compressed model; regression on the configuration data and the measured performance yields a runtime performance estimator for the target hardware.
  • The runtime performance estimator is then applied to the compression algorithm, generating a compressed DNN that meets the performance requirement of the target hardware.
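The regression step above can be sketched in a few lines. This is an illustrative assumption, not the patent's implementation: it treats a single "pruning ratio" as the configuration data and fits a simple least-squares line from emulated latency measurements, yielding an estimator that predicts runtime for unseen configurations.

```python
# Hypothetical sketch: fit a runtime performance estimator by linear
# regression over (configuration, measured latency) pairs. The single
# pruning-ratio feature and the numbers below are illustrative
# assumptions, not taken from the patent.

def fit_latency_estimator(ratios, latencies):
    """Least-squares fit of latency = a * ratio + b (one feature)."""
    n = len(ratios)
    mean_r = sum(ratios) / n
    mean_l = sum(latencies) / n
    cov = sum((r - mean_r) * (l - mean_l) for r, l in zip(ratios, latencies))
    var = sum((r - mean_r) ** 2 for r in ratios)
    a = cov / var
    b = mean_l - a * mean_r
    return lambda ratio: a * ratio + b

# Synthetic runtime data from emulating compressed models on the
# target hardware (illustrative numbers only).
ratios = [0.0, 0.25, 0.5, 0.75]          # fraction of weights pruned
latencies = [100.0, 81.0, 59.0, 41.0]    # measured ms per inference

estimate = fit_latency_estimator(ratios, latencies)
print(round(estimate(0.6), 1))  # predicted latency at 60% pruning
```

In practice the configuration data would have several dimensions (pruning ratio, quantization bit width, etc.) and the regression could be nonlinear, but the role of the estimator is the same: predict runtime on the target hardware without re-emulating every candidate model.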

Potential Applications

  • This technology can be applied in various fields that utilize deep neural networks, such as computer vision, natural language processing, and speech recognition.
  • It can be used in edge computing devices, IoT devices, and other resource-constrained systems where efficient model compression is crucial.

Problems Solved

  • Model compression is essential for deploying deep neural networks on resource-limited devices, as it reduces memory and computational requirements.
  • The method solves the problem of finding the optimal compression configuration for a given target hardware, ensuring the compressed model meets the performance requirement.
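The configuration search this solves can be sketched as follows. Everything here is a hedged assumption for illustration: a fixed linear estimator stands in for the regression result, and the search simply picks the least aggressive compression whose predicted latency meets the hardware's budget.

```python
# Hypothetical sketch: use a fitted runtime estimator to choose the
# least aggressive compression configuration that still satisfies the
# target hardware's performance requirement. The linear estimator and
# its coefficients are illustrative assumptions, not from the patent.

def predicted_latency_ms(pruning_ratio):
    # Assumed estimator obtained by regression on emulation data.
    return 100.0 - 80.0 * pruning_ratio

def pick_configuration(candidates, latency_budget_ms):
    """Return the smallest pruning ratio whose predicted latency
    meets the budget, or None if no candidate qualifies."""
    for ratio in sorted(candidates):
        if predicted_latency_ms(ratio) <= latency_budget_ms:
            return ratio
    return None

candidates = [0.1, 0.3, 0.5, 0.7, 0.9]
print(pick_configuration(candidates, latency_budget_ms=60.0))  # → 0.5
```

Choosing the smallest qualifying pruning ratio reflects the usual trade-off: less compression generally preserves more accuracy, so the search stops at the first configuration that satisfies the runtime requirement.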

Benefits

  • The method provides an automated and efficient way to compress deep neural networks, saving time and effort compared to manual optimization.
  • It allows for the deployment of complex models on resource-constrained devices, expanding the range of applications for deep learning technology.
  • The runtime performance estimator enables accurate prediction of the compressed model's performance on the target hardware, aiding in decision-making and optimization.


Original Abstract Submitted

A method, computer system and computer program product for model compression service. The method comprises determining an initial deep neural network (DNN) and an associated compression algorithm available in a compression engine, a type of target hardware and a performance requirement of target hardware. The method also comprises emulating a plurality of different compressed models of the initial DNN on target hardware of the type to obtain corresponding runtime performance data, wherein the different compressed models are defined with different configuration data. The method further comprises obtaining a runtime performance estimator of the target hardware by regression with the different configuration data and the corresponding runtime performance data. Lastly, the method comprises applying the runtime performance estimator to the compression algorithm by the compression engine to generate a compressed DNN of the initial DNN complying with the performance requirement of the type of target hardware.