Intel Corporation (20240160931). INCREMENTAL PRECISION NETWORKS USING RESIDUAL INFERENCE AND FINE-GRAIN QUANTIZATION simplified abstract


INCREMENTAL PRECISION NETWORKS USING RESIDUAL INFERENCE AND FINE-GRAIN QUANTIZATION

Organization Name

Intel Corporation

Inventor(s)

Abhisek Kundu of Bangalore (IN)

Naveen Mellempudi of Bangalore (IN)

Dheevatsa Mudigere of Bangalore (IN)

Dipankar Das of Pune (IN)

INCREMENTAL PRECISION NETWORKS USING RESIDUAL INFERENCE AND FINE-GRAIN QUANTIZATION - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240160931 titled 'INCREMENTAL PRECISION NETWORKS USING RESIDUAL INFERENCE AND FINE-GRAIN QUANTIZATION'.

Simplified Explanation

The patent application describes a method for optimizing neural network models by converting tensor data from a floating point datatype to a more compact 8-bit datatype and applying per-layer scale factors during inference.

  • The method determines a per-layer scale factor for the tensor data associated with each layer of a neural network model.
  • The tensor data is converted from a floating point datatype to an 8-bit datatype.
  • The converted tensor data and the per-layer scale factor are then used to generate an output tensor, as sketched in the example below.
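The abstract does not commit to a specific quantization scheme, so the following is only a minimal sketch assuming symmetric per-layer int8 quantization in NumPy; the function names quantize_per_layer and dequantize are hypothetical, not taken from the patent.

```python
import numpy as np

def quantize_per_layer(tensor):
    # One scale factor for the whole layer's tensor (per-layer scaling),
    # mapping float values into the signed 8-bit range [-127, 127].
    scale = np.max(np.abs(tensor)) / 127.0
    q = np.clip(np.round(tensor / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximate float tensor from the int8 data and its scale.
    return q.astype(np.float32) * scale

# Example: quantize one layer's weights, then measure the round-trip error.
weights = np.random.randn(64, 64).astype(np.float32)
q_weights, w_scale = quantize_per_layer(weights)
approx = dequantize(q_weights, w_scale)
print("max abs quantization error:", np.max(np.abs(weights - approx)))
```

Because a single scale factor covers the whole layer, only one float per layer must be stored alongside the int8 data, which is where the memory and bandwidth savings come from.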

Potential Applications

This technology could be applied in various fields such as image recognition, natural language processing, and autonomous vehicles.

Problems Solved

This technology addresses the problem of optimizing neural network models for efficient processing and memory usage.

Benefits

The benefits of this technology include faster inference times, reduced memory footprint, and improved energy efficiency in neural network applications.

Potential Commercial Applications

One potential commercial application of this technology is in edge computing devices where resources are limited but real-time processing is required.

Possible Prior Art

Prior art may include similar techniques for optimizing neural network models through quantization and scaling of tensor data.

Unanswered Questions

1. How does this method compare to other techniques for optimizing neural network models in terms of accuracy and efficiency?
2. Are there any limitations or drawbacks to converting tensor data to an 8-bit datatype that need to be considered in practical applications?


Original Abstract Submitted

One embodiment provides for a computer-readable medium storing instructions that cause one or more processors to perform operations comprising determining a per-layer scale factor to apply to tensor data associated with layers of a neural network model and converting the tensor data to converted tensor data. The tensor data may be converted from a floating point datatype to a second datatype that is an 8-bit datatype. The instructions further cause the one or more processors to generate an output tensor based on the converted tensor data and the per-layer scale factor.
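To illustrate the final operation the abstract describes, generating an output tensor from converted tensor data and per-layer scale factors, here is a minimal NumPy sketch of an int8 matrix multiply that accumulates in int32 and rescales the result. The names quantize and int8_matmul are hypothetical, and the patent may generate the output tensor differently.

```python
import numpy as np

def quantize(t):
    # Symmetric per-layer scale: one factor for the whole tensor.
    scale = np.max(np.abs(t)) / 127.0
    q = np.clip(np.round(t / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_matmul(a_q, a_scale, b_q, b_scale):
    # Accumulate in int32 to avoid int8 overflow, then rescale the
    # result with the product of the two per-layer scale factors.
    acc = a_q.astype(np.int32) @ b_q.astype(np.int32)
    return acc.astype(np.float32) * (a_scale * b_scale)

x = np.random.randn(4, 8).astype(np.float32)   # activations
w = np.random.randn(8, 3).astype(np.float32)   # weights
(xq, xs), (wq, ws) = quantize(x), quantize(w)
out = int8_matmul(xq, xs, wq, ws)              # approximate float output
print("max abs error vs. float matmul:", np.max(np.abs(out - x @ w)))
```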