18532795. INCREMENTAL PRECISION NETWORKS USING RESIDUAL INFERENCE AND FINE-GRAIN QUANTIZATION simplified abstract (Intel Corporation)

From WikiPatents

INCREMENTAL PRECISION NETWORKS USING RESIDUAL INFERENCE AND FINE-GRAIN QUANTIZATION

Organization Name

Intel Corporation

Inventor(s)

Abhisek Kundu of Bangalore (IN)

Naveen Mellempudi of Bangalore (IN)

Dheevatsa Mudigere of Bangalore (IN)

Dipankar Das of Pune (IN)

INCREMENTAL PRECISION NETWORKS USING RESIDUAL INFERENCE AND FINE-GRAIN QUANTIZATION - A simplified explanation of the abstract

This abstract first appeared for US patent application 18532795, titled 'INCREMENTAL PRECISION NETWORKS USING RESIDUAL INFERENCE AND FINE-GRAIN QUANTIZATION'.

Simplified Explanation

The abstract describes a patent application for a technique that determines per-layer scale factors for a neural network and converts tensor data to a lower-precision 8-bit datatype for processing.

  • The technology involves determining per-layer scale factors for tensor data in a neural network model.
  • It converts the tensor data from floating point to an 8-bit datatype.
  • The converted tensor data, together with the per-layer scale factors, is used to generate an output tensor (a minimal sketch follows this list).
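The listing below is a minimal NumPy sketch of that flow, assuming a simple max-magnitude calibration: it derives a per-layer scale factor, converts float32 tensors to int8, and applies the scale factors when producing the output tensor. The helper names (per_layer_scale, quantize_int8) and the calibration choice are illustrative assumptions, not details taken from the patent.

```python
# Hypothetical sketch of per-layer scaling and 8-bit conversion.
# Names and calibration method are illustrative, not from the patent.
import numpy as np

def per_layer_scale(tensor: np.ndarray, num_bits: int = 8) -> float:
    """Map the tensor's maximum magnitude onto the signed int8 range."""
    max_abs = np.max(np.abs(tensor))
    qmax = 2 ** (num_bits - 1) - 1          # 127 for int8
    return max_abs / qmax if max_abs > 0 else 1.0

def quantize_int8(tensor: np.ndarray, scale: float) -> np.ndarray:
    """Convert floating-point data to the 8-bit datatype."""
    return np.clip(np.round(tensor / scale), -127, 127).astype(np.int8)

# One layer of a toy model: quantize weights and inputs separately.
weights = np.random.randn(64, 128).astype(np.float32)
inputs = np.random.randn(128).astype(np.float32)

w_scale, x_scale = per_layer_scale(weights), per_layer_scale(inputs)
w_q, x_q = quantize_int8(weights, w_scale), quantize_int8(inputs, x_scale)

# Accumulate in int32, then apply the per-layer scale factors to produce
# the output tensor in floating point.
acc = w_q.astype(np.int32) @ x_q.astype(np.int32)
output = acc.astype(np.float32) * (w_scale * x_scale)
```

Using a scale factor per layer, rather than one for the whole network, lets each layer use the full int8 range even when value magnitudes differ widely between layers.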

Potential Applications

This technology could be applied in various fields such as image recognition, natural language processing, and autonomous driving systems.

Problems Solved

This technology addresses the challenge of optimizing neural network processing by efficiently converting tensor data to a more compact datatype.

Benefits

The benefits of this technology include improved computational efficiency, reduced memory usage, and potentially faster processing speeds for neural networks.

Potential Commercial Applications

A potential commercial application of this technology could be in the development of edge computing devices for real-time AI applications.

Possible Prior Art

Possible prior art includes earlier research on optimizing neural network processing through datatype conversion and scaling techniques.

What are the specific operations involved in determining the per-layer scale factor for tensor data?

The specific operations involved in determining the per-layer scale factor for tensor data include analyzing the data distribution within each layer, calculating the appropriate scaling factor based on the distribution, and applying the scale factor to the tensor data.
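As a hedged illustration of those steps, the sketch below analyzes a layer's value distribution with a percentile-based clipping threshold, derives a scale factor from it, and applies the factor to the data. The percentile approach and the calibrate_scale/apply_scale helpers are assumptions made for illustration; the patent abstract does not name a specific method.

```python
# Illustrative distribution-based calibration for one layer's scale factor.
import numpy as np

def calibrate_scale(values: np.ndarray, percentile: float = 99.9) -> float:
    """Pick a clipping threshold from the value distribution, map it to int8."""
    threshold = np.percentile(np.abs(values), percentile)
    return float(threshold) / 127.0 if threshold > 0 else 1.0

def apply_scale(values: np.ndarray, scale: float) -> np.ndarray:
    """Apply the per-layer scale factor and store the result as int8."""
    return np.clip(np.round(values / scale), -127, 127).astype(np.int8)

# Example: calibrate on sample activations for one layer, then quantize.
sample = np.random.randn(10_000).astype(np.float32)
scale = calibrate_scale(sample)
quantized = apply_scale(sample, scale)
```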

How does converting tensor data to an 8-bit datatype impact the overall performance of the neural network model?

Converting tensor data to an 8-bit datatype reduces memory usage by storing each element in one byte instead of four, and can improve computational efficiency where hardware supports fast integer arithmetic. This optimization can lead to overall performance improvements in terms of speed and resource utilization.
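A quick worked example of the memory claim, assuming standard NumPy dtypes rather than anything specific to the patent: the same tensor stored as int8 takes one quarter of the space of its float32 version.

```python
# Memory footprint of the same tensor in float32 vs. int8 (illustrative sizes).
import numpy as np

tensor_fp32 = np.zeros((1024, 1024), dtype=np.float32)
tensor_int8 = np.zeros((1024, 1024), dtype=np.int8)

print(tensor_fp32.nbytes)   # 4194304 bytes (4 bytes per element)
print(tensor_int8.nbytes)   # 1048576 bytes (1 byte per element), a 4x reduction
```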


Original Abstract Submitted

One embodiment provides for a computer-readable medium storing instructions that cause one or more processors to perform operations comprising determining a per-layer scale factor to apply to tensor data associated with layers of a neural network model and converting the tensor data to converted tensor data. The tensor data may be converted from a floating point datatype to a second datatype that is an 8-bit datatype. The instructions further cause the one or more processors to generate an output tensor based on the converted tensor data and the per-layer scale factor.