17960052. MODEL COMPRESSION VIA QUANTIZED SPARSE PRINCIPAL COMPONENT ANALYSIS simplified abstract (QUALCOMM Incorporated)
Organization Name
QUALCOMM Incorporated
Inventor(s)
Andrey Kuzmin of Hilversum (NL)
Marinus Willem Van Baalen of Amsterdam (NL)
Markus Nagel of Amsterdam (NL)
Arash Behboodi of Amsterdam (NL)
MODEL COMPRESSION VIA QUANTIZED SPARSE PRINCIPAL COMPONENT ANALYSIS - A simplified explanation of the abstract
This abstract first appeared for US patent application 17960052, titled 'MODEL COMPRESSION VIA QUANTIZED SPARSE PRINCIPAL COMPONENT ANALYSIS'.
Simplified Explanation
The abstract describes a processor-implemented method for an artificial neural network (ANN). For a given layer of the ANN, the method retrieves two matrices: a dense quantized matrix representing a codebook and a sparse quantized matrix representing linear coefficients. Both matrices are associated with the layer's weight tensor. The method determines the weight tensor from the product of the dense and sparse quantized matrices, and then processes an input at the layer using that weight tensor.
- The method retrieves two quantized matrices, one dense and one sparse, for a layer of an artificial neural network.
- The dense matrix represents a codebook and the sparse matrix represents linear coefficients associated with the layer's weight tensor.
- The weight tensor is determined as the product of the dense and sparse quantized matrices.
- The layer then processes its input using the reconstructed weight tensor.
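The steps above can be illustrated with a minimal NumPy sketch. The shapes, bit-widths, scales, and sparsity level below are assumptions chosen for illustration; the patent abstract only specifies that a dense quantized codebook and a sparse quantized coefficient matrix are multiplied to recover the weight tensor, which is then used to process an input.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes for one fully connected layer:
# the weight matrix W is (out_features, in_features), factored
# through a codebook of k components (k chosen arbitrarily here).
out_features, in_features, k = 256, 512, 32

# Dense quantized matrix: the codebook, stored as int8 with a scale.
codebook_q = rng.integers(-128, 128, size=(out_features, k), dtype=np.int8)
codebook_scale = 0.01

# Sparse quantized matrix: the linear coefficients, mostly zero.
coeffs_q = rng.integers(-8, 8, size=(k, in_features), dtype=np.int8)
coeffs_q[rng.random(coeffs_q.shape) < 0.9] = 0  # ~90% sparsity (assumed)
coeffs_scale = 0.1

# Determine the weight tensor as the product of the two matrices
# (dequantizing each factor with its scale first).
W = (codebook_q.astype(np.float32) * codebook_scale) @ (
    coeffs_q.astype(np.float32) * coeffs_scale
)

# Process an input at the layer using the reconstructed weight tensor.
x = rng.standard_normal(in_features).astype(np.float32)
y = W @ x
print(W.shape, y.shape)
```

In practice the two small quantized factors, not the full weight matrix, would be what is stored and shipped with the model; the product is formed on the fly per layer.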
Potential Applications
- Artificial intelligence and machine learning systems
- Deep learning algorithms
- Image and speech recognition systems
- Natural language processing
Problems Solved
- Efficient representation and manipulation of weight tensors in artificial neural networks
- Reducing memory and computational requirements for neural network operations
- Improving the performance and speed of deep learning algorithms
Benefits
- Improved efficiency in artificial neural network operations
- Reduced memory and computational requirements
- Faster processing and improved performance in deep learning tasks
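The memory-reduction benefit can be made concrete with back-of-the-envelope arithmetic. All numbers below (layer shape, bit-widths, sparsity, index overhead) are illustrative assumptions, not figures from the patent:

```python
# Storage for one layer: fp32 baseline vs. quantized sparse factorization.
out_f, in_f, k = 256, 512, 32
sparsity = 0.9           # fraction of zero coefficients (assumed)
bits_dense = 8           # bit-width of the quantized codebook (assumed)
bits_sparse = 4          # bit-width of the quantized coefficients (assumed)
bits_index = 16          # crude per-nonzero index cost (assumed)

baseline_bits = out_f * in_f * 32               # dense fp32 weights
codebook_bits = out_f * k * bits_dense          # dense quantized codebook
nnz = int(k * in_f * (1 - sparsity))            # nonzero coefficients
coeff_bits = nnz * (bits_sparse + bits_index)   # values plus indices

compressed_bits = codebook_bits + coeff_bits
ratio = baseline_bits / compressed_bits
print(f"compression ratio ~{ratio:.1f}x")
```

Under these assumed numbers the factored, quantized representation is dozens of times smaller than the fp32 baseline, which is the kind of memory saving the claimed method targets.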
Original Abstract Submitted
A processor-implemented method includes retrieving, for a layer of a set of layers of an artificial neural network (ANN), a dense quantized matrix representing a codebook and a sparse quantized matrix representing linear coefficients. The dense quantized matrix and the sparse quantized matrix may be associated with a weight tensor of the layer. The processor-implemented method also includes determining, for the layer of the set of layers, the weight tensor based on a product of the dense quantized matrix and the sparse quantized matrix. The processor-implemented method further includes processing, at the layer, an input based on the weight tensor.