18734487. EFFICIENT SOFTMAX COMPUTATION WITH NO LOSS IN ACCURACY simplified abstract (Intel Corporation)


EFFICIENT SOFTMAX COMPUTATION WITH NO LOSS IN ACCURACY

Organization Name

Intel Corporation

Inventor(s)

Jainaveen Sundaram Priya of Hillsboro OR (US)

Prerna Budhkar of Hillsboro OR (US)

Vui Seng Chua of Hillsboro OR (US)

Srivatsa Rangachar Srinivasa of Hillsboro OR (US)

Tanay Karnik of Portland OR (US)

EFFICIENT SOFTMAX COMPUTATION WITH NO LOSS IN ACCURACY - A simplified explanation of the abstract

This abstract first appeared for US patent application 18734487, titled 'EFFICIENT SOFTMAX COMPUTATION WITH NO LOSS IN ACCURACY'.

Simplified Explanation

A modified 2-pass version of the SoftMax operation is proposed to reduce the computational cost of deep learning neural networks such as transformer-based models and large language models. The first pass ends with two scalar operations: one computes the logarithm of the denominator, and one computes an operand value as the sum of that logarithm and the maximum value. The second pass subtracts the operand value from each input element and raises a base to the result, so it needs no divisions.
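To make the flow concrete, here is a minimal Python sketch of a division-free 2-pass SoftMax of this shape. The function name softmax_two_pass_no_div and the fused running-maximum/denominator update in the first pass are illustrative assumptions; the abstract only specifies the two scalar operations at the end of the first pass and the division-free second pass.

    import math

    def softmax_two_pass_no_div(x, base=math.e):
        """Minimal sketch of a division-free 2-pass SoftMax."""
        # Pass 1: maintain a running maximum m and a running denominator d.
        # (The fused online update below is an assumption; the abstract does
        # not spell out how the first pass accumulates these values.)
        m = float("-inf")
        d = 0.0
        for v in x:
            new_m = max(m, v)
            d = d * base ** (m - new_m) + base ** (v - new_m)
            m = new_m
        # Two scalar operations at the end of pass 1:
        log_d = math.log(d, base)   # logarithm of the denominator
        operand = m + log_d         # operand value = maximum + log(denominator)
        # Pass 2: one subtraction and one exponentiation per element; no division.
        return [base ** (v - operand) for v in x]

For example, softmax_two_pass_no_div([1.0, 2.0, 3.0]) returns probabilities that sum to 1 up to floating-point rounding, matching the standard SoftMax.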

Key Features and Innovation

  • Modified 2-pass SoftMax operation
  • First pass ends with two scalar operations: the logarithm of the denominator, then the operand value
  • Second pass uses only addition/subtraction and exponentiation
  • Reduces computational cost without loss of accuracy

Potential Applications

The technology can be applied in various deep learning neural networks, especially transformer-based models and large language models, to improve efficiency and reduce computational overhead.

Problems Solved

  • High computational cost in deep learning neural networks
  • Efficiency and speed issues in transformer-based models and large language models
  • Division operations in SoftMax can be computationally expensive

Benefits

  • Improved efficiency in deep learning neural networks
  • Reduced computational cost without sacrificing accuracy
  • Faster processing in transformer-based models and large language models

Commercial Applications

Potential commercial applications include optimizing deep learning models for faster training and inference, improving the performance of large language models in natural language processing tasks, and enhancing the efficiency of transformer-based neural networks in various applications.

Prior Art

Readers can explore prior research on SoftMax optimization techniques, efficiency improvements in deep learning models, and advancements in transformer-based neural networks to understand the evolution of this technology.

Frequently Updated Research

Researchers are continually exploring ways to optimize SoftMax operations, improve efficiency in deep learning models, and enhance the performance of transformer-based neural networks through innovative techniques and algorithms.

Questions about SoftMax Optimization

How does the modified 2-pass SoftMax operation improve efficiency in deep learning neural networks?

The modified 2-pass SoftMax operation reduces computational cost by replacing the per-element division of the standard SoftMax with a single scalar logarithm and addition at the end of the first pass; the second pass then needs only a subtraction and an exponentiation per element, giving faster processing without compromising accuracy.
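A tiny, self-contained numeric check can illustrate the "no loss in accuracy" claim. The example scores and variable names below are illustrative, not from the application, and the pass structure is collapsed here just to verify the arithmetic identity:

    import math

    # Illustrative scores; m, d, and operand follow the abstract's description.
    scores = [0.5, -1.2, 3.3, 0.0]
    m = max(scores)
    d = sum(math.exp(v - m) for v in scores)      # denominator
    operand = m + math.log(d)                     # two scalar ops at end of pass 1

    reference = [math.exp(v - m) / d for v in scores]       # standard: division per element
    no_division = [math.exp(v - operand) for v in scores]   # modified pass 2: no divisions

    # The difference is on the order of machine epsilon, i.e. no practical accuracy loss.
    print(max(abs(a - b) for a, b in zip(reference, no_division)))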

What are the potential applications of the modified SoftMax operation in transformer-based models and large language models?

The technology can be applied to enhance the efficiency of transformer-based models and large language models, improving their performance in natural language processing tasks and other applications.


Original Abstract Submitted

A modified 2-pass version of the SoftMax operation can be implemented to reduce computational cost without loss of accuracy, in particular for deep learning neural networks such as transformer-based neural networks and large language models (LLMs). The first pass is modified to include two scalar operations at the end. At the end of the first pass, a first scalar operation is performed to calculate a logarithm of the denominator, and a second scalar operation is performed to calculate an operand value based on a sum of the logarithm of the denominator and the maximum value. The second pass is modified to perform addition and exponentiation. In the second pass, an element of an input tensor is subtracted by the operand value to obtain an exponent, and a base is raised to the exponent. The second pass avoids divisions.
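Read algebraically, the rewrite described in the abstract corresponds to a standard log-sum-exp identity; the notation below (b for the base, m for the maximum, D for the denominator) is editorial, not from the application:

$$\mathrm{SoftMax}(x)_i = \frac{b^{x_i - m}}{D}, \qquad m = \max_j x_j, \qquad D = \sum_j b^{x_j - m},$$

$$\frac{b^{x_i - m}}{D} = b^{x_i - m - \log_b D} = b^{x_i - (m + \log_b D)},$$

so the per-element division by D is replaced by one scalar logarithm, one scalar addition forming the operand value m + log_b(D), and then only a subtraction and an exponentiation per element in the second pass.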