18399578. INSTRUCTIONS FOR FUSED MULTIPLY-ADD OPERATIONS WITH VARIABLE PRECISION INPUT OPERANDS simplified abstract (Intel Corporation)


INSTRUCTIONS FOR FUSED MULTIPLY-ADD OPERATIONS WITH VARIABLE PRECISION INPUT OPERANDS

Organization Name

Intel Corporation

Inventor(s)

Dipankar Das of Pune (IN)

Naveen K. Mellempudi of Bangalore (IN)

Mrinmay Dutta of Bangalore (IN)

Arun Kumar of Bangalore (IN)

Dheevatsa Mudigere of Bangalore (IN)

Abhisek Kundu of Bangalore (IN)

INSTRUCTIONS FOR FUSED MULTIPLY-ADD OPERATIONS WITH VARIABLE PRECISION INPUT OPERANDS - A simplified explanation of the abstract

This abstract first appeared for US patent application 18399578 titled 'INSTRUCTIONS FOR FUSED MULTIPLY-ADD OPERATIONS WITH VARIABLE PRECISION INPUT OPERANDS'.

Simplified Explanation

The disclosed embodiments relate to instructions for fused multiply-add (FMA) operations with variable-precision inputs. In one example, a processor executes an asymmetric FMA instruction, whose two source vectors can have different element widths. The SIMD execution circuit processes as many elements of the second source vector as fit into a SIMD lane, multiplies each element by the corresponding element of the first source vector, and accumulates the resulting product with the previous contents of the destination.

  • Processor executes asymmetric FMA instruction
  • Fetch circuitry fetches FMA instruction with opcode, destination, and source vectors
  • Decode circuitry decodes fetched FMA instruction
  • SIMD execution circuit processes as many elements of the second source vector as fit into a SIMD lane
  • Each element is multiplied by the corresponding element of the first source vector
  • Each resulting product is accumulated with the previous contents of the destination
  • SIMD lane width is 16, 32, or 64 bits; first source width is 4 or 8 bits; second source width is 1, 2, or 4 bits
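The steps listed above can be illustrated with a small software model. The following Python sketch is not taken from the patent application; it is a minimal illustration under stated assumptions. Elements are treated as unsigned integers, second-source elements are assumed to be packed into the lane lowest bits first, and each lane is assumed to sum its products into a single accumulator. The names unpack_lane and asymmetric_fma_lane are hypothetical.

```python
# Widths named in the original abstract.
LANE_WIDTHS = (16, 32, 64)    # SIMD lane width in bits
FIRST_WIDTHS = (4, 8)         # first-source element width in bits
SECOND_WIDTHS = (1, 2, 4)     # second-source element width in bits


def unpack_lane(packed, lane_width, elem_width):
    """Split one packed lane into unsigned elements, lowest bits first.

    The bit ordering is an assumption made for this sketch.
    """
    mask = (1 << elem_width) - 1
    count = lane_width // elem_width
    return [(packed >> (i * elem_width)) & mask for i in range(count)]


def asymmetric_fma_lane(acc, first_elems, packed_second,
                        lane_width, first_width, second_width):
    """Model one lane: acc + sum(first[i] * second[i]).

    Summing all products into one accumulator per lane is an assumption;
    the abstract only says products are accumulated with the previous
    contents of the destination.
    """
    if lane_width not in LANE_WIDTHS:
        raise ValueError("unsupported lane width")
    if first_width not in FIRST_WIDTHS:
        raise ValueError("unsupported first-source width")
    if second_width not in SECOND_WIDTHS:
        raise ValueError("unsupported second-source width")
    if not 0 <= packed_second < (1 << lane_width):
        raise ValueError("packed second source does not fit in the lane")

    second_elems = unpack_lane(packed_second, lane_width, second_width)
    if len(first_elems) != len(second_elems):
        raise ValueError("need one first-source element per second-source element")

    for a, b in zip(first_elems, second_elems):
        if not 0 <= a < (1 << first_width):
            raise ValueError("first-source element out of range")
        acc += a * b  # multiply, then accumulate with previous contents
    return acc
```

For a 16-bit lane with 4-bit second-source elements, asymmetric_fma_lane(5, [10, 20, 30, 40], 0x4321, 16, 8, 4) unpacks the elements 1, 2, 3, 4 and returns 5 + 10 + 40 + 90 + 160 = 305.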

Potential Applications

This technology can be applied in:

  • High-performance computing
  • Scientific simulations
  • Machine learning algorithms
  • Signal processing applications

Problems Solved

  • Efficient execution of FMA operations with variable-precision inputs
  • Optimized processing of elements in SIMD lanes
  • Improved performance in vectorized operations

Benefits

  • Increased computational efficiency
  • Enhanced performance in parallel processing tasks
  • Flexibility in handling different precision inputs

Potential Commercial Applications

  • Data centers
  • Supercomputing facilities
  • AI hardware accelerators
  • Embedded systems for signal processing

Possible Prior Art

Prior art may include:

  • SIMD execution circuits in processors
  • FMA instructions in computing architectures
  • Variable-precision arithmetic operations in hardware designs

What are the specific SIMD lane widths supported by this technology?

The specific SIMD lane widths supported by this technology are 16 bits, 32 bits, and 64 bits.

How does this technology handle variable-precision inputs in FMA operations?

This technology handles variable-precision inputs by letting the two source vectors use different element widths: 4 or 8 bits for the first source and 1, 2, or 4 bits for the second. The SIMD execution circuit processes as many second-source elements as fit into a SIMD lane, multiplies each by the corresponding first-source element, and accumulates the product with the previous contents of the destination.


Original Abstract Submitted

Disclosed embodiments relate to instructions for fused multiply-add (FMA) operations with variable-precision inputs. In one example, a processor to execute an asymmetric FMA instruction includes fetch circuitry to fetch an FMA instruction having fields to specify an opcode, a destination, and first and second source vectors having first and second widths, respectively, decode circuitry to decode the fetched FMA instruction, and a single instruction multiple data (SIMD) execution circuit to process as many elements of the second source vector as fit into an SIMD lane width by multiplying each element by a corresponding element of the first source vector, and accumulating a resulting product with previous contents of the destination, wherein the SIMD lane width is one of 16 bits, 32 bits, and 64 bits, the first width is one of 4 bits and 8 bits, and the second width is one of 1 bit, 2 bits, and 4 bits.
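Because the execution circuit processes as many second-source elements as fit into a SIMD lane, the element count per lane follows from the lane width and the second-source width by integer division. The Python sketch below enumerates the width combinations named in the abstract. The function names are hypothetical, and the assumption that elements are packed with no padding bits is not stated in the abstract.

```python
from itertools import product

# Widths named in the original abstract.
LANE_WIDTHS = (16, 32, 64)    # SIMD lane width in bits
FIRST_WIDTHS = (4, 8)         # first-source element width in bits
SECOND_WIDTHS = (1, 2, 4)     # second-source element width in bits


def elements_per_lane(lane_width, second_width):
    """Second-source elements that fit in one lane (assumes no padding)."""
    return lane_width // second_width


def configurations():
    """Every (lane, first, second, elements-per-lane) combination."""
    return [
        (lane, first, second, elements_per_lane(lane, second))
        for lane, first, second in product(LANE_WIDTHS, FIRST_WIDTHS, SECOND_WIDTHS)
    ]


if __name__ == "__main__":
    for lane, first, second, count in configurations():
        print(f"lane={lane:2d} bits  first={first} bits  "
              f"second={second} bits  -> {count} elements per lane")
```

Under these assumptions there are 18 combinations, ranging from 4 elements per lane (16-bit lane, 4-bit elements) to 64 elements per lane (64-bit lane, 1-bit elements).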