18399578. INSTRUCTIONS FOR FUSED MULTIPLY-ADD OPERATIONS WITH VARIABLE PRECISION INPUT OPERANDS simplified abstract (Intel Corporation)
Contents
- 1 INSTRUCTIONS FOR FUSED MULTIPLY-ADD OPERATIONS WITH VARIABLE PRECISION INPUT OPERANDS
- 1.1 Organization Name
- 1.2 Inventor(s)
- 1.3 INSTRUCTIONS FOR FUSED MULTIPLY-ADD OPERATIONS WITH VARIABLE PRECISION INPUT OPERANDS - A simplified explanation of the abstract
- 1.4 Simplified Explanation
- 1.5 Potential Applications
- 1.6 Problems Solved
- 1.7 Benefits
- 1.8 Potential Commercial Applications
- 1.9 Possible Prior Art
- 1.10 Original Abstract Submitted
INSTRUCTIONS FOR FUSED MULTIPLY-ADD OPERATIONS WITH VARIABLE PRECISION INPUT OPERANDS
Organization Name
Intel Corporation
Inventor(s)
Naveen K. Mellempudi of Bangalore (IN)
Mrinmay Dutta of Bangalore (IN)
Dheevatsa Mudigere of Bangalore (IN)
Abhisek Kundu of Bangalore (IN)
INSTRUCTIONS FOR FUSED MULTIPLY-ADD OPERATIONS WITH VARIABLE PRECISION INPUT OPERANDS - A simplified explanation of the abstract
This abstract first appeared for US patent application 18399578, titled 'INSTRUCTIONS FOR FUSED MULTIPLY-ADD OPERATIONS WITH VARIABLE PRECISION INPUT OPERANDS'.
Simplified Explanation
The disclosed embodiments relate to instructions for fused multiply-add (FMA) operations with variable-precision inputs. In one example, a processor executes an asymmetric FMA instruction by processing elements of the second source vector in SIMD lanes, multiplying each element by a corresponding element of the first source vector, and accumulating the results with previous contents of the destination.
- Processor executes asymmetric FMA instruction
- Fetch circuitry fetches FMA instruction with opcode, destination, and source vectors
- Decode circuitry decodes fetched FMA instruction
- SIMD execution circuit processes as many elements of the second source vector as fit into a SIMD lane
- Multiply each element by corresponding element of first source vector
- Accumulate results with previous contents of destination
- Supports SIMD lane widths of 16, 32, or 64 bits; first-source element widths of 4 or 8 bits; and second-source element widths of 1, 2, or 4 bits
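The steps above can be sketched as a scalar reference model. This is a minimal illustration, not the patented implementation: the function name, packing layout, and default widths are assumptions chosen to match the widths stated in the abstract (lane widths of 16/32/64 bits, first-source widths of 4/8 bits, second-source widths of 1/2/4 bits).

```python
def asymmetric_fma_lane(src1, src2, acc, lane_width=32, w2=2):
    """Hypothetical reference model of one SIMD lane of the asymmetric FMA.

    src1: first-source elements (each nominally 4 or 8 bits wide)
    src2: second-source elements (each nominally 1, 2, or 4 bits wide)
    acc:  previous contents of the destination lane
    """
    # Process as many second-source elements as fit into one SIMD lane.
    n = lane_width // w2
    for i in range(n):
        # Multiply each element by the corresponding first-source element
        # and accumulate the product with the destination contents.
        acc += src1[i] * src2[i]
    return acc

# A 32-bit lane holds sixteen 2-bit elements, so sixteen products
# are accumulated into the destination.
result = asymmetric_fma_lane([1] * 16, [2] * 16, acc=0)
```

Because the two source widths differ, far more second-source elements than first-source bits fit in a lane, which is what makes the instruction asymmetric.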
Potential Applications
This technology can be applied in:
- High-performance computing
- Scientific simulations
- Machine learning algorithms
- Signal processing applications
Problems Solved
- Efficient execution of FMA operations with variable-precision inputs
- Optimized processing of elements in SIMD lanes
- Improved performance in vectorized operations
Benefits
- Increased computational efficiency
- Enhanced performance in parallel processing tasks
- Flexibility in handling different precision inputs
Potential Commercial Applications
- Data centers
- Supercomputing facilities
- AI hardware accelerators
- Embedded systems for signal processing
Possible Prior Art
Prior art may include:
- SIMD execution circuits in processors
- FMA instructions in computing architectures
- Variable-precision arithmetic operations in hardware designs
What are the specific SIMD lane widths supported by this technology?
The specific SIMD lane widths supported by this technology are 16 bits, 32 bits, and 64 bits.
How does this technology handle variable-precision inputs in FMA operations?
This technology handles variable-precision inputs by processing as many elements of the second source vector as fit into the SIMD lane width, multiplying each element by a corresponding element of the first source vector, and accumulating each resulting product with the previous contents of the destination. The operation is asymmetric because the first-source width (4 or 8 bits) and the second-source width (1, 2, or 4 bits) may differ.
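The "as many elements as fit" packing can be made concrete with simple arithmetic. The snippet below (illustrative only, not from the patent text) enumerates how many second-source elements fit into one SIMD lane for each lane-width and element-width pair named in the abstract.

```python
# Elements per lane = lane width (bits) / second-source element width (bits).
for lane in (16, 32, 64):      # SIMD lane widths from the abstract
    for w2 in (1, 2, 4):       # second-source element widths from the abstract
        print(f"{lane}-bit lane, {w2}-bit elements: {lane // w2} per lane")
```

For example, a 64-bit lane packed with 2-bit elements yields 32 multiply-accumulate operations per lane per instruction.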
Original Abstract Submitted
Disclosed embodiments relate to instructions for fused multiply-add (FMA) operations with variable-precision inputs. In one example, a processor to execute an asymmetric FMA instruction includes fetch circuitry to fetch an FMA instruction having fields to specify an opcode, a destination, and first and second source vectors having first and second widths, respectively, decode circuitry to decode the fetched FMA instruction, and a single instruction multiple data (SIMD) execution circuit to process as many elements of the second source vector as fit into an SIMD lane width by multiplying each element by a corresponding element of the first source vector, and accumulating a resulting product with previous contents of the destination, wherein the SIMD lane width is one of 16 bits, 32 bits, and 64 bits, the first width is one of 4 bits and 8 bits, and the second width is one of 1 bit, 2 bits, and 4 bits.