20230168899. GENERATIVE AI ACCELERATOR APPARATUS USING IN-MEMORY COMPUTE CHIPLET DEVICES FOR TRANSFORMER WORKLOADS simplified abstract (d-MATRIX CORPORATION)
Organization Name
d-MATRIX CORPORATION
Inventor(s)
Sudeep Bhoja of Cupertino CA (US)
Siddharth Sheth of Cupertino CA (US)
This abstract first appeared for US patent application 20230168899, titled 'GENERATIVE AI ACCELERATOR APPARATUS USING IN-MEMORY COMPUTE CHIPLET DEVICES FOR TRANSFORMER WORKLOADS'.
Simplified Explanation
The patent application describes an AI accelerator apparatus built from in-memory compute chiplet devices. Each chiplet contains multiple tiles, and each tile contains a set of slices, a CPU, and a hardware dispatch device. The slices include a digital in-memory compute (DIMC) device for high-throughput computation, specifically for accelerating the attention functions of transformer-based models used in machine learning applications.
- The chiplet devices have multiple tiles, each containing slices, a CPU, and a hardware dispatch device.
- The slices include a digital in-memory compute (DIMC) device for high throughput computations.
- The DIMC device is specifically designed to accelerate attention functions in transformer-based models used in machine learning applications.
- The chiplet devices also include a single instruction, multiple data (SIMD) device for further processing the DIMC output and computing the softmax functions used in attention.
- The chiplet devices include die-to-die (D2D) interconnects, a PCIe bus, a DRAM interface, and a global CPU interface for communication between chiplets, memory, and a server or host system.
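The division of labor described above can be illustrated with a plain reference computation of scaled dot-product attention. This is a minimal NumPy sketch of the math being accelerated, not the patent's implementation: the matrix multiplies correspond to the kind of work the DIMC device would handle, and the softmax to the post-processing assigned to the SIMD device. All function names here are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row-wise max before exponentiating for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)  # matrix multiply: the high-throughput DIMC workload
    weights = softmax(scores)      # softmax: the post-processing SIMD workload
    return weights @ V             # second matrix multiply, again DIMC-style work
```

Each attention layer of a transformer evaluates this pattern many times, which is why the abstract pairs an in-memory matrix-multiply engine with a dedicated softmax path rather than routing intermediate results back to a host.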
Potential applications of this technology:
- Accelerating attention functions in transformer-based models used in machine learning applications.
- Enhancing the performance of generative AI models.
- Improving the efficiency of AI accelerators in server or host systems.
Problems solved by this technology:
- Addressing the computational demands of attention functions in transformer-based models.
- Increasing the throughput and efficiency of AI accelerators.
- Facilitating communication between chiplets, memory, and server or host systems.
Benefits of this technology:
- Faster and more efficient computation of attention functions.
- Improved performance and throughput of AI accelerators.
- Enhanced capabilities for generative AI models.
- Better communication and integration between chiplets, memory, and server or host systems.
Original Abstract Submitted
an ai accelerator apparatus using in-memory compute chiplet devices. the apparatus includes one or more chiplets, each of which includes a plurality of tiles. each tile includes a plurality of slices, a central processing unit (cpu), and a hardware dispatch device. each slice can include a digital in-memory compute (dimc) device configured to perform high throughput computations. in particular, the dimc device can be configured to accelerate the computations of attention functions for transformer-based models (a.k.a. transformers) applied to machine learning applications, including generative ai. a single input multiple data (simd) device configured to further process the dimc output and compute softmax functions for the attention functions. the chiplet can also include die-to-die (d2d) interconnects, a peripheral component interconnect express (pcie) bus, a dynamic random access memory (dram) interface, and a global cpu interface to facilitate communication between the chiplets, memory and a server or host system.