20230168899. GENERATIVE AI ACCELERATOR APPARATUS USING IN-MEMORY COMPUTE CHIPLET DEVICES FOR TRANSFORMER WORKLOADS simplified abstract (d-MATRIX CORPORATION)

From WikiPatents


Organization Name

d-MATRIX CORPORATION

Inventor(s)

Sudeep Bhoja of Cupertino CA (US)

Siddharth Sheth of Cupertino CA (US)


This abstract first appeared for US patent application 20230168899, titled 'GENERATIVE AI ACCELERATOR APPARATUS USING IN-MEMORY COMPUTE CHIPLET DEVICES FOR TRANSFORMER WORKLOADS'.

Simplified Explanation

The patent application describes an AI accelerator apparatus built from in-memory compute chiplet devices. Each chiplet contains multiple tiles; each tile in turn contains slices, a CPU, and a hardware dispatch device. Each slice includes a digital in-memory compute (DIMC) device for high-throughput computation, specifically for accelerating the attention functions of transformer-based models used in machine learning applications.
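The chiplet → tile → slice hierarchy above can be sketched as a simple data model. This is an illustrative sketch only: the class names, and the tile and slice counts, are assumptions for demonstration, not figures from the patent.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Slice:
    # Each slice holds a digital in-memory compute (DIMC) array;
    # the array dimensions here are placeholders, not from the patent.
    dimc_rows: int = 64
    dimc_cols: int = 64

@dataclass
class Tile:
    # A tile contains multiple slices plus a CPU and a hardware
    # dispatch device (modeled as flags in this sketch).
    slices: List[Slice] = field(default_factory=lambda: [Slice() for _ in range(8)])
    has_cpu: bool = True
    has_dispatch: bool = True

@dataclass
class Chiplet:
    # A chiplet contains a plurality of tiles; four is an arbitrary choice.
    tiles: List[Tile] = field(default_factory=lambda: [Tile() for _ in range(4)])

chiplet = Chiplet()
total_slices = sum(len(t.slices) for t in chiplet.tiles)
print(total_slices)  # 4 tiles x 8 slices = 32
```

The point of the sketch is the containment relationship the abstract describes, not any particular geometry; a real device would fix these counts in silicon.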

  • The chiplet devices have multiple tiles, each containing slices, a CPU, and a hardware dispatch device.
  • The slices include a digital in-memory compute (DIMC) device for high-throughput computation.
  • The DIMC device is specifically designed to accelerate the attention functions of transformer-based models used in machine learning applications.
  • The chiplet devices also have a single instruction, multiple data (SIMD) device that further processes the DIMC output and computes the softmax functions used by the attention functions.
  • The chiplet devices include die-to-die (D2D) interconnects, a PCIe bus, a DRAM interface, and a global CPU interface for communication between chiplets, memory, and a server or host system.
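The DIMC/SIMD split described above maps onto the two stages of scaled dot-product attention: matrix multiplications (the kind of work an in-memory compute array accelerates) and a row-wise softmax (the kind of post-processing a SIMD unit handles). The NumPy sketch below shows that pipeline in software under those assumptions; it is not the patent's implementation, and the shapes are arbitrary.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable row-wise softmax, the step the abstract
    # assigns to the SIMD device in this architecture.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention: two matmul stages (DIMC-style
    # workloads) bracketing one softmax stage (SIMD-style workload).
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # matmul stage
    weights = softmax(scores)         # softmax stage
    return weights @ V                # matmul stage

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
out = attention(Q, K, V)
print(out.shape)  # (4, 8)
```

The separation into matmul and softmax stages is why an accelerator can profitably pair a dense in-memory compute array with a lighter vector engine: the two stages have very different arithmetic profiles.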

Potential applications of this technology:

  • Accelerating attention functions in transformer-based models used in machine learning applications.
  • Enhancing the performance of generative AI models.
  • Improving the efficiency of AI accelerators in server or host systems.

Problems solved by this technology:

  • Addressing the computational demands of attention functions in transformer-based models.
  • Increasing the throughput and efficiency of AI accelerators.
  • Facilitating communication between chiplets, memory, and server or host systems.

Benefits of this technology:

  • Faster and more efficient computation of attention functions.
  • Improved performance and throughput of AI accelerators.
  • Enhanced capabilities for generative AI models.
  • Better communication and integration between chiplets, memory, and server or host systems.


Original Abstract Submitted

an ai accelerator apparatus using in-memory compute chiplet devices. the apparatus includes one or more chiplets, each of which includes a plurality of tiles. each tile includes a plurality of slices, a central processing unit (cpu), and a hardware dispatch device. each slice can include a digital in-memory compute (dimc) device configured to perform high throughput computations. in particular, the dimc device can be configured to accelerate the computations of attention functions for transformer-based models (a.k.a. transformers) applied to machine learning applications, including generative ai. a single input multiple data (simd) device configured to further process the dimc output and compute softmax functions for the attention functions. the chiplet can also include die-to-die (d2d) interconnects, a peripheral component interconnect express (pcie) bus, a dynamic random access memory (dram) interface, and a global cpu interface to facilitate communication between the chiplets, memory and a server or host system.