17985061. NEURAL NETWORK ACCELERATOR WITH PARAMETERS RESIDENT ON CHIP simplified abstract (GOOGLE LLC)

From WikiPatents

NEURAL NETWORK ACCELERATOR WITH PARAMETERS RESIDENT ON CHIP

Organization Name

GOOGLE LLC

Inventor(s)

Olivier Temam of Antony (FR)

Harshit Khaitan of San Jose CA (US)

Ravi Narayanaswami of San Jose CA (US)

Dong Hyuk Woo of San Jose CA (US)

NEURAL NETWORK ACCELERATOR WITH PARAMETERS RESIDENT ON CHIP - A simplified explanation of the abstract

This abstract first appeared for US patent application 17985061, titled 'NEURAL NETWORK ACCELERATOR WITH PARAMETERS RESIDENT ON CHIP'.

Simplified Explanation

The abstract describes an embodiment of an accelerator comprising a computing unit, two memory banks, and a traversal unit. The computing unit performs computations using multiply-accumulate (MAC) operators, which receive parameters from the second memory bank. The traversal unit controls the flow of input activations from the first memory bank to the MAC operator.

  • The accelerator includes a computing unit with MAC operators.
  • It has a first memory bank for storing input activations.
  • It has a second memory bank for storing parameters used in computations.
  • The second memory bank can hold enough of the neural network's parameters on chip to keep latency below, and throughput above, specified levels.
  • The computing unit performs computations on data arrays using the MAC operator.
  • The first traversal unit controls the flow of input activations to the MAC operator.
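The dataflow described in these bullets can be sketched in software. The sketch below is illustrative only: all class and function names are hypothetical, not from the patent, and the on-chip memory banks are modeled as plain Python lists. It shows a traversal unit streaming activations from the first bank to a MAC operator that reads parameters resident in the second bank.

```python
# Hypothetical sketch of the described dataflow. Names are illustrative,
# not from the patent; memory banks are modeled as plain lists.

class MacOperator:
    """Multiply-accumulate operator: acc += activation * parameter."""
    def __init__(self):
        self.acc = 0.0

    def step(self, activation, parameter):
        self.acc += activation * parameter
        return self.acc


class TraversalUnit:
    """Issues control signals that place input activations on the data bus."""
    def __init__(self, activation_bank):
        self.bank = activation_bank  # first memory bank: input activations

    def stream(self):
        for activation in self.bank:  # one activation per control signal
            yield activation


def compute_output(activation_bank, parameter_bank):
    """One output element: dot(activations, parameters) via repeated MAC steps.

    parameter_bank plays the role of the second, parameter-resident bank;
    keeping it "on chip" is what the patent claims reduces latency.
    """
    mac = MacOperator()
    traversal = TraversalUnit(activation_bank)
    for a, p in zip(traversal.stream(), parameter_bank):
        mac.step(a, p)
    return mac.acc


print(compute_output([1.0, 2.0, 3.0], [0.5, 0.25, 0.125]))  # 1.375
```

In hardware, the point of the second bank being parameter-resident is that `parameter_bank` never has to be refetched from off-chip memory between computations, which is where the claimed latency and throughput gains come from.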

Potential Applications

  • Artificial intelligence and machine learning applications.
  • Neural network training and inference.
  • High-performance computing tasks.

Problems Solved

  • Reduces latency in performing computations.
  • Improves throughput for efficient processing.
  • Enables storage of a sufficient amount of neural network parameters.

Benefits

  • Faster and more efficient computations.
  • Improved performance in AI and ML tasks.
  • Enables real-time processing of large data sets.


Original Abstract Submitted

One embodiment of an accelerator includes a computing unit; a first memory bank for storing input activations and a second memory bank for storing parameters used in performing computations, the second memory bank configured to store a sufficient amount of the neural network parameters on the computing unit to allow for latency below a specified level with throughput above a specified level. The computing unit includes at least one cell comprising at least one multiply accumulate (“MAC”) operator that receives parameters from the second memory bank and performs computations. The computing unit further includes a first traversal unit that provides a control signal to the first memory bank to cause an input activation to be provided to a data bus accessible by the MAC operator. The computing unit performs computations associated with at least one element of a data array, the one or more computations performed by the MAC operator.