Intel Corporation (20240320047). COMPUTE-INTENSIVE KERNEL GENERATOR, MICRO-KERNEL CODE CACHE, FUSED KERNEL GENERATOR AND CYCLIC DEPENDENCE FREE GRAPH PARTITIONING FOR DEEP LEARNING WORKLOADS simplified abstract

From WikiPatents
Revision as of 08:18, 26 September 2024 by Wikipatents (Creating a new page)

COMPUTE-INTENSIVE KERNEL GENERATOR, MICRO-KERNEL CODE CACHE, FUSED KERNEL GENERATOR AND CYCLIC DEPENDENCE FREE GRAPH PARTITIONING FOR DEEP LEARNING WORKLOADS

Organization Name

Intel Corporation

Inventor(s)

Jianhui Li of San Jose CA (US)

Zhennan Qin of Shanghai (CN)

Jiong Gong of Shanghai (CN)

Jingze Cui of Shanghai (CN)

Yijie Mei of Shanghai (CN)

Yunfei Song of Shanghai (CN)

COMPUTE-INTENSIVE KERNEL GENERATOR, MICRO-KERNEL CODE CACHE, FUSED KERNEL GENERATOR AND CYCLIC DEPENDENCE FREE GRAPH PARTITIONING FOR DEEP LEARNING WORKLOADS - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240320047, titled 'COMPUTE-INTENSIVE KERNEL GENERATOR, MICRO-KERNEL CODE CACHE, FUSED KERNEL GENERATOR AND CYCLIC DEPENDENCE FREE GRAPH PARTITIONING FOR DEEP LEARNING WORKLOADS'.

The technology described in the patent application identifies the data layouts associated with input and output tensors, generates micro-kernels based on those layouts, and generates a nested outer loop for each kernel, where the micro-kernel performs one or more subtasks of the task represented by the kernel.

  • The technology includes micro-kernel code caches, fused kernel generators, and cyclic dependence-free graph partitioning for deep learning workloads.
  • The innovation aims to optimize the performance of tasks represented by kernels by efficiently organizing data layouts and generating specialized micro-kernels.
  • By utilizing micro-kernel code caches and fused kernel generators, the technology enhances the execution speed and efficiency of deep learning workloads.
  • The cyclic dependence-free graph partitioning ensures that tasks can be parallelized effectively, leading to improved overall performance.
  • This technology is particularly beneficial for applications requiring complex data processing, such as deep learning algorithms and neural networks.
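The micro-kernel code cache mentioned above can be pictured as a lookup table keyed by tensor data layout: when a layout has been seen before, the previously generated micro-kernel is reused instead of being regenerated. The sketch below is illustrative only, assuming a simplified notion of "data layout" (shape, strides, dtype) and a toy generator; none of the names come from the patent.

```python
# Hypothetical sketch of a micro-kernel code cache keyed by tensor data
# layout. All names are illustrative, not taken from the patent.
from dataclasses import dataclass

@dataclass(frozen=True)
class DataLayout:
    shape: tuple        # tensor dimensions
    strides: tuple      # element strides per dimension
    dtype: str          # element type, e.g. "f32"

class MicroKernelCache:
    """Caches generated micro-kernels so that a layout seen before
    reuses previously generated code instead of regenerating it."""
    def __init__(self, generator):
        self._generator = generator   # callable: DataLayout -> kernel fn
        self._cache = {}

    def get(self, layout: DataLayout):
        kernel = self._cache.get(layout)
        if kernel is None:            # cache miss: generate and store
            kernel = self._generator(layout)
            self._cache[layout] = kernel
        return kernel

# Toy generator: returns a function that sums a flat buffer of the
# given shape (a stand-in for a real compiled micro-kernel).
def toy_generator(layout):
    n = 1
    for d in layout.shape:
        n *= d
    return lambda buf: sum(buf[:n])

cache = MicroKernelCache(toy_generator)
layout = DataLayout(shape=(2, 3), strides=(3, 1), dtype="f32")
k1 = cache.get(layout)
k2 = cache.get(layout)        # same layout: served from cache
assert k1 is k2
print(k1(list(range(10))))    # sums the first 6 elements -> 15
```

In a real kernel generator the cached value would be compiled machine code rather than a Python closure, but the caching structure is the same: layout in, reusable micro-kernel out.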

Potential Applications:

  • Deep learning algorithms
  • Neural networks
  • Image and speech recognition systems

Problems Solved:

  • Optimization of data layout for input and output tensors
  • Efficient generation of micro-kernels for specialized tasks
  • Enhanced performance and speed of deep learning workloads

Benefits:

  • Improved efficiency in processing complex data
  • Faster execution of deep learning tasks
  • Enhanced performance of neural networks and image recognition systems

Commercial Applications: Optimizing deep learning algorithms for faster and more efficient processing in various industries such as healthcare, finance, and technology.

Questions about the technology:

  1. How does the technology improve the performance of deep learning workloads?
  2. What are the key features of the micro-kernel code caches and fused kernel generators in this technology?


Original Abstract Submitted

Systems, apparatuses and methods may provide for technology that identifies a data layout associated with input tensors and output tensors, generates a micro-kernel based at least in part on the data layout, and generates a nested outer loop for a kernel, wherein the micro-kernel performs one or more subtasks associated with a task represented by the kernel. The technology also includes micro-kernel code caches, fused kernel generators and cyclic dependence free graph partitioning for deep learning workloads.
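The "cyclic dependence free" requirement in the abstract can be illustrated with a small check: if operation A feeds B and B feeds C, then fusing A and C into one partition while leaving B in another makes the two partitions depend on each other, so neither can execute first. The sketch below is not the patented algorithm, only a minimal demonstration of detecting such an invalid partitioning; all names are hypothetical.

```python
# Illustrative check (not the patented algorithm) that a graph
# partitioning introduces no cyclic dependence between partitions.
from collections import defaultdict

def partition_has_cycle(edges, assignment):
    """edges: list of (src, dst) op dependences.
    assignment: op name -> partition id.
    Returns True if the induced partition graph contains a cycle."""
    pgraph = defaultdict(set)
    for src, dst in edges:
        a, b = assignment[src], assignment[dst]
        if a != b:                    # cross-partition dependence
            pgraph[a].add(b)
    # detect a cycle in the partition graph with three-color DFS
    WHITE, GRAY, BLACK = 0, 1, 2
    color = defaultdict(int)
    def dfs(node):
        color[node] = GRAY
        for nxt in pgraph[node]:
            if color[nxt] == GRAY:    # back edge -> cycle
                return True
            if color[nxt] == WHITE and dfs(nxt):
                return True
        color[node] = BLACK
        return False
    return any(color[p] == WHITE and dfs(p)
               for p in set(assignment.values()))

edges = [("A", "B"), ("B", "C")]
bad  = {"A": 0, "B": 1, "C": 0}   # fuses A and C around B -> cycle
good = {"A": 0, "B": 0, "C": 1}   # respects dependence order
print(partition_has_cycle(edges, bad))   # True
print(partition_has_cycle(edges, good))  # False
```

A partitioner that keeps this check false for every candidate grouping guarantees the partitions can be scheduled in a valid topological order, which is what enables the effective parallelization claimed above.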