17848679. SYSTEMS AND METHODS FOR DISTRIBUTING LAYERS OF SPECIAL MIXTURE-OF-EXPERTS MACHINE LEARNING MODELS simplified abstract (Microsoft Technology Licensing, LLC)


SYSTEMS AND METHODS FOR DISTRIBUTING LAYERS OF SPECIAL MIXTURE-OF-EXPERTS MACHINE LEARNING MODELS

Organization Name

Microsoft Technology Licensing, LLC

Inventor(s)

Devangkumar Rameshbhai Patel of Fremont CA (US)

Wei Zuo of Campbell CA (US)

Yuan Yu of Cupertino CA (US)

SYSTEMS AND METHODS FOR DISTRIBUTING LAYERS OF SPECIAL MIXTURE-OF-EXPERTS MACHINE LEARNING MODELS - A simplified explanation of the abstract

This abstract first appeared for US patent application 17848679 titled 'SYSTEMS AND METHODS FOR DISTRIBUTING LAYERS OF SPECIAL MIXTURE-OF-EXPERTS MACHINE LEARNING MODELS'.

Simplified Explanation

The patent application describes a computing system built from different types of hardware accelerators with differing memory and processing capabilities. The system uses these accelerators to distribute a mixture-of-experts machine learning model that consists of both dense layers and sparse (expert) layers: dense layers are placed on accelerators with greater memory capability, while sparse layers are placed on accelerators with greater processing capability.

  • The computing system has different accelerators with varying memory and processing capabilities.
  • The machine learning model is divided into dense and sparse layers.
  • Dense layers are distributed on accelerators with greater memory capability.
  • Sparse layers are distributed on accelerators with greater processing capability.
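The placement scheme above can be sketched in a few lines of Python. This is only an illustrative outline, not the patent's actual implementation: the `Accelerator` class, the round-robin policy, and all names here are assumptions introduced for the example.

```python
from dataclasses import dataclass

@dataclass
class Accelerator:
    """Illustrative accelerator descriptor (not from the patent)."""
    name: str
    memory_gb: float   # memory capability
    tflops: float      # processing capability

def assign_layers(layers, memory_pool, compute_pool):
    """Map each layer to an accelerator based on layer type:
    dense layers go to the memory-rich pool, sparse (expert) layers
    to the compute-rich pool. Round-robin within each pool spreads load.
    """
    placement = {}
    mem_i = comp_i = 0
    for idx, kind in enumerate(layers):
        if kind == "dense":
            placement[idx] = memory_pool[mem_i % len(memory_pool)]
            mem_i += 1
        else:  # "sparse"
            placement[idx] = compute_pool[comp_i % len(compute_pool)]
            comp_i += 1
    return placement

# Example: two memory-rich devices, one compute-rich device.
memory_pool = [Accelerator("mem0", 192.0, 300.0), Accelerator("mem1", 192.0, 300.0)]
compute_pool = [Accelerator("fast0", 64.0, 900.0)]
plan = assign_layers(["dense", "sparse", "dense"], memory_pool, compute_pool)
```

In this sketch, layer 0 and layer 2 (dense) land on `mem0` and `mem1`, while layer 1 (sparse) lands on `fast0`, mirroring the distribution the claims describe.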

Potential Applications

This technology can be applied in various fields where machine learning models are used, such as:

  • Natural language processing
  • Computer vision
  • Speech recognition
  • Recommendation systems

Problems Solved

This technology addresses the following problems:

  • Imbalance between memory and processing capabilities in accelerators.
  • Efficient distribution of machine learning models across different accelerators.
  • Optimizing performance and resource utilization in computing systems.

Benefits

The benefits of this technology include:

  • Improved performance by utilizing accelerators with different capabilities effectively.
  • Enhanced memory capacity for processing dense layers.
  • Increased processing power for handling sparse layers.
  • Efficient utilization of computing resources.


Original Abstract Submitted

Some disclosed embodiments are directed to computing systems having different accelerators such that a first set of accelerators has a greater memory capability than a second set of accelerators, while the second set of accelerators has a greater processing capability than the first set of accelerators. A machine learning model having different dense layers and sparse layers is distributed on the different accelerators such that the dense layers are distributed on one or more accelerators selected from the first set of accelerators and the sparse layers are distributed on one or more accelerators in the second set of accelerators.