18054446. MIXTURE-OF-EXPERTS LAYER WITH SWITCHABLE PARALLEL MODES simplified abstract (Microsoft Technology Licensing, LLC)

MIXTURE-OF-EXPERTS LAYER WITH SWITCHABLE PARALLEL MODES

Organization Name

Microsoft Technology Licensing, LLC

Inventor(s)

Yifan Xiong of Beijing (CN)

Changho Hwang of Cheongju-si (KR)

Wei Cui of Beijing (CN)

Ziyue Yang of Beijing (CN)

Ze Liu of Beijing (CN)

Han Hu of Beijing (CN)

Zilong Wang of Beijing (CN)

Rafael Omar Salas of Tega Cay SC (US)

Jithin Jose of Austin TX (US)

Prabhat Ram of Los Altos CA (US)

Ho-Yuen Chau of Redmond WA (US)

Peng Cheng of Beijing (CN)

Fan Yang of Beijing (CN)

Mao Yang of Beijing (CN)

Yongqiang Xiong of Beijing (CN)

MIXTURE-OF-EXPERTS LAYER WITH SWITCHABLE PARALLEL MODES - A simplified explanation of the abstract

This abstract first appeared for US patent application 18054446, titled 'MIXTURE-OF-EXPERTS LAYER WITH SWITCHABLE PARALLEL MODES'.

Simplified Explanation

The patent application describes a computing system with a Mixture-of-Experts (MoE) layer that can switch between parallel modes without conveying the expert sub-models' parameter values among the processing devices.

  • The computing system includes multiple processing devices.
  • The MoE layer in the system contains multiple expert sub-models with their own parameter values.
  • The MoE layer can switch between data parallel mode and expert-data-model parallel mode.
  • Switching between modes does not require conveying parameter values among the processing devices (see the sketch after this list).
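
The abstract does not spell out an implementation, but the core idea can be illustrated with a toy example. The Python sketch below is an assumption for illustration only (the class and method names, such as SwitchableMoELayer and switch_mode, are not taken from the patent): each expert's parameters stay pinned to a fixed simulated device, and the mode switch is a metadata-only operation, so no parameter values are conveyed between devices.

```python
# Hypothetical sketch, not the patented implementation: a toy MoE layer whose
# parallel mode is a runtime flag. Expert parameters stay pinned to their
# owning (simulated) device; only the token-dispatch pattern would change.
import torch
import torch.nn as nn


class SwitchableMoELayer(nn.Module):
    def __init__(self, d_model: int, num_experts: int):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts)  # top-1 router
        self.experts = nn.ModuleList(
            [nn.Linear(d_model, d_model) for _ in range(num_experts)]
        )
        # Each expert's parameters live on one (simulated) processing device
        # and are never copied when the parallel mode changes.
        self.expert_to_device = {i: i % 2 for i in range(num_experts)}
        self.mode = "data_parallel"

    def switch_mode(self, mode: str) -> None:
        # The switch is metadata-only: no parameter values are conveyed
        # between processing devices.
        assert mode in ("data_parallel", "expert_data_model_parallel")
        self.mode = mode

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). Route each token to its top-1 expert.
        expert_idx = self.gate(x).argmax(dim=-1)
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():
                # In a real system, "data_parallel" would process tokens where
                # they already reside, while "expert_data_model_parallel" would
                # exchange tokens (all-to-all) toward the expert's device;
                # either way the expert's weights stay put.
                out[mask] = expert(x[mask])
        return out


layer = SwitchableMoELayer(d_model=16, num_experts=4)
tokens = torch.randn(8, 16)
y1 = layer(tokens)
layer.switch_mode("expert_data_model_parallel")  # no weights are moved
y2 = layer(tokens)
print(torch.allclose(y1, y2))  # same math, different (simulated) dispatch
```

In this sketch the forward computation is identical under both modes; the flag only stands in for the communication plan a distributed system would choose, which is why the switch itself does not need to move any parameters.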

Potential Applications

The technology described in this patent application could be applied in various fields such as:

  • Machine learning
  • Artificial intelligence
  • Data analysis

Problems Solved

This technology addresses the need for:

  • Efficient utilization of processing devices
  • Enhanced model performance
  • Improved scalability of MoE models

Benefits

The benefits of this technology include:

  • Increased flexibility in model training
  • Better resource management
  • Enhanced model accuracy

Potential Commercial Applications

Potential commercial applications for this technology include:

  • Cloud computing services
  • Data centers
  • Autonomous systems

Possible Prior Art

One possible piece of prior art for this technology is the use of distributed computing systems to train machine learning models for improved performance and scalability.

Unanswered Questions

How does the switching between modes impact the overall performance of the MoE model?

The abstract does not provide specific details on how switching between modes affects the performance of the MoE model.

Are there any limitations to the scalability of the MoE layer in this computing system?

The abstract does not mention any potential limitations to the scalability of the MoE layer in the described computing system.


Original Abstract Submitted

A computing system is provided, including a plurality of processing devices configured to execute a Mixture-of-Experts (MoE) layer included in an MoE model. The MoE layer includes a plurality of expert sub-models that each have a respective plurality of parameter values. The MoE layer is configured to be switchable between a data parallel mode and an expert-data-model parallel mode without conveying the respective parameter values of the expert sub-models among the plurality of processing devices.
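
As a minimal, self-contained companion to the sketch above (again an illustrative assumption, not the patented mechanism), the example below reduces the abstract's claim to its essence: the parallel mode is a piece of layer metadata, and changing it leaves the expert-to-device placement, and hence the location of every parameter value, untouched.

```python
# Illustrative assumption only: the mode switch changes a label, not the
# placement of any expert parameters across processing devices.
from dataclasses import dataclass


@dataclass
class MoELayerState:
    expert_to_device: dict       # expert id -> owning processing device (fixed)
    mode: str = "data_parallel"

    def switch_mode(self, new_mode: str) -> None:
        # Only the mode label changes; the expert placement, and therefore
        # where every parameter value lives, stays exactly where it was.
        assert new_mode in ("data_parallel", "expert_data_model_parallel")
        self.mode = new_mode


state = MoELayerState(expert_to_device={0: 0, 1: 1, 2: 2, 3: 3})
placement_before = dict(state.expert_to_device)

state.switch_mode("expert_data_model_parallel")

assert state.mode == "expert_data_model_parallel"
assert state.expert_to_device == placement_before  # nothing moved or copied
```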