18054446. MIXTURE-OF-EXPERTS LAYER WITH SWITCHABLE PARALLEL MODES simplified abstract (Microsoft Technology Licensing, LLC)

MIXTURE-OF-EXPERTS LAYER WITH SWITCHABLE PARALLEL MODES

Organization Name

Microsoft Technology Licensing, LLC

Inventor(s)

Yifan Xiong of Beijing (CN)

Changho Hwang of Cheongju-si (KR)

Wei Cui of Beijing (CN)

Ziyue Yang of Beijing (CN)

Ze Liu of Beijing (CN)

Han Hu of Beijing (CN)

Zilong Wang of Beijing (CN)

Rafael Omar Salas of Tega Cay SC (US)

Jithin Jose of Austin TX (US)

Prabhat Ram of Los Altos CA (US)

Ho-Yuen Chau of Redmond WA (US)

Peng Cheng of Beijing (CN)

Fan Yang of Beijing (CN)

Mao Yang of Beijing (CN)

Yongqiang Xiong of Beijing (CN)

MIXTURE-OF-EXPERTS LAYER WITH SWITCHABLE PARALLEL MODES - A simplified explanation of the abstract

This abstract first appeared for US patent application 18054446, titled 'MIXTURE-OF-EXPERTS LAYER WITH SWITCHABLE PARALLEL MODES'.

Simplified Explanation

The patent application describes a computing system with a Mixture-of-Experts (MoE) layer that can switch between parallel modes without conveying the expert sub-models' parameter values among the processing devices.

  • The computing system includes multiple processing devices.
  • The MoE layer in the system contains multiple expert sub-models with their own parameter values.
  • The MoE layer can switch between data parallel mode and expert-data-model parallel mode.
  • Switching between modes does not require conveying parameter values among the processing devices (see the sketch after this list).
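
The abstract does not spell out an implementation, but the core idea can be illustrated with a toy example. The Python sketch below is an assumption for illustration only (the class and method names, such as SwitchableMoELayer and switch_mode, are not taken from the patent): each expert's parameters stay pinned to a fixed simulated device, and the mode switch is a metadata-only operation, so no parameter values are conveyed between devices.

```python
# Hypothetical sketch, not the patented implementation: a toy MoE layer whose
# parallel mode is a runtime flag. Expert parameters stay pinned to their
# owning (simulated) device; only the token-dispatch pattern would change.
import torch
import torch.nn as nn


class SwitchableMoELayer(nn.Module):
    def __init__(self, d_model: int, num_experts: int):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts)  # top-1 router
        self.experts = nn.ModuleList(
            [nn.Linear(d_model, d_model) for _ in range(num_experts)]
        )
        # Each expert's parameters live on one (simulated) processing device
        # and are never copied when the parallel mode changes.
        self.expert_to_device = {i: i % 2 for i in range(num_experts)}
        self.mode = "data_parallel"

    def switch_mode(self, mode: str) -> None:
        # The switch is metadata-only: no parameter values are conveyed
        # between processing devices.
        assert mode in ("data_parallel", "expert_data_model_parallel")
        self.mode = mode

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). Route each token to its top-1 expert.
        expert_idx = self.gate(x).argmax(dim=-1)
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():
                # In a real system, "data_parallel" would process tokens where
                # they already reside, while "expert_data_model_parallel" would
                # exchange tokens (all-to-all) toward the expert's device;
                # either way the expert's weights stay put.
                out[mask] = expert(x[mask])
        return out


layer = SwitchableMoELayer(d_model=16, num_experts=4)
tokens = torch.randn(8, 16)
y1 = layer(tokens)
layer.switch_mode("expert_data_model_parallel")  # no weights are moved
y2 = layer(tokens)
print(torch.allclose(y1, y2))  # same math, different (simulated) dispatch
```

In this sketch the forward computation is identical under both modes; the flag only stands in for the communication plan a distributed system would choose, which is why the switch itself does not need to move any parameters.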

Potential Applications

The technology described in this patent application could be applied in various fields such as:

  • Machine learning
  • Artificial intelligence
  • Data analysis

Problems Solved

This technology addresses the need for:

  • Efficient utilization of processing devices
  • Enhanced model performance
  • Improved scalability of MoE models

Benefits

The benefits of this technology include:

  • Increased flexibility in model training
  • Better resource management
  • Enhanced model accuracy

Potential Commercial Applications

Potential commercial applications for this technology include:

  • Cloud computing services
  • Data centers
  • Autonomous systems

Possible Prior Art

One possible piece of prior art for this technology is the use of distributed computing systems to train machine learning models for improved performance and scalability.

Unanswered Questions

How does the switching between modes impact the overall performance of the MoE model?

The abstract does not provide specific details on how switching between modes affects the performance of the MoE model.

Are there any limitations to the scalability of the MoE layer in this computing system?

The abstract does not mention any potential limitations to the scalability of the MoE layer in the described computing system.


Original Abstract Submitted

A computing system is provided, including a plurality of processing devices configured to execute a Mixture-of-Experts (MoE) layer included in an MoE model. The MoE layer includes a plurality of expert sub-models that each have a respective plurality of parameter values. The MoE layer is configured to be switchable between a data parallel mode and an expert-data-model parallel mode without conveying the respective parameter values of the expert sub-models among the plurality of processing devices.
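
As a minimal, self-contained companion to the sketch above (again an illustrative assumption, not the patented mechanism), the example below reduces the abstract's claim to its essence: the parallel mode is a piece of layer metadata, and changing it leaves the expert-to-device placement, and hence the location of every parameter value, untouched.

```python
# Illustrative assumption only: the mode switch changes a label, not the
# placement of any expert parameters across processing devices.
from dataclasses import dataclass


@dataclass
class MoELayerState:
    expert_to_device: dict       # expert id -> owning processing device (fixed)
    mode: str = "data_parallel"

    def switch_mode(self, new_mode: str) -> None:
        # Only the mode label changes; the expert placement, and therefore
        # where every parameter value lives, stays exactly where it was.
        assert new_mode in ("data_parallel", "expert_data_model_parallel")
        self.mode = new_mode


state = MoELayerState(expert_to_device={0: 0, 1: 1, 2: 2, 3: 3})
placement_before = dict(state.expert_to_device)

state.switch_mode("expert_data_model_parallel")

assert state.mode == "expert_data_model_parallel"
assert state.expert_to_device == placement_before  # nothing moved or copied
```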