18054452. COLLECTIVE COMMUNICATION PHASES AT MIXTURE-OF-EXPERTS LAYER simplified abstract (Microsoft Technology Licensing, LLC)

COLLECTIVE COMMUNICATION PHASES AT MIXTURE-OF-EXPERTS LAYER

Organization Name

Microsoft Technology Licensing, LLC

Inventor(s)

Yifan Xiong of Beijing (CN)

Changho Hwang of Cheongju-si (KR)

Wei Cui of Beijing (CN)

Ziyue Yang of Beijing (CN)

Ze Liu of Beijing (CN)

Han Hu of Beijing (CN)

Zilong Wang of Beijing (CN)

Rafael Omar Salas of Tega Cay, SC (US)

Jithin Jose of Austin, TX (US)

Prabhat Ram of Los Altos, CA (US)

Ho-Yuen Chau of Redmond, WA (US)

Peng Cheng of Beijing (CN)

Fan Yang of Beijing (CN)

Mao Yang of Beijing (CN)

Yongqiang Xiong of Beijing (CN)

COLLECTIVE COMMUNICATION PHASES AT MIXTURE-OF-EXPERTS LAYER - A simplified explanation of the abstract

This abstract first appeared for US patent application 18054452, titled 'COLLECTIVE COMMUNICATION PHASES AT MIXTURE-OF-EXPERTS LAYER'.

Simplified Explanation

The computing system described in the abstract includes a plurality of processing devices that execute a Mixture-of-Experts (MoE) layer within an MoE model. The MoE layer splits its input tensors during a first collective communication phase, processes the split tensors at expert sub-models, and concatenates the experts' outputs during a second collective communication phase to produce the layer's output tensors.

  • The computing system includes multiple processing devices.
  • During a first collective communication phase, the processing devices split each first input tensor along a first dimension to obtain first output tensors.
  • The first output tensors are processed by expert sub-models to obtain second input tensors.
  • During a second collective communication phase, the second input tensors are concatenated along the same dimension to obtain the second output tensors, which form the MoE layer's output (see the sketch below).
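
The abstract does not name the collective primitive that implements the two phases. The following is a minimal sketch, assuming the dispatch and combine steps are realized with PyTorch's torch.distributed.all_to_all, one expert sub-model per device (local_expert), and a token count evenly divisible by world_size; these names and choices are illustrative assumptions, not details taken from the patent.

    import torch
    import torch.distributed as dist

    def moe_layer(x, local_expert, world_size):
        """One MoE layer step on a single device within a process group (sketch)."""
        # First collective communication phase: split the first input tensor
        # along its first dimension and exchange the chunks, so each device
        # receives the slice destined for its local expert. Assumes the size
        # of dim 0 is evenly divisible by world_size.
        send_chunks = list(torch.chunk(x, world_size, dim=0))
        recv_chunks = [torch.empty_like(c) for c in send_chunks]
        dist.all_to_all(recv_chunks, send_chunks)

        # Expert computation: process the received first output tensors with
        # this device's expert sub-model to obtain the second input tensors.
        # Assumes the expert preserves the tensor shape (e.g. a feed-forward
        # block with equal input and output width).
        expert_out = [local_expert(c) for c in recv_chunks]

        # Second collective communication phase: exchange the expert outputs
        # back and concatenate them along the same dimension to obtain the
        # second output tensors, i.e. the MoE layer's output.
        combined = [torch.empty_like(c) for c in expert_out]
        dist.all_to_all(combined, expert_out)
        return torch.cat(combined, dim=0)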

Potential Applications

This technology can be applied wherever large MoE models are trained or served across multiple processing devices, for example in natural language processing, image recognition, and recommendation systems.

Problems Solved

This technology addresses the cost of moving data between processing devices when an MoE layer is distributed across them, by organizing the splitting and concatenation of tensors into dedicated collective communication phases.

Benefits

The benefits of this technology include enhanced model accuracy, faster computation, and better utilization of resources.

Potential Commercial Applications

Potential commercial applications of this technology include improving search engines, personalized recommendations, and speech recognition systems.

Possible Prior Art

One possible prior art for this technology is the use of ensemble models in machine learning to combine multiple models for better performance.

Unanswered Questions

How does the system handle communication between processing devices efficiently?

The abstract mentions collective communication phases, but it does not provide details on the specific mechanisms used for communication optimization.

What is the impact of the MoE layer on model interpretability?

While the abstract focuses on the technical aspects of the MoE layer, it does not address how the use of this layer may affect the interpretability of the overall model.


Original Abstract Submitted

A computing system including a plurality of processing devices configured to execute a Mixture-of-Experts (MoE) layer included in an MoE model. The processing devices are configured to execute the MoE layer at least in part by, during a first collective communication phase between the processing devices, splitting each of a plurality of first input tensors along a first dimension to obtain first output tensors. Executing the MoE layer further includes processing the first output tensors at a respective plurality of expert sub-models to obtain a plurality of second input tensors. Executing the MoE layer further includes, during a second collective communication phase between the processing devices, receiving the second input tensors from the expert sub-models and concatenating the second input tensors along the first dimension to obtain second output tensors. Executing the MoE layer further includes outputting the second output tensors as output of the MoE layer.
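
To make the abstract's terminology concrete, here is a single-process walk-through of the tensor flow it describes. This is an illustrative sketch only: the shapes, the choice of dimension 0 as the "first dimension", and the linear expert sub-models are assumptions, not details from the patent.

    import torch
    import torch.nn as nn

    num_experts = 4
    first_input = torch.randn(8, 16)   # one "first input tensor"; dim 0 is the "first dimension"

    # First collective communication phase: split along the first dimension
    # to obtain the first output tensors.
    first_outputs = torch.chunk(first_input, num_experts, dim=0)   # 4 tensors of shape (2, 16)

    # Each expert sub-model processes one first output tensor, producing the
    # second input tensors.
    experts = [nn.Linear(16, 16) for _ in range(num_experts)]
    second_inputs = [expert(t) for expert, t in zip(experts, first_outputs)]

    # Second collective communication phase: concatenate along the same
    # dimension to obtain the second output tensors (the MoE layer's output).
    second_output = torch.cat(second_inputs, dim=0)
    print(second_output.shape)   # torch.Size([8, 16])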