COLLECTIVE COMMUNICATION PHASES AT MIXTURE-OF-EXPERTS LAYER
Organization Name
Microsoft Technology Licensing, LLC
Inventor(s)
Yifan Xiong of Beijing (CN)
Changho Hwang of Cheongju-si (KR)
Wei Cui of Beijing (CN)
Ziyue Yang of Beijing (CN)
Ze Liu of Beijing (CN)
Han Hu of Beijing (CN)
Zilong Wang of Beijing (CN)
Rafael Omar Salas of Tega Cay SC (US)
Jithin Jose of Austin TX (US)
Prabhat Ram of Los Altos CA (US)
Ho-Yuen Chau of Redmond WA (US)
Peng Cheng of Beijing (CN)
Fan Yang of Beijing (CN)
Mao Yang of Beijing (CN)
Yongqiang Xiong of Beijing (CN)
COLLECTIVE COMMUNICATION PHASES AT MIXTURE-OF-EXPERTS LAYER - A simplified explanation of the abstract
This abstract first appeared for US patent application 18054452, titled 'COLLECTIVE COMMUNICATION PHASES AT MIXTURE-OF-EXPERTS LAYER'.
Simplified Explanation
The computing system described in the abstract includes multiple processing devices that together execute a Mixture-of-Experts (MoE) layer of an MoE model. Executing the layer involves splitting input tensors across the devices, processing the resulting pieces at expert sub-models, and concatenating the experts' outputs to obtain the layer's final output tensors.
- The computing system includes a plurality of processing devices that together execute the MoE layer.
- During a first collective communication phase between the processing devices, each first input tensor is split along a first dimension to obtain first output tensors.
- The first output tensors are processed by the expert sub-models, yielding second input tensors.
- During a second collective communication phase, the second input tensors are received from the expert sub-models and concatenated along the same dimension to obtain the second output tensors, which are emitted as the MoE layer's output (see the sketch after this list).
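As a rough illustration of this split / process / concatenate pattern, here is a minimal single-process PyTorch sketch. It is not the patented implementation: the expert modules, tensor sizes, and the omission of any gating or routing logic are assumptions made for illustration, and the actual system spreads this work across multiple processing devices.

```python
import torch
import torch.nn as nn

# Hypothetical expert sub-models; any modules with matching shapes would do.
experts = nn.ModuleList([nn.Linear(16, 16) for _ in range(4)])

def moe_forward(first_input: torch.Tensor, dim: int = 0) -> torch.Tensor:
    # "First phase": split the first input tensor along one dimension,
    # producing one chunk per expert (the "first output tensors").
    chunks = first_input.chunk(len(experts), dim=dim)
    # Expert processing: each chunk becomes a "second input tensor".
    processed = [expert(c) for expert, c in zip(experts, chunks)]
    # "Second phase": concatenate along the same dimension to obtain the
    # "second output tensors", which form the MoE layer's output.
    return torch.cat(processed, dim=dim)

x = torch.randn(8, 16)   # 8 tokens with hidden size 16
y = moe_forward(x)       # shape (8, 16), same as the input
```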
Potential Applications
This technology can be applied in various fields such as natural language processing, image recognition, and recommendation systems.
Problems Solved
This technology improves the performance and efficiency of deep learning models that use a Mixture-of-Experts approach by distributing the expert sub-models across multiple processing devices and organizing the data exchange between them into two well-defined collective communication phases.
Benefits
The benefits of this technology include enhanced model accuracy, faster computation, and better utilization of resources.
Potential Commercial Applications
Potential commercial applications of this technology include improving search engines, personalized recommendations, and speech recognition systems.
Possible Prior Art
One example of possible prior art is the use of ensemble methods in machine learning, which combine multiple models to achieve better performance.
Unanswered Questions
How does the system handle communication between processing devices efficiently?
The abstract mentions collective communication phases, but it does not provide details on the specific mechanisms used for communication optimization.
What is the impact of the MoE layer on model interpretability?
While the abstract focuses on the technical aspects of the MoE layer, it does not address how the use of this layer may affect the interpretability of the overall model.
Original Abstract Submitted
A computing system including a plurality of processing devices configured to execute a Mixture-of-Experts (MoE) layer included in an MoE model. The processing devices are configured to execute the MoE layer at least in part by, during a first collective communication phase between the processing devices, splitting each of a plurality of first input tensors along a first dimension to obtain first output tensors. Executing the MoE layer further includes processing the first output tensors at a respective plurality of expert sub-models to obtain a plurality of second input tensors. Executing the MoE layer further includes, during a second collective communication phase between the processing devices, receiving the second input tensors from the expert sub-models and concatenating the second input tensors along the first dimension to obtain second output tensors. Executing the MoE layer further includes outputting the second output tensors as output of the MoE layer.
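To make the two collective communication phases concrete, the following is a hedged sketch of how such a layer might be expressed with PyTorch's torch.distributed all-to-all collective, with one expert sub-model per processing device. It assumes an already-initialized process group, an input whose split dimension divides evenly by the number of devices, and a shape-preserving expert; gating, routing, and capacity handling are omitted. It illustrates the phase structure described in the abstract rather than the claimed implementation.

```python
import torch
import torch.distributed as dist

def moe_layer(first_input: torch.Tensor, expert: torch.nn.Module,
              dim: int = 0) -> torch.Tensor:
    world_size = dist.get_world_size()

    # First collective communication phase: split the local first input
    # tensor along `dim` into one chunk per processing device (the "first
    # output tensors") and exchange the chunks all-to-all, so that each
    # device ends up holding the data destined for its expert sub-model.
    # Assumes the chunks are all the same size.
    send_chunks = list(first_input.chunk(world_size, dim=dim))
    recv_chunks = [torch.empty_like(c) for c in send_chunks]
    dist.all_to_all(recv_chunks, send_chunks)

    # Expert processing: the received chunks are run through this device's
    # expert sub-model, producing the "second input tensors". The expert is
    # assumed to preserve tensor shape.
    second_inputs = [expert(c) for c in recv_chunks]

    # Second collective communication phase: return each processed chunk to
    # the device it came from, then concatenate along the same dimension to
    # obtain the "second output tensors", the output of the MoE layer.
    returned = [torch.empty_like(c) for c in second_inputs]
    dist.all_to_all(returned, second_inputs)
    return torch.cat(returned, dim=dim)
```

In this sketch the all-to-all collectives play the role of the two communication phases: the first scatters the split tensors to the experts, and the second gathers the experts' results back for concatenation.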
CPC Classification
G06N3/063