18730671. PERFORMANCE OPTIMIZATION METHOD AND APPARATUS FOR TRAINING MIXTURE-OF-EXPERTS MODEL (Tsinghua University)
Inventor(s)
Jidong Zhai of Beijing City CN
This abstract first appeared for US patent application 18730671 titled 'PERFORMANCE OPTIMIZATION METHOD AND APPARATUS FOR TRAINING MIXTURE-OF-EXPERTS MODEL'.
Original Abstract Submitted
The present disclosure provides a performance optimization method and apparatus for training a mixture-of-experts model, and relates to the technical field of neural networks. The method includes: before an iterative calculation, judging, for each expert in the mixture-of-experts model, whether the current expert needs to be set as a shadow expert; if so, adding the current expert to a shadow expert set; and then continuing to judge whether the next expert should be set as a shadow expert until all experts have been judged. The present disclosure can improve the speed and efficiency of training the mixture-of-experts model and reduce the resources consumed during training.
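The abstract does not state the criterion used to decide whether an expert becomes a shadow expert, so the following is only a minimal sketch of the per-iteration judging loop it describes. It assumes a hypothetical traffic-based criterion: shadow an expert when sending tokens to it (and results back) would move more bytes than replicating its parameters and reducing its gradients. `ExpertStats`, `needs_shadow`, `select_shadow_experts`, and the cost model are illustrative assumptions, not the patented method.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ExpertStats:
    """Hypothetical per-expert statistics gathered before an iteration."""
    expert_id: int
    remote_tokens: int  # tokens routed to this expert from other workers
    param_bytes: int    # size of the expert's parameters in bytes


def needs_shadow(stats: ExpertStats, token_bytes: int) -> bool:
    """Assumed criterion (not from the source): shadow the expert when
    token traffic exceeds the traffic of replicating it."""
    # Inputs are sent to the expert and outputs are sent back.
    token_traffic = 2 * stats.remote_tokens * token_bytes
    # Parameters are broadcast and gradients are reduced afterwards.
    shadow_traffic = 2 * stats.param_bytes
    return token_traffic > shadow_traffic


def select_shadow_experts(experts: list[ExpertStats], token_bytes: int) -> set[int]:
    """Judge each expert once before the iteration and collect the shadow set."""
    shadow_set: set[int] = set()
    for expert in experts:  # continue until all experts have been judged
        if needs_shadow(expert, token_bytes):
            shadow_set.add(expert.expert_id)
    return shadow_set


if __name__ == "__main__":
    experts = [
        ExpertStats(expert_id=0, remote_tokens=50_000, param_bytes=8_000_000),  # heavily loaded
        ExpertStats(expert_id=1, remote_tokens=1_000, param_bytes=8_000_000),   # lightly loaded
    ]
    # Assumed token size: hidden size 1024 in fp32 -> 4096 bytes per token.
    print(select_shadow_experts(experts, token_bytes=4096))  # {0}
```

With these assumed numbers, expert 0 would move about 410 MB of tokens versus 16 MB of parameter and gradient traffic, so it is shadowed; expert 1 would move about 8 MB of tokens and stays remote. A real system would likely also weigh computation time and bandwidth per link, which this sketch ignores.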