18730671. PERFORMANCE OPTIMIZATION METHOD AND APPARATUS FOR TRAINING MIXTURE-OF-EXPERTS MODEL (Tsinghua University)

From WikiPatents

PERFORMANCE OPTIMIZATION METHOD AND APPARATUS FOR TRAINING MIXTURE-OF-EXPERTS MODEL

Organization Name

Tsinghua University

Inventor(s)

Jidong Zhai of Beijing City CN

Jia'ao He of Beijing City CN

This abstract first appeared for US patent application 18730671, titled 'PERFORMANCE OPTIMIZATION METHOD AND APPARATUS FOR TRAINING MIXTURE-OF-EXPERTS MODEL'.

Original Abstract Submitted

The present disclosure provides a performance optimization method and apparatus for training a mixture-of-experts model, relating to the technical field of neural networks. The method includes: before an iterative calculation, judging for each expert in the mixture-of-experts model whether the current expert needs to be set as a shadow expert; if so, adding the current expert to a shadow expert set; and continuing to judge whether the next expert needs to be set as a shadow expert until all experts have been judged. The present disclosure is capable of improving the speed and efficiency of training the mixture-of-experts model and of reducing the resources consumed during training.
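
The per-iteration selection loop described in the abstract can be sketched roughly as follows. This is a minimal illustration only, not the patented implementation. The abstract does not state how an expert is judged, so the function names (build_shadow_expert_set, is_overloaded), the token counts, and the threshold below are all hypothetical assumptions introduced for the example.

```python
from typing import Callable, Iterable, Set


def build_shadow_expert_set(
    expert_ids: Iterable[int],
    needs_shadow: Callable[[int], bool],
) -> Set[int]:
    """Judge every expert once and collect those flagged as shadow experts."""
    shadow_experts: Set[int] = set()
    for expert_id in expert_ids:
        # The judgment itself is delegated to the caller-supplied predicate,
        # because the abstract does not specify the criterion.
        if needs_shadow(expert_id):
            shadow_experts.add(expert_id)
    return shadow_experts


# Hypothetical criterion (an assumption, not taken from the abstract):
# flag an expert when the number of tokens routed to it in the upcoming
# iteration exceeds a fixed threshold.
tokens_routed = {0: 10, 1: 5000, 2: 40, 3: 1200}  # made-up example data
TOKEN_THRESHOLD = 1000                            # made-up threshold


def is_overloaded(expert_id: int) -> bool:
    return tokens_routed[expert_id] > TOKEN_THRESHOLD


# Run once before an iteration; every expert is visited exactly once.
shadow_set = build_shadow_expert_set(tokens_routed.keys(), is_overloaded)
print(sorted(shadow_set))
```

Passing the judgment in as a predicate keeps the loop faithful to what the abstract actually states (visit every expert, add the flagged ones to the set) while leaving the unspecified decision rule replaceable.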