18730671. PERFORMANCE OPTIMIZATION METHOD AND APPARATUS FOR TRAINING MIXTURE-OF-EXPERTS MODEL (Tsinghua University)

From WikiPatents
Revision as of 07:29, 31 March 2025 by Unknown user (talk) (Creating a new page)
PERFORMANCE OPTIMIZATION METHOD AND APPARATUS FOR TRAINING MIXTURE-OF-EXPERTS MODEL

Organization Name

Tsinghua University

Inventor(s)

Jidong Zhai of Beijing City CN

Jia'ao He of Beijing City CN

This abstract first appeared for US patent application 18730671, titled 'PERFORMANCE OPTIMIZATION METHOD AND APPARATUS FOR TRAINING MIXTURE-OF-EXPERTS MODEL'.

Original Abstract Submitted

The present disclosure provides a performance optimization method and apparatus for training a mixture-of-experts model, which relate to the technical field of neural networks. The method includes: judging, before one iterative calculation and for each of all experts in a mixture-of-experts model, whether a current expert needs to be set as a shadow expert, and if yes, adding the current expert to a shadow expert set, and continuing to judge whether a next expert is to be set as a shadow expert until all the experts have been judged. The present disclosure is capable of improving the speed and efficiency of training the mixture-of-experts model and of reducing the resources consumed by the mixture-of-experts model during training.
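
The selection loop described in the abstract can be sketched roughly as follows. This is only an illustration, not the patented method: the abstract does not say how an expert is judged, so the `needs_shadow` predicate, the `tokens_per_expert` table, and the `LOAD_THRESHOLD` value are all hypothetical placeholders chosen for the example.

```python
from typing import Callable, Iterable, Set


def select_shadow_experts(
    expert_ids: Iterable[int],
    needs_shadow: Callable[[int], bool],
) -> Set[int]:
    """Run once before an iteration: judge every expert and collect the shadow set."""
    shadow_set: Set[int] = set()
    for expert_id in expert_ids:
        # The judging criterion is supplied by the caller; the abstract
        # does not specify it.
        if needs_shadow(expert_id):
            shadow_set.add(expert_id)
    return shadow_set


# Hypothetical criterion (an assumption, not from the abstract): treat an
# expert as a shadow expert when its token count exceeds a fixed threshold.
tokens_per_expert = {0: 120, 1: 900, 2: 45, 3: 700}
LOAD_THRESHOLD = 500

shadow = select_shadow_experts(
    tokens_per_expert,
    lambda e: tokens_per_expert[e] > LOAD_THRESHOLD,
)
print(sorted(shadow))  # [1, 3]
```

Passing the criterion in as a function keeps the loop itself identical to the abstract's description while leaving the actual judging rule open.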
