US Patent Application 18372661: EFFICIENT TRANSFORMER TRAINING BASED ON SMALLER PRETRAINED MODELS (Massachusetts Institute of Technology)
Organization Name
Massachusetts Institute of Technology
Inventor(s)
Rameswar Panda of Medford, MA (US)
Leonid Karlinsky of Acton, MA (US)
Rogerio Schmidt Feris of West Hartford, CT (US)
Yoon Hyung Kim of Cambridge, MA (US)
This abstract first appeared for US patent application 18372661, titled 'EFFICIENT TRANSFORMER TRAINING BASED ON SMALLER PRETRAINED MODELS'.
Original Abstract Submitted
Parameters of a first transformer are accessed, and size dimensions of a second transformer that is to be trained and is larger than the first transformer are received. The parameters of the first transformer are linearly transformed using a combination of a width-growth operator and a depth-growth operator, wherein the linear transformation produces a set of new parameters, the set corresponding to the size dimensions of the second transformer. The second transformer is initialized with the set of new parameters.
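The abstract describes producing the larger model's parameters as a linear transformation of the smaller model's parameters, combining a width-growth operator (expanding each layer's dimensions) with a depth-growth operator (expressing each new layer as a combination of old layers). Below is a minimal NumPy sketch of that idea, assuming toy matrix-valued per-layer parameters; the operator names (`A_out`, `A_in`, `D`) are illustrative assumptions, and the operators here are random rather than learned or function-preserving as the actual method would require.

```python
import numpy as np

# Small pretrained transformer: L_small layers, hidden size d_small.
# Target (larger) transformer: L_large layers, hidden size d_large.
# Toy sizes chosen only for illustration.
L_small, d_small = 2, 4
L_large, d_large = 4, 6
rng = np.random.default_rng(0)

# Stand-ins for the pretrained parameters of the first (small) model:
# one (d_small x d_small) weight matrix per layer.
W_small = [rng.standard_normal((d_small, d_small)) for _ in range(L_small)]

# Width-growth operator: a pair of (d_large x d_small) expansion matrices
# applied on the output and input sides of each weight matrix. Random here;
# in the described method they would be chosen so the grown model behaves
# like the small one.
A_out = rng.standard_normal((d_large, d_small))
A_in = rng.standard_normal((d_large, d_small))

# Depth-growth operator: mixing weights expressing each of the L_large new
# layers as a convex combination of the width-grown old layers.
D = rng.random((L_large, L_small))
D /= D.sum(axis=1, keepdims=True)  # each row sums to 1

# The combined linear transformation producing the new parameter set:
#   W_large[k] = sum_l D[k, l] * (A_out @ W_small[l] @ A_in.T)
grown = [A_out @ W @ A_in.T for W in W_small]            # width growth
W_large = [sum(D[k, l] * grown[l] for l in range(L_small))
           for k in range(L_large)]                       # depth growth

# The second (larger) transformer would be initialized with W_large
# before training begins.
assert all(W.shape == (d_large, d_large) for W in W_large)
```

The point of the sketch is the structure of the transformation: every new parameter is a fixed linear function of the old parameters, matching the abstract's claim that the new parameter set corresponds to the size dimensions of the second transformer.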