20250217604. Efficient Ef (SAMSUNG ELECTRONICS ., .)
EFFICIENT AND EFFECTIVE SYSTEM AND METHOD TO BUILD MULTI-LINGUAL LARGE LANGUAGE MODELS
Abstract: A method of enabling a language model trained in a first language to support a second language includes extending an existing vocabulary of the language model to include additional tokens for text in the second language. The method also includes initializing the additional tokens for text in the second language based on subtokens of tokens for text in the first language from the existing vocabulary. The method further includes training the language model using a mixed language dataset that includes a first language corpus and a second language corpus. In addition, the method includes performing instruction tuning using a dataset that includes (i) instruction and response pairs involving the first language and (ii) instruction and response pairs involving the second language.
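The first two steps of the abstract (vocabulary extension and subtoken-based initialization) can be illustrated with a minimal sketch. This is not the applicant's implementation: the abstract does not say how subtokens are found or combined, so the greedy longest-match split, the averaging of subtoken embeddings, the zero-vector fallback, and the names `split_into_subtokens` and `extend_vocab` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def split_into_subtokens(token: str, vocab: Dict[str, int]) -> List[str]:
    """Greedy longest-match split of `token` into pieces found in `vocab`.

    Assumption: the abstract does not specify a segmentation scheme.
    """
    pieces: List[str] = []
    i = 0
    while i < len(token):
        for j in range(len(token), i, -1):
            if token[i:j] in vocab:
                pieces.append(token[i:j])
                i = j
                break
        else:
            i += 1  # no existing token covers this character; skip it
    return pieces


def extend_vocab(
    vocab: Dict[str, int],
    embeddings: List[List[float]],
    new_tokens: List[str],
) -> Tuple[Dict[str, int], List[List[float]]]:
    """Return copies of `vocab` and `embeddings` extended with `new_tokens`.

    Each new row is the mean of the rows of the token's subtokens in the
    ORIGINAL vocabulary (an assumed initialization rule).
    """
    new_vocab = dict(vocab)
    new_embeddings = [row[:] for row in embeddings]
    dim = len(embeddings[0])
    for tok in new_tokens:
        if tok in new_vocab:
            continue  # already covered by the vocabulary
        pieces = split_into_subtokens(tok, vocab)
        if pieces:
            rows = [embeddings[vocab[p]] for p in pieces]
            vec = [sum(col) / len(rows) for col in zip(*rows)]
        else:
            vec = [0.0] * dim  # assumed fallback when nothing matches
        new_vocab[tok] = len(new_embeddings)
        new_embeddings.append(vec)
    return new_vocab, new_embeddings


# Toy example: the existing vocabulary only knows single characters of the
# second language, and a whole-word token is added.
base_vocab = {"안": 0, "녕": 1, "hi": 2}
base_emb = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
vocab2, emb2 = extend_vocab(base_vocab, base_emb, ["안녕"])
print(vocab2["안녕"], emb2[vocab2["안녕"]])  # 3 [0.5, 0.5]
```

Starting new tokens from the embeddings of their existing subtokens, rather than from random values, gives the later mixed-language training step a starting point that already reflects what the model knows about those pieces. The subsequent training and instruction-tuning steps are not sketched here.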
Inventor(s): Hai Wang, Zheng Tang, Vijay Srinivasan, Hyuk Joon Kwon, Vikas Yadav, Feixuan Wang, Hongxia Jin
CPC Classification: G06F40/55 (ELECTRIC DIGITAL DATA PROCESSING (computer systems based on specific computational models))