20250190698. Method Training Language Model (S2W .)
METHOD OF TRAINING LANGUAGE MODEL FOR CYBERSECURITY AND SYSTEM PERFORMING THE SAME
Abstract: Provided is a system for training a language model for cybersecurity, which includes: a document collection unit that collects a cybersecurity document used for training the language model; an extraction unit that identifies non-linguistic elements in the cybersecurity document based on a non-linguistic element database; a tokenization unit that tokenizes the cybersecurity document to generate a plurality of tokens; and a language model application unit that controls the language model to simultaneously perform a first task of classifying the types of the non-linguistic elements included in the cybersecurity document, which include at least one of a Bitcoin address, a hash value, an IP address, and a vulnerability identifier, and a second task of recovering only the linguistic elements of the cybersecurity document.
Inventor(s): Seung Won SHIN, Young Jin JIN, Eu Gene JANG, Da Yeon YIM, Jin Woo CHUNG, Yong Jae LEE, Jian CUI, Chang Hoon YOON, Seung Yong YANG
CPC Classification: G06F40/211 (ELECTRIC DIGITAL DATA PROCESSING (computer systems based on specific computational models))
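Illustrative sketch (not part of the application): the two training tasks named in the abstract can be pictured as a data-preparation step that produces, for each token, a non-linguistic element type label (first task) and a masked-token target that is set only for linguistic tokens (second task). Everything below is an assumption made for illustration: the simplified regular expressions stand in for the non-linguistic element database, whitespace splitting stands in for the tokenization unit, "recovering" is read as masked-token prediction, and the names NLE_PATTERNS, classify_token, and build_example are hypothetical. The model and its loss are not shown.

```python
import random
import re

# Hypothetical stand-in for the "non-linguistic element database":
# one simplified regular expression per element type named in the abstract.
# Hash patterns are checked before Bitcoin patterns so hex digests win ties.
NLE_PATTERNS = {
    "IP_ADDRESS": re.compile(r"^(?:\d{1,3}\.){3}\d{1,3}$"),
    "VULNERABILITY_ID": re.compile(r"^CVE-\d{4}-\d{4,}$"),
    "HASH_VALUE": re.compile(
        r"^(?:[a-fA-F0-9]{32}|[a-fA-F0-9]{40}|[a-fA-F0-9]{64})$"
    ),
    "BITCOIN_ADDRESS": re.compile(
        r"^(?:[13][a-km-zA-HJ-NP-Z1-9]{25,34}|bc1[a-z0-9]{11,71})$"
    ),
}
LINGUISTIC = "O"   # label for ordinary (linguistic) tokens
MASK = "[MASK]"    # placeholder the model would be asked to recover


def classify_token(token):
    """Return the non-linguistic element type of a token, or LINGUISTIC."""
    for label, pattern in NLE_PATTERNS.items():
        if pattern.match(token):
            return label
    return LINGUISTIC


def build_example(text, mask_prob=0.15, seed=0):
    """Build inputs and targets for both tasks from one document string.

    Returns (inputs, nle_labels, mlm_targets):
      inputs      - tokens, with some linguistic tokens replaced by MASK
      nle_labels  - per-token element type (first task targets)
      mlm_targets - original token where masked, else None (second task)
    """
    rng = random.Random(seed)
    tokens = text.split()  # whitespace tokenization (an assumption)
    # Strip surrounding punctuation before matching, e.g. "10.0.0.1,".
    nle_labels = [classify_token(t.strip(".,;:()")) for t in tokens]

    inputs, mlm_targets = [], []
    for token, label in zip(tokens, nle_labels):
        # Only linguistic tokens are ever masked, so the recovery task
        # never asks the model to reproduce a hash, address, or identifier.
        if label == LINGUISTIC and rng.random() < mask_prob:
            inputs.append(MASK)
            mlm_targets.append(token)
        else:
            inputs.append(token)
            mlm_targets.append(None)
    return inputs, nle_labels, mlm_targets
```

In this sketch, a training loop would combine a per-token classification loss over nle_labels with a recovery loss over the positions where mlm_targets is not None; how the application actually weights or schedules the two tasks is not stated in the abstract.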