18827108. ACCELERATING LANGUAGE MODEL INFERENCE WITH DYNAMIC MULTI-TOKEN SAMPLING (SAMSUNG ELECTRONICS CO., LTD.)
ACCELERATING LANGUAGE MODEL INFERENCE WITH DYNAMIC MULTI-TOKEN SAMPLING
Organization Name
SAMSUNG ELECTRONICS CO., LTD.
Inventor(s)
Shikhar Tuli of Sunnyvale CA US
Chi-Heng Lin of Mountain View CA US
Yen-Chang Hsu of Fremont CA US
Yilin Shen of Mountain View CA US
This abstract first appeared for US patent application 18827108 titled 'ACCELERATING LANGUAGE MODEL INFERENCE WITH DYNAMIC MULTI-TOKEN SAMPLING'.
Original Abstract Submitted
A method for performing multi-token prediction by an apparatus includes receiving, from an artificial intelligence (AI) assistance device, a request for an output token sequence that is subsequent to an input token sequence indicated by the request, predicting, by a trained machine learning model, a plurality of candidate output tokens, estimating joint probability distributions of one or more combinations of the plurality of candidate output tokens, calculating joint probabilities of the one or more combinations by masking the joint probability distributions with a co-occurrence weighted mask, determining, based on the joint probabilities, whether to reduce the number of candidate output tokens included in each combination of the one or more combinations, identifying, based on the joint probabilities, a combination of the one or more combinations as the output token sequence, and outputting, to the AI assistance device, a response to the request, the response comprising the output token sequence.
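The steps named in the abstract (predict candidates, estimate joint probabilities, apply a co-occurrence weighted mask, decide whether to shorten the combinations, pick one) can be illustrated with a minimal Python sketch. The abstract does not say how the joint distributions are estimated, what form the mask takes, or what rule triggers the reduction, so everything concrete here is an assumption made for illustration: the joint probability is taken as a product of per-position probabilities, the mask is a hypothetical dictionary of weights on adjacent token pairs, and the combination length is reduced by one whenever the best masked joint probability falls below a hypothetical `threshold`. The function names are likewise invented and are not from the application.

```python
from itertools import product


def top_candidates(dist, n):
    """Return the n most probable (token, prob) pairs for one future position."""
    return sorted(dist.items(), key=lambda kv: kv[1], reverse=True)[:n]


def masked_joint(combo, cooccurrence):
    """Score one combination of (token, prob) pairs.

    Assumption: the joint probability is the product of per-position
    probabilities, then weighted by a co-occurrence mask over adjacent
    token pairs (pairs missing from the mask get weight 0.0).
    """
    prob = 1.0
    for _, p in combo:
        prob *= p
    for (a, _), (b, _) in zip(combo, combo[1:]):
        prob *= cooccurrence.get((a, b), 0.0)
    return prob


def sample_multi_token(position_dists, cooccurrence, threshold=0.2, per_position=2):
    """Pick the best token combination, shortening it when confidence is low.

    Assumption: if the best masked joint probability is below `threshold`,
    the number of tokens per combination is reduced by one and the search
    is repeated; a single token is always accepted.
    """
    k = len(position_dists)
    while k >= 1:
        candidates = [top_candidates(d, per_position) for d in position_dists[:k]]
        scored = [
            (masked_joint(combo, cooccurrence), [tok for tok, _ in combo])
            for combo in product(*candidates)
        ]
        best_prob, best_tokens = max(scored, key=lambda s: s[0])
        if best_prob >= threshold or k == 1:
            return best_tokens, best_prob
        k -= 1


# Toy per-position distributions standing in for the model's predictions.
position_dists = [
    {"New": 0.7, "Los": 0.3},
    {"York": 0.6, "Angeles": 0.4},
    {"City": 0.5, "Times": 0.5},
]
# Toy co-occurrence weights for adjacent token pairs (hypothetical values).
cooccurrence = {
    ("New", "York"): 1.0,
    ("Los", "Angeles"): 1.0,
    ("York", "City"): 0.5,
    ("York", "Times"): 0.5,
}

tokens, prob = sample_multi_token(position_dists, cooccurrence, threshold=0.2)
print(tokens, round(prob, 3))
```

With these toy numbers, no three-token combination clears the 0.2 threshold, so the sketch falls back to two tokens per combination and returns the best pair. Lowering `threshold` lets it emit all three tokens in one step, which is the speed-versus-confidence trade-off the abstract's "whether to reduce the number of candidate output tokens" step appears to address.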