ACCELERATING LANGUAGE MODEL INFERENCE WITH DYNAMIC MULTI-TOKEN SAMPLING

Organization Name

SAMSUNG ELECTRONICS CO., LTD.

Inventor(s)

Shikhar Tuli of Sunnyvale CA US

Chi-Heng Lin of Mountain View CA US

Yen-Chang Hsu of Fremont CA US

Yilin Shen of Mountain View CA US

Hongxia Jin of San Jose CA US

This abstract first appeared for US patent application 18827108, titled 'ACCELERATING LANGUAGE MODEL INFERENCE WITH DYNAMIC MULTI-TOKEN SAMPLING'.

Original Abstract Submitted

A method for performing multi-token prediction by an apparatus includes: receiving, from an artificial intelligence (AI) assistance device, a request for an output token sequence that is subsequent to an input token sequence indicated by the request; predicting, by a trained machine learning model, a plurality of candidate output tokens; estimating joint probability distributions of one or more combinations of the plurality of candidate output tokens; calculating joint probabilities of the one or more combinations by masking the joint probability distributions with a co-occurrence weighted mask; determining, based on the joint probabilities, whether to reduce the number of candidate output tokens included in each combination of the one or more combinations; identifying, based on the joint probabilities, a combination of the one or more combinations as the output token sequence; and outputting, to the AI assistance device, a response to the request, the response comprising the output token sequence.
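The steps in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the joint distributions are estimated, what form the co-occurrence weighted mask takes, or what rule triggers a reduction in the number of tokens, so everything here is an assumption made for illustration: the joint probability is approximated as a product of per-position probabilities, the mask is a dictionary of pairwise weights for adjacent tokens, and the combination length is reduced by one whenever the best masked joint probability falls below a fixed threshold. The names `top_candidates`, `masked_joint`, `dynamic_multi_token_sample`, `threshold`, `n_candidates`, and `default_weight` are hypothetical and do not come from the application.

```python
from itertools import product


def top_candidates(dist, n):
    """Return the n most probable (token, probability) pairs for one position."""
    return sorted(dist.items(), key=lambda kv: kv[1], reverse=True)[:n]


def masked_joint(combo, cooccurrence, default_weight=1.0):
    """Approximate the joint probability of a token combination.

    Assumption: the unmasked joint is the product of per-position
    probabilities, and the mask multiplies in one weight per adjacent
    token pair. Pairs missing from the mask get default_weight.
    """
    prob = 1.0
    for _, p in combo:
        prob *= p
    for (a, _), (b, _) in zip(combo, combo[1:]):
        prob *= cooccurrence.get((a, b), default_weight)
    return prob


def dynamic_multi_token_sample(position_dists, cooccurrence,
                               threshold=0.5, n_candidates=2):
    """Pick the longest token combination whose masked joint probability
    reaches the threshold, shrinking the combination one token at a time.

    position_dists: one {token: probability} dict per future position.
    Returns (tokens, joint_probability).
    """
    if not position_dists:
        raise ValueError("need at least one position distribution")
    k = len(position_dists)
    while True:
        candidates = [top_candidates(d, n_candidates) for d in position_dists[:k]]
        scored = [
            (masked_joint(combo, cooccurrence), [tok for tok, _ in combo])
            for combo in product(*candidates)
        ]
        best_prob, best_tokens = max(scored, key=lambda s: s[0])
        # Hypothetical reduction rule: accept if confident enough, or if
        # only a single token is left (ordinary one-token decoding).
        if best_prob >= threshold or k == 1:
            return best_tokens, best_prob
        k -= 1
```

Under these assumptions, a call such as `dynamic_multi_token_sample(dists, mask, threshold=0.5)` emits several tokens in one step when the masked joint probability is high and falls back toward single-token decoding when it is not, which is the intuition behind trading per-step confidence for fewer model invocations.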
