QUALCOMM Incorporated (20250021761). ACCELERATING INFERENCING IN GENERATIVE ARTIFICIAL INTELLIGENCE MODELS
Inventor(s)
Arvind Vardarajan Santhanam of San Diego CA US
Joseph Binamira Soriaga of San Diego CA US
Roland Memisevic of Toronto CA
Christopher Lott of San Diego CA US
This abstract first appeared for US patent application 20250021761 titled 'ACCELERATING INFERENCING IN GENERATIVE ARTIFICIAL INTELLIGENCE MODELS'.
Original Abstract Submitted
Techniques and apparatus for generating a response to a query input into a generative artificial intelligence model. An example method generally includes generating, based on an input query and a first generative artificial intelligence model, a sequence of tokens corresponding to a candidate response to the input query. The sequence of tokens and the input query are output to a second generative artificial intelligence model for verification. One or more first guidance signals for the generated sequence of tokens are received from the second generative artificial intelligence model. The candidate response to the input query is revised based on the generated sequence of tokens and the one or more first guidance signals, and the revised candidate response is output as a response to the received input query.
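The abstract describes the method only at a high level. It does not say what the guidance signals contain or how the revision is performed. The Python sketch below is one possible reading, assuming a draft-then-verify loop similar to speculative decoding: a first (draft) model proposes several tokens, a second (verifier) model returns a per-token accept/correct signal, and the candidate keeps the accepted prefix plus the first correction. Every name in the code (`GuidanceSignal`, `draft_candidate`, `verify`, `revise`, `respond`) and the toy bigram "models" are hypothetical illustrations, not the applicant's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Hypothetical model interface: maps a token context to the next token.
# Real generative models return probability distributions, not single tokens.
NextToken = Callable[[Sequence[str]], str]


@dataclass
class GuidanceSignal:
    """Assumed guidance signal: accept or correct one drafted token."""
    position: int
    accepted: bool
    correction: str  # token the second (verifier) model prefers here


def draft_candidate(context: List[str], draft_model: NextToken, length: int) -> List[str]:
    """First model: autoregressively draft `length` candidate tokens."""
    tokens: List[str] = []
    for _ in range(length):
        tokens.append(draft_model(context + tokens))
    return tokens


def verify(context: List[str], draft: List[str], target_model: NextToken) -> List[GuidanceSignal]:
    """Second model: check each drafted token given the context and earlier drafts.

    Written as a loop for clarity; a real verifier would typically score all
    positions in one batched forward pass.
    """
    signals = []
    for i, token in enumerate(draft):
        preferred = target_model(context + draft[:i])
        signals.append(GuidanceSignal(i, preferred == token, preferred))
    return signals


def revise(draft: List[str], signals: List[GuidanceSignal]) -> List[str]:
    """Keep the accepted prefix and swap in the first correction, if any."""
    revised: List[str] = []
    for signal in signals:
        if signal.accepted:
            revised.append(draft[signal.position])
        else:
            revised.append(signal.correction)
            break  # later draft tokens were conditioned on a rejected token
    return revised


def respond(query: List[str], draft_model: NextToken, target_model: NextToken,
            max_tokens: int = 7, draft_len: int = 4) -> List[str]:
    """Draft, verify, and revise until the response reaches max_tokens."""
    response: List[str] = []
    while len(response) < max_tokens:
        context = query + response
        draft = draft_candidate(context, draft_model, draft_len)
        signals = verify(context, draft, target_model)
        response.extend(revise(draft, signals))  # adds >= 1 token when draft_len >= 1
    return response[:max_tokens]


# Toy bigram "models" keyed on the last token; the draft model errs after "sat".
TARGET: Dict[str, str] = {"?": "a", "a": "cat", "cat": "sat", "sat": "on",
                          "on": "my", "my": "mat", "mat": ".", ".": "."}
DRAFT: Dict[str, str] = {**TARGET, "sat": "in", "in": "my"}


def target_model(ctx: Sequence[str]) -> str:
    return TARGET.get(ctx[-1], ".")


def draft_model(ctx: Sequence[str]) -> str:
    return DRAFT.get(ctx[-1], ".")


QUERY = ["what", "did", "the", "cat", "do", "?"]
print(" ".join(respond(QUERY, draft_model, target_model)))  # a cat sat on my mat .
```

In this toy run the first round commits three accepted draft tokens plus one correction from the verifier, so several tokens can be settled per verification round instead of one. Many speculative decoding schemes also append a bonus token from the verifier when every draft token is accepted; this sketch omits that step for brevity.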