18387768. HYBRID INFERENCE FOR AN EFFICIENT, LOW LATENCY LLM-BASED ASSISTANT (GOOGLE LLC)
Organization Name: GOOGLE LLC
Inventor(s): Matthew Sharifi of Kilchberg CH
This abstract first appeared for US patent application 18387768 titled 'HYBRID INFERENCE FOR AN EFFICIENT, LOW LATENCY LLM-BASED ASSISTANT'.
Original Abstract Submitted
Implementations use a smaller LLM and a larger LLM in combination to generate and refine content responsive to a user query/request for content generation. In various implementations, the smaller LLM processes the user query for content generation to generate initial content responsive to that query. The user query and the initial content can be used to generate a text prompt, which can further include a request for focused edit(s). The larger LLM processes this text prompt to generate focused edit(s) that refine the initial content, so that revised content (with improved accuracy) responsive to the user query for content generation is obtained.
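The draft-then-edit flow described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation. The function names (`build_edit_prompt`, `parse_edits`, `apply_edits`, `hybrid_generate`) are hypothetical. The edit format, a JSON list of `{"find": ..., "replace": ...}` objects, is also an assumption, because the abstract does not specify how focused edits are represented. Two stub functions stand in for the smaller and larger LLMs so the example runs without any model.

```python
import json
from typing import Callable, List, Tuple

# Hypothetical: a "model" is any function that maps a prompt string to output text.
LLM = Callable[[str], str]

def build_edit_prompt(query: str, initial_content: str) -> str:
    """Combine the user query, the small model's draft, and a request for focused edits."""
    return (
        "User request:\n" + query + "\n\n"
        "Draft response:\n" + initial_content + "\n\n"
        "Do not rewrite the draft. Return only focused edits as a JSON list of "
        '{"find": <exact text in draft>, "replace": <corrected text>} objects.'
    )

def parse_edits(raw: str) -> List[Tuple[str, str]]:
    """Parse the larger model's output into (find, replace) pairs (assumed JSON format)."""
    return [(e["find"], e["replace"]) for e in json.loads(raw)]

def apply_edits(content: str, edits: List[Tuple[str, str]]) -> str:
    """Apply each edit to its first match; skip edits whose target text is absent."""
    for find, replace in edits:
        if find in content:
            content = content.replace(find, replace, 1)
    return content

def hybrid_generate(query: str, small_llm: LLM, large_llm: LLM) -> str:
    initial = small_llm(query)                      # smaller LLM drafts the content
    prompt = build_edit_prompt(query, initial)      # query + draft + edit request
    edits = parse_edits(large_llm(prompt))          # larger LLM returns only edits
    return apply_edits(initial, edits)              # revised content

# Stub models, for illustration only.
def fake_small_llm(prompt: str) -> str:
    return "The Eiffel Tower is in Berlin and was completed in 1889."

def fake_large_llm(prompt: str) -> str:
    return json.dumps([{"find": "Berlin", "replace": "Paris"}])

if __name__ == "__main__":
    print(hybrid_generate("Where is the Eiffel Tower?", fake_small_llm, fake_large_llm))
    # prints: The Eiffel Tower is in Paris and was completed in 1889.
```

One plausible reading of why this design could reduce latency is that the larger model emits only short edits instead of regenerating the whole response, so it decodes far fewer tokens. This rationale is an interpretation and is not stated in the abstract.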