HYBRID INFERENCE FOR AN EFFICIENT, LOW LATENCY LLM-BASED ASSISTANT

Organization Name

Inventor(s)

HYBRID INFERENCE FOR AN EFFICIENT, LOW LATENCY LLM-BASED ASSISTANT

This abstract first appeared for US patent application 18387768 titled 'HYBRID INFERENCE FOR AN EFFICIENT, LOW LATENCY LLM-BASED ASSISTANT

Original Abstract Submitted

Implementations utilize a hybrid use of a smaller LLM and a larger LLM to generate and refine content responsive to a user query/request for content generation. In various implementations, the smaller LLM is utilized to process the user query for content generation, to generate initial content responsive to the user query for content generation. The user query for content generation and the initial content can be utilized to generate a text prompt, where the text prompt can be configured to further include a request for focused edit(s). Such a text prompt can be processed using the larger LLM, to generate focused edit(s) to the initial content that refine the initiated content, so that revised content (with improved accuracy) responsive to the user query for content generation is acquired.