SCALABLE PROMPT LEARNING FOR LARGE VISION-LANGUAGE MODELS

Organization Name

Robert Bosch GmbH

Inventor(s)

Chen Qiu of Pittsburgh, PA (US)

Xingyu Li of New Orleans, LA (US)

Chaithanya Kumar Mummadi of Pittsburgh, PA (US)

Madan Ravi Ganesh of Pittsburgh, PA (US)

Zhenzhen Li of Gibsonia, PA (US)

Wan-Yi Lin of Wexford, PA (US)

Sabrina Schmedding of Tiefenbronn (DE)

This abstract first appeared for US patent application 20250104394, titled 'SCALABLE PROMPT LEARNING FOR LARGE VISION-LANGUAGE MODELS'.

Original Abstract Submitted

A method of generating text-driven prompts and class prediction probabilities using a vision-language model (VLM) includes receiving candidate class names associated with a plurality of candidate classes for images, generating class text tokens based on a text description of the candidate class names, and generating a plurality of context prompt vectors using a prompt generator. The context prompt vectors define context information associated with an image classification task to be performed by the VLM. The method further includes generating a prompt for each of the plurality of candidate classes by appending the respective class text tokens to the context prompt vectors, and, using the VLM, generating and outputting a class prediction probability for a sample image based on the plurality of context prompt vectors.
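The steps in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: the abstract does not specify the prompt generator, the encoders, or how the probability is computed. Every name here (embed_token, prompt_generator, encode_text, EMBED_DIM, NUM_CONTEXT, temperature) is hypothetical. The toy embeddings and the mean-pooling "text encoder" stand in for a real VLM, and the softmax over cosine similarities is an assumed, CLIP-style scoring rule.

```python
import math
import random

EMBED_DIM = 8    # hypothetical embedding width
NUM_CONTEXT = 4  # hypothetical number of context prompt vectors


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def embed_token(token, dim=EMBED_DIM):
    """Toy deterministic embedding; stands in for a VLM token embedding table."""
    rng = random.Random(token)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def class_text_tokens(class_name):
    """Generate class text tokens from the text of a candidate class name."""
    return [embed_token(word) for word in class_name.lower().split()]


def prompt_generator(num_context=NUM_CONTEXT, dim=EMBED_DIM, seed=0):
    """Stand-in prompt generator: context prompt vectors shared by all classes."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(num_context)]


def build_prompt(context_vectors, tokens):
    """Append the class text tokens to the context prompt vectors."""
    return context_vectors + tokens


def encode_text(prompt):
    """Toy text encoder (assumption): mean-pool the prompt, then normalize."""
    dim = len(prompt[0])
    pooled = [sum(vec[i] for vec in prompt) / len(prompt) for i in range(dim)]
    return l2_normalize(pooled)


def class_probabilities(image_feature, class_names, context_vectors, temperature=0.1):
    """Softmax over image/prompt cosine similarities (assumed scoring rule)."""
    image_feature = l2_normalize(image_feature)
    logits = []
    for name in class_names:
        prompt = build_prompt(context_vectors, class_text_tokens(name))
        text_feature = encode_text(prompt)
        similarity = sum(a * b for a, b in zip(image_feature, text_feature))
        logits.append(similarity / temperature)
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - top) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


class_names = ["cat", "dog", "traffic sign"]
context = prompt_generator()
image_feature = [0.1 * (i + 1) for i in range(EMBED_DIM)]  # placeholder image feature
probs = class_probabilities(image_feature, class_names, context)
for name, p in zip(class_names, probs):
    print(f"{name}: {p:.3f}")
```

In this sketch the same context prompt vectors are reused for every candidate class and only the appended class text tokens differ, so the cost of adding a class is one more prompt rather than a new set of learned vectors.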