Robert Bosch GmbH (20250104394). SCALABLE PROMPT LEARNING FOR LARGE VISION-LANGUAGE MODELS
Organization Name
Robert Bosch GmbH
Inventor(s)
Xingyu Li of New Orleans, LA, US
Chaithanya Kumar Mummadi of Pittsburgh, PA, US
Madan Ravi Ganesh of Pittsburgh, PA, US
Sabrina Schmedding of Tiefenbronn, DE
This abstract first appeared in US patent application 20250104394, titled 'SCALABLE PROMPT LEARNING FOR LARGE VISION-LANGUAGE MODELS'.
Original Abstract Submitted
A method of generating text-driven prompts and class prediction probabilities using a vision-language model (VLM) includes receiving candidate class names associated with a plurality of candidate classes for images, generating class text tokens based on a text description of the candidate class names, and generating a plurality of context prompt vectors using a prompt generator. The context prompt vectors define context information associated with an image classification task to be performed by the VLM. The method further includes generating a prompt for each of the plurality of candidate classes by appending that class's text tokens to the context prompt vectors and, using the VLM, generating and outputting a class prediction probability for a sample image based on the plurality of context prompt vectors.
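The abstract does not specify any architecture, so the following is only a minimal NumPy sketch of the data flow it describes, assuming a CLIP-style setup in which class probabilities are a softmax over cosine similarities between prompt (text) features and an image feature. Every name here (`embed_class_names`, `PromptGenerator`, `build_prompts`, `encode_text`, `encode_image`, `class_probabilities`, `EMBED_DIM`, `NUM_CONTEXT`, `temperature`) is hypothetical. The encoders are random stand-ins for a frozen VLM, and the prompt generator (one linear layer over pooled class-name embeddings) is one plausible choice for illustration, not the claimed design.

```python
import zlib

import numpy as np

EMBED_DIM = 16    # hypothetical token-embedding / feature width
NUM_CONTEXT = 4   # hypothetical number of context prompt vectors
rng = np.random.default_rng(0)


def embed_class_names(class_names):
    """Hypothetical tokenizer + embedding lookup: one (num_tokens, EMBED_DIM)
    array of class text tokens per candidate class, from a short text description."""
    tokens = []
    for name in class_names:
        words = f"a photo of a {name}".split()
        # Stable pseudo-embedding per word (stand-in for a real vocabulary table).
        rows = [np.random.default_rng(zlib.crc32(w.encode())).standard_normal(EMBED_DIM)
                for w in words]
        tokens.append(np.stack(rows))
    return tokens


class PromptGenerator:
    """Hypothetical text-driven prompt generator: a linear map from the pooled
    class-name embeddings to NUM_CONTEXT context prompt vectors (would be trained)."""

    def __init__(self):
        self.weight = rng.standard_normal((EMBED_DIM, NUM_CONTEXT * EMBED_DIM)) * 0.1

    def __call__(self, class_tokens):
        pooled = np.mean([t.mean(axis=0) for t in class_tokens], axis=0)  # (EMBED_DIM,)
        return (pooled @ self.weight).reshape(NUM_CONTEXT, EMBED_DIM)


def build_prompts(context, class_tokens):
    """Append each class's text tokens to the shared context prompt vectors."""
    return [np.concatenate([context, t], axis=0) for t in class_tokens]


TEXT_PROJ = rng.standard_normal((EMBED_DIM, EMBED_DIM))
IMAGE_PROJ = rng.standard_normal((8 * 8 * 3, EMBED_DIM))


def encode_text(prompt):
    """Stand-in for a frozen text encoder: mean-pool tokens, project, L2-normalise."""
    feat = prompt.mean(axis=0) @ TEXT_PROJ
    return feat / np.linalg.norm(feat)


def encode_image(image):
    """Stand-in for a frozen image encoder on an 8x8 RGB image."""
    feat = image.reshape(-1) @ IMAGE_PROJ
    return feat / np.linalg.norm(feat)


def class_probabilities(image, class_names, generator, temperature=0.01):
    class_tokens = embed_class_names(class_names)
    context = generator(class_tokens)                          # context prompt vectors
    prompts = build_prompts(context, class_tokens)             # one prompt per class
    text_feats = np.stack([encode_text(p) for p in prompts])   # (C, EMBED_DIM)
    logits = text_feats @ encode_image(image) / temperature    # cosine similarity / T
    logits -= logits.max()                                     # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()                                 # softmax over classes


image = rng.random((8, 8, 3))
probs = class_probabilities(image, ["cat", "dog", "car"], PromptGenerator())
print(dict(zip(["cat", "dog", "car"], probs.round(3))))
```

Because the context vectors are produced from the class-name text rather than stored as a fixed learned tensor per task, the same generator can, in principle, emit prompts for class sets it was not trained on; in a real system only the generator's weights would be optimized while the VLM encoders stay frozen.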