Samsung Electronics Co., Ltd. (20240203143). PROMPT TUNING FOR ZERO-SHOT COMPOSITIONAL LEARNING IN MACHINE LEARNING SYSTEMS simplified abstract

From WikiPatents

PROMPT TUNING FOR ZERO-SHOT COMPOSITIONAL LEARNING IN MACHINE LEARNING SYSTEMS

Organization Name

Samsung Electronics Co., Ltd.

Inventor(s)

Lingyu Zhang of Cupertino CA (US)

Ting Hua of Santa Clara CA (US)

Yilin Shen of Santa Clara CA (US)

Hongxia Jin of San Jose CA (US)

PROMPT TUNING FOR ZERO-SHOT COMPOSITIONAL LEARNING IN MACHINE LEARNING SYSTEMS - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240203143, titled 'PROMPT TUNING FOR ZERO-SHOT COMPOSITIONAL LEARNING IN MACHINE LEARNING SYSTEMS'.

The method described in the abstract involves prompt tuning of a pre-trained vision-language model to select attribute and object labels that match content in an image.

  • The model is trained to choose one attribute label and one object label for each image.
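  • For each attribute-object label pair, prompt tuning generates object textual features with a first textual encoder, attribute textual features with a second textual encoder, and image features with a vision encoder.
  • Intermediate outputs from the initial layers of all three encoders are combined to generate layer-specific learnable prompt tokens, which are appended to the inputs of specified layers in the two textual encoders and the vision encoder.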
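The label-selection step above can be illustrated with a minimal sketch. The abstract does not say how attribute and object features are combined or how they are compared with the image features, so the element-wise sum and cosine similarity used here are assumptions, and the function names and toy feature vectors are hypothetical stand-ins for real encoder outputs.

```python
import math
from itertools import product


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def select_pair(image_feat, attr_feats, obj_feats):
    """Return the (attribute, object) pair that best matches the image, with its score."""
    best_pair, best_score = None, -math.inf
    for attr, obj in product(attr_feats, obj_feats):
        # Assumption: a pair is represented by the element-wise sum of the
        # attribute and object textual features.
        pair_feat = [a + o for a, o in zip(attr_feats[attr], obj_feats[obj])]
        score = cosine(image_feat, pair_feat)
        if score > best_score:
            best_pair, best_score = (attr, obj), score
    return best_pair, best_score


# Toy vectors standing in for the outputs of the two textual encoders
# and the vision encoder (hypothetical values).
attr_feats = {"red": [1.0, 0.0, 0.0, 0.0], "wet": [0.0, 1.0, 0.0, 0.0]}
obj_feats = {"apple": [0.0, 0.0, 1.0, 0.0], "dog": [0.0, 0.0, 0.0, 1.0]}
image_feat = [0.1, 0.9, 0.0, 0.8]

pair, score = select_pair(image_feat, attr_feats, obj_feats)
print(pair)  # ('wet', 'dog')
```

Because every attribute is scored against every object, a pair such as "wet dog" can be selected even if that exact combination never appeared during tuning, which is the zero-shot compositional setting named in the title.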

Potential Applications:

  • Image recognition and labeling systems
  • Visual search engines
  • Content-based image retrieval systems

Problems Solved:

  • Improving accuracy in matching attribute and object labels to image content
  • Enhancing the performance of vision-language models

Benefits:

  • Increased precision in image labeling
  • Enhanced user experience in visual search applications
  • Improved efficiency in content retrieval systems

Commercial Applications:

Title: Enhanced Image Labeling Technology for Visual Search Engines

This technology can be used in e-commerce platforms for better product categorization and recommendation systems. It can also be integrated into social media platforms for improved image tagging and search.

Questions about the technology:

  1. How does prompt tuning improve the performance of vision-language models?
  2. What are the potential challenges in implementing this technology in real-world applications?

Frequently Updated Research: Researchers are continuously exploring ways to optimize prompt tuning techniques for vision-language models to further enhance their accuracy and efficiency. Stay updated on recent advancements in this field for the latest developments in image recognition technology.


Original Abstract Submitted

A method includes obtaining an image, a set of attribute labels, and a set of object labels and performing prompt tuning of a pre-trained vision-language model having first and second textual encoders and a vision encoder. The model is trained during prompt tuning to select one attribute label and one object label that match content in the image. Performing the prompt tuning includes, for each attribute label-object label pair, generating object textual features associated with the object label using the first textual encoder, generating attribute textual features associated with the attribute label using the second textual encoder, and generating image features associated with the image using the vision encoder. Intermediate outputs from initial layers of the textual encoders and the vision encoder are combined to generate layer-specific learnable prompt tokens that are appended to inputs of specified layers in the first and second textual encoders and the vision encoder.
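The last sentence of the abstract, on layer-specific prompt tokens, can be sketched as follows. This is only an illustration under several assumptions the abstract does not state: "combining" is taken to mean mean-pooling each encoder's intermediate output and concatenating the results, a separate random matrix stands in for each learnable projection, all three encoders share one feature width and the same toy layer, and the same prompt tokens are appended in every encoder. All names (`toy_layer`, `make_prompt_weights`, `generate_prompts`, `run_layers`) are hypothetical.

```python
import random


def mean_pool(tokens):
    """Average a list of token vectors into a single vector."""
    dim = len(tokens[0])
    return [sum(t[i] for t in tokens) / len(tokens) for i in range(dim)]


def toy_layer(tokens):
    """Stand-in for an encoder layer: mix every token with the sequence mean."""
    mean = mean_pool(tokens)
    return [[0.5 * x + 0.5 * m for x, m in zip(tok, mean)] for tok in tokens]


def make_prompt_weights(dim, target_layers, n_prompts, seed=0):
    """Stand-in learnable parameters: one (dim x 3*dim) matrix per prompt token per layer."""
    rng = random.Random(seed)
    return {
        layer: [[[rng.uniform(-0.1, 0.1) for _ in range(3 * dim)] for _ in range(dim)]
                for _ in range(n_prompts)]
        for layer in target_layers
    }


def generate_prompts(weights, obj_hidden, attr_hidden, img_hidden):
    """Combine intermediate outputs and project them into layer-specific prompt tokens."""
    # Assumption: combine by mean-pooling each encoder's output and concatenating.
    combined = mean_pool(obj_hidden) + mean_pool(attr_hidden) + mean_pool(img_hidden)
    return {
        layer: [[sum(w * x for w, x in zip(row, combined)) for row in matrix]
                for matrix in matrices]
        for layer, matrices in weights.items()
    }


def run_layers(tokens, layers, first_index, prompts):
    """Run layers in order, appending prompt tokens to the input of each specified layer."""
    for index, layer in enumerate(layers, start=first_index):
        if index in prompts:
            tokens = tokens + prompts[index]
        tokens = layer(tokens)
    return tokens


DIM, N_INITIAL, N_LAYERS = 4, 2, 6
layers = [toy_layer] * N_LAYERS
rng = random.Random(1)
inputs = {
    name: [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(n_tokens)]
    for name, n_tokens in [("object", 3), ("attribute", 3), ("image", 5)]
}

# 1. Run only the initial layers of each encoder, without prompts.
hidden = {name: run_layers(toks, layers[:N_INITIAL], 0, {}) for name, toks in inputs.items()}

# 2. Generate prompt tokens for the specified layers (here layers 2 and 4).
weights = make_prompt_weights(DIM, target_layers=[2, 4], n_prompts=2)
prompts = generate_prompts(weights, hidden["object"], hidden["attribute"], hidden["image"])

# 3. Run the remaining layers, appending the prompts at the specified layers.
outputs = {name: run_layers(h, layers[N_INITIAL:], N_INITIAL, prompts)
           for name, h in hidden.items()}
print({name: len(toks) for name, toks in outputs.items()})
# {'object': 7, 'attribute': 7, 'image': 9}
```

In this sketch each sequence grows by two tokens at each of the two specified layers. Whether prompts from an earlier layer are kept or replaced at later layers is not specified in the abstract; they are kept here purely for simplicity.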