18452424. AUTOMATIC IMAGE SELECTION WITH CROSS MODAL MATCHING simplified abstract (Apple Inc.)

AUTOMATIC IMAGE SELECTION WITH CROSS MODAL MATCHING

Organization Name

Apple Inc.

Inventor(s)

Jia Huang of Mountain View CA (US)

Robert J. Monarch of San Francisco CA (US)

Alex Jungho Kim of Mukilteo WA (US)

Jungsuk Kwac of Palo Alto CA (US)

Parmeshwar Khurd of San Jose CA (US)

Kailash Thiyagarajan of Dallas TX (US)

Xiaoyuan Goodman Gu of San Jose CA (US)

AUTOMATIC IMAGE SELECTION WITH CROSS MODAL MATCHING - A simplified explanation of the abstract

This abstract first appeared for US patent application 18452424 titled 'AUTOMATIC IMAGE SELECTION WITH CROSS MODAL MATCHING'.

The present technology involves a multi-modal transformer model trained for cross-modal tasks like image-text matching, refined with data for specific downstream use cases.

  • The model is designed to perform tasks such as image-text matching.
  • It is refined with labeled examples derived from a dataset of text-image pairs.
  • The labeled examples are drawn from text-image pairs that achieved a desired interaction in the proper context.
  • One described use case is selecting images used to advertise applications in an App store.
  • In that use case, the model is refined with examples of advertisement images whose invitational content was clicked or converted (see the sketch after this list).
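
The application itself does not disclose implementation details in this summary, but a minimal sketch of the general idea of automatic image selection via cross-modal matching, using an off-the-shelf CLIP model from Hugging Face transformers as a stand-in for the claimed multi-modal transformer, might look like the following. The model name, file paths, and ad text are illustrative assumptions, not material from the patent.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Candidate images for one piece of invitational content (paths are illustrative).
candidate_images = [Image.open(p) for p in ("screenshot_a.png", "screenshot_b.png")]
ad_text = "A relaxing puzzle game with daily challenges"

inputs = processor(text=[ad_text], images=candidate_images,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds one matching score per candidate image for the text;
# the highest-scoring image is the one that would be selected automatically.
scores = outputs.logits_per_image.squeeze(-1)
best = int(scores.argmax())
print(f"selected image index: {best}, scores: {scores.tolist()}")
```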
Potential Applications

This technology can be applied in various fields such as advertising, e-commerce, and content recommendation systems.

Problems Solved

The technology addresses the challenge of matching images with text effectively and efficiently in various applications.

Benefits

The technology improves the accuracy and performance of cross-modal tasks like image-text matching, leading to better user engagement and conversion rates.

Commercial Applications

This technology can be utilized in digital marketing, e-commerce platforms, and online content recommendation systems to enhance user experience and drive conversions.

Prior Art

Researchers can explore prior art related to multi-modal transformer models, cross-modal tasks, and image-text matching in the field of natural language processing and computer vision.

Frequently Updated Research

Stay updated on the latest advancements in multi-modal transformer models, cross-modal tasks, and image-text matching in the fields of artificial intelligence and machine learning.

Questions about the Technology

1. How does this technology improve the performance of image-text matching tasks?

  - By refining the multi-modal transformer with labeled examples drawn from text-image pairs that actually achieved the desired interaction (for example, a click or conversion), the model learns which images perform best for a given text and context, improving matching accuracy and downstream engagement and conversion rates.

2. What are the potential commercial applications of this technology?

  - This technology can be applied in digital marketing, e-commerce platforms, and content recommendation systems to improve user experience and drive conversions.


Original Abstract Submitted

The present technology pertains to a multi-modal transformer model that is designed and trained to perform cross-modal tasks such as image-text matching, wherein the model is further refined with data for the particular downstream use case of the model. More specifically, the present technology can refine the underlying model with labeled examples derived from a dataset of text-image pairs that ultimately achieved a desired interaction in the proper context. For example, in the use case of advertising applications in an App store, the present technology can refine the underlying model with examples of images used to advertise applications in the App store where the respective invitational content was clicked or converted.
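
The abstract does not specify the training procedure, but one plausible reading of the refinement step is to derive binary labels from engagement logs (clicked or converted text-image pairs as positives, shown-but-ignored pairs as negatives) and train a small matching head on top of frozen cross-modal features. The sketch below assumes that reading; the log format, file names, CLIP backbone, and linear head are illustrative assumptions, not the patented method.

```python
import torch
import torch.nn as nn
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
backbone = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device).eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical engagement log: (ad text, image path, 1 if the invitational
# content was clicked or converted in this context, else 0).
engagement_log = [
    ("Fast photo editor with one-tap filters", "ad_image_1.png", 1),
    ("Fast photo editor with one-tap filters", "ad_image_2.png", 0),
]

# Small matching head over concatenated text and image embeddings.
head = nn.Linear(backbone.config.projection_dim * 2, 1).to(device)
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for text, image_path, label in engagement_log:
    inputs = processor(text=[text], images=[Image.open(image_path)],
                       return_tensors="pt", padding=True).to(device)
    with torch.no_grad():  # backbone stays frozen; only the head is refined
        txt = backbone.get_text_features(input_ids=inputs["input_ids"],
                                         attention_mask=inputs["attention_mask"])
        img = backbone.get_image_features(pixel_values=inputs["pixel_values"])
    logit = head(torch.cat([txt, img], dim=-1)).squeeze()
    loss = loss_fn(logit, torch.tensor(float(label), device=device))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In this reading, the engagement-derived labels tie the matching score to the "desired interaction in the proper context" described in the abstract, so that image selection favors images that historically led to clicks or conversions.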