TEXT-BASED IMAGE GENERATION USING AN IMAGE-TRAINED TEXT

Organization Name

ADOBE INC.

Inventor(s)

Tobias Hinz of Campbell CA (US)

Ali Aminian of Piedmont CA (US)

Hao Tan of Santa Clara CA (US)

Kushal Kafle of Sunnyvale CA (US)

Oliver Wang of Seattle WA (US)

Jingwan Lu of Sunnyvale CA (US)

TEXT-BASED IMAGE GENERATION USING AN IMAGE-TRAINED TEXT - A simplified explanation of the abstract

This abstract first appeared in US patent application 18439036, titled 'TEXT-BASED IMAGE GENERATION USING AN IMAGE-TRAINED TEXT'.

The patent application describes a method, apparatus, and system for generating images based on text prompts.

The technology involves using a text encoder jointly trained with an image generation model to obtain a text embedding from a text prompt.
The image generation model then generates a synthetic image based on the obtained text embedding.
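The two steps described above (encode the prompt, then generate from the embedding) can be illustrated with a minimal sketch. This is a toy stand-in, not the method claimed in the application: the class names ToyTextEncoder and ToyImageGenerator, the function generate_image, the embedding size, and the 4x4 output are all assumptions made for illustration, and the weights are random rather than trained.

```python
import math
import random

EMBED_DIM = 8    # hypothetical embedding size
IMAGE_SIDE = 4   # hypothetical output resolution (4x4 grayscale)


class ToyTextEncoder:
    """Toy encoder: averages one deterministic pseudo-random vector per token."""

    def __init__(self, seed=0):
        self.seed = seed

    def _token_vector(self, token):
        # Seeding with a string keeps each token's vector reproducible.
        rng = random.Random(f"{self.seed}:{token}")
        return [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]

    def encode(self, prompt):
        tokens = prompt.lower().split()
        if not tokens:
            raise ValueError("prompt must contain at least one token")
        vectors = [self._token_vector(t) for t in tokens]
        # Mean over tokens gives a fixed-size text embedding.
        return [sum(column) / len(vectors) for column in zip(*vectors)]


class ToyImageGenerator:
    """Toy generator: one linear layer plus a sigmoid, standing in for a real model."""

    def __init__(self, seed=1):
        rng = random.Random(seed)
        n_pixels = IMAGE_SIDE * IMAGE_SIDE
        self.weights = [
            [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]
            for _ in range(n_pixels)
        ]

    def generate(self, embedding):
        pixels = []
        for row in self.weights:
            activation = sum(w * e for w, e in zip(row, embedding))
            pixels.append(1.0 / (1.0 + math.exp(-activation)))  # squash into (0, 1)
        # Reshape the flat pixel list into IMAGE_SIDE rows.
        return [
            pixels[i * IMAGE_SIDE:(i + 1) * IMAGE_SIDE]
            for i in range(IMAGE_SIDE)
        ]


def generate_image(prompt, encoder, generator):
    """Text prompt -> text embedding -> synthetic image."""
    embedding = encoder.encode(prompt)
    return generator.generate(embedding)
```

Calling generate_image("a red fox", ToyTextEncoder(), ToyImageGenerator()) returns a 4x4 grid of values between 0 and 1. The point of the sketch is only the data flow: the generator never sees the prompt text, only the embedding produced by the encoder.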

Potential Applications:

This technology can be used in the field of computer graphics for generating images based on textual descriptions.
It can be applied in content creation tools for artists and designers to quickly visualize their ideas.

Problems Solved:

Enables the generation of images from text prompts, bridging the gap between textual descriptions and visual representations.
Streamlines the process of creating visual content based on written instructions.

Benefits:

Facilitates rapid prototyping and visualization of concepts.
Enhances creativity and ideation by providing a visual output for text-based inputs.

Commercial Applications:

Text-to-image generation technology of this kind can be used in graphic design software, virtual reality applications, and e-commerce platforms to create custom visual content.

Questions about Image Generation from Text Prompts:

1. How does the text encoder work in conjunction with the image generation model to create synthetic images?
2. What are the potential limitations of generating images solely based on text prompts?

Frequently Updated Research:

Ongoing research in natural language processing and computer vision may lead to advancements in the accuracy and realism of images generated from text prompts.

Original Abstract Submitted

A method, apparatus, non-transitory computer readable medium, and system for image generation include obtaining a text prompt and encoding, using a text encoder jointly trained with an image generation model, the text prompt to obtain a text embedding. Some embodiments generate, using the image generation model, a synthetic image based on the text embedding.
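The abstract does not say how the text encoder is "jointly trained" with the image generation model. One common reading is that a single loss on the generated image updates the parameters of both components. The sketch below illustrates that reading only, under loud assumptions: the function train_jointly, the per-token embedding table, the linear generator, the mean-squared-error loss, and the tiny flattened 2x2 "images" are all hypothetical and are not taken from the application.

```python
import random

EMBED_DIM = 2   # hypothetical embedding size
N_PIXELS = 4    # hypothetical flattened 2x2 grayscale image


def train_jointly(pairs, steps=200, lr=0.1, seed=0):
    """Fit a toy text encoder and a toy generator with one shared image loss."""
    rng = random.Random(seed)
    vocab = sorted({tok for prompt, _ in pairs for tok in prompt.split()})
    # Text-encoder parameters: one embedding vector per token.
    encoder = {tok: [rng.uniform(-0.5, 0.5) for _ in range(EMBED_DIM)]
               for tok in vocab}
    # Generator parameters: a linear map from embedding to pixels.
    generator = [[rng.uniform(-0.5, 0.5) for _ in range(EMBED_DIM)]
                 for _ in range(N_PIXELS)]
    losses = []
    for _ in range(steps):
        epoch_loss = 0.0
        for prompt, target in pairs:
            tokens = prompt.split()
            # Forward pass: prompt -> embedding -> predicted image.
            emb = [sum(encoder[t][d] for t in tokens) / len(tokens)
                   for d in range(EMBED_DIM)]
            pred = [sum(generator[p][d] * emb[d] for d in range(EMBED_DIM))
                    for p in range(N_PIXELS)]
            err = [pred[p] - target[p] for p in range(N_PIXELS)]
            epoch_loss += sum(e * e for e in err) / N_PIXELS
            # Backward pass: gradients of the same loss for both components,
            # computed before either set of parameters is changed.
            grad_gen = [[2.0 * err[p] * emb[d] / N_PIXELS
                         for d in range(EMBED_DIM)]
                        for p in range(N_PIXELS)]
            grad_emb = [sum(2.0 * err[p] * generator[p][d]
                            for p in range(N_PIXELS)) / N_PIXELS
                        for d in range(EMBED_DIM)]
            # Joint update: generator weights and token embeddings both move.
            for p in range(N_PIXELS):
                for d in range(EMBED_DIM):
                    generator[p][d] -= lr * grad_gen[p][d]
            for t in tokens:
                for d in range(EMBED_DIM):
                    encoder[t][d] -= lr * grad_emb[d] / len(tokens)
        losses.append(epoch_loss / len(pairs))
    return encoder, generator, losses
```

For example, train_jointly([("bright square", [1, 1, 1, 1]), ("dark square", [0, 0, 0, 0])]) returns the trained token embeddings, the generator weights, and the per-epoch loss. The contrast with a frozen, separately pretrained text encoder is the last inner loop: here the image loss also reshapes the text embeddings.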