17971169. GENERATIVE MODEL FOR MULTI-MODALITY OUTPUTS FROM A SINGLE INPUT simplified abstract (Adobe Inc.)


GENERATIVE MODEL FOR MULTI-MODALITY OUTPUTS FROM A SINGLE INPUT

Organization Name

Adobe Inc.

Inventor(s)

Yijun Li of Seattle WA (US)

Zhixin Shu of San Jose CA (US)

Zhen Zhu of Urbana IL (US)

Krishna Kumar Singh of San Jose CA (US)

GENERATIVE MODEL FOR MULTI-MODALITY OUTPUTS FROM A SINGLE INPUT - A simplified explanation of the abstract

This abstract first appeared for US patent application 17971169 titled 'GENERATIVE MODEL FOR MULTI-MODALITY OUTPUTS FROM A SINGLE INPUT'.

The abstract describes an image generation system that uses a multi-branch GAN to create images that express visually similar content in different modalities. The generator portion contains multiple branches, each responsible for generating one modality. The discriminator portion comprises a fidelity discriminator for each generator branch and a consistency discriminator that constrains the branch outputs to remain visually similar to one another. During training, a non-saturating GAN loss is computed from the outputs of these discriminators and used to refine the system's parameters until convergence. The trained multi-branch GAN produces multiple images from a single input, each depicting visually similar content in a different modality.

  • The system implements a multi-branch GAN for image generation with different modalities.
  • The generator has multiple branches for generating various modalities.
  • The discriminator includes fidelity discriminators and a consistency discriminator to ensure visual similarity among outputs.
  • Training involves computing a non-saturating GAN loss using outputs from the discriminators.
  • Parameters are refined until convergence, after which the system generates multiple images with visually similar content in different modalities (a sketch of this architecture follows below).
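
As a concrete illustration of the architecture summarized above, here is a minimal sketch in PyTorch. The patent does not disclose network details, so the layer choices, class names (MultiBranchGenerator, PatchDiscriminator), and hyperparameters are hypothetical assumptions; only the overall structure (a single shared input, one generator branch per modality, one fidelity discriminator per branch, and one consistency discriminator) follows the abstract.

```python
# Hypothetical sketch: layer sizes, names, and architectural details are
# illustrative assumptions, not taken from the patent.
import torch.nn as nn


class MultiBranchGenerator(nn.Module):
    """Maps a single input to one image per modality."""

    def __init__(self, latent_dim=128, num_modalities=3, img_channels=3):
        super().__init__()
        # Shared trunk turns the single input into a common feature map.
        self.trunk = nn.Sequential(
            nn.Linear(latent_dim, 256 * 4 * 4),
            nn.Unflatten(1, (256, 4, 4)),
            nn.ReLU(inplace=True),
        )
        # One branch per modality; each renders the same content differently.
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1),
                nn.ReLU(inplace=True),
                nn.ConvTranspose2d(128, img_channels, 4, stride=2, padding=1),
                nn.Tanh(),
            )
            for _ in range(num_modalities)
        ])

    def forward(self, z):
        features = self.trunk(z)
        # Each branch produces the same content in a different modality.
        return [branch(features) for branch in self.branches]


class PatchDiscriminator(nn.Module):
    """Small convolutional discriminator producing real/fake logits.

    Instantiated once per generator branch as a fidelity discriminator, and
    once more (with more input channels) as the consistency discriminator.
    """

    def __init__(self, in_channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 64, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 1, 4, stride=2, padding=1),
        )

    def forward(self, x):
        return self.net(x)
```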

Potential Applications:
  • Artistic image generation
  • Content creation for multimedia projects
  • Style transfer in photography and design

Problems Solved:
  • Generating diverse images with similar content
  • Enhancing creativity in image generation
  • Improving visual consistency in multi-modal outputs

Benefits:
  • Efficient creation of diverse images
  • Enhanced artistic expression
  • Consistent visual style across different modalities

Commercial Applications: Multi-Modal Image Generation System for Creative Industries
This technology can be used in:
  • Advertising agencies for creating visually appealing content
  • Entertainment industry for generating unique multimedia assets
  • Design studios for innovative visual projects

Prior Art: No prior art information available at this time.

Frequently Updated Research: No frequently updated research information available at this time.

Questions about Multi-Modal Image Generation System:

Question 1: How does the system ensure visual consistency among the outputs of different modalities?
Answer: The consistency discriminator constrains the outputs generated by the different generator branches to appear visually similar to one another, while each fidelity discriminator judges the output of its own branch.
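
A minimal sketch of how that consistency check could be wired, reusing the hypothetical MultiBranchGenerator and PatchDiscriminator classes from the earlier sketch: the branch outputs are concatenated along the channel dimension so the consistency discriminator scores them jointly. The batch size, modality count, and resolution below are arbitrary assumptions.

```python
import torch

# Hypothetical wiring, reusing the sketch classes defined above.
num_modalities, img_channels, latent_dim = 3, 3, 128
generator = MultiBranchGenerator(latent_dim, num_modalities, img_channels)
fidelity_discs = [PatchDiscriminator(img_channels) for _ in range(num_modalities)]
consistency_disc = PatchDiscriminator(in_channels=num_modalities * img_channels)

z = torch.randn(8, latent_dim)   # one input per sample
fakes = generator(z)             # list: one image per modality

# Fidelity: each modality's output is judged on its own by its discriminator.
fidelity_logits = [d(img) for d, img in zip(fidelity_discs, fakes)]

# Consistency: all modalities of a sample are judged together, so sets whose
# content does not look visually similar can be penalized.
consistency_logits = consistency_disc(torch.cat(fakes, dim=1))
```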

Question 2: What are the potential challenges in training a multi-branch GAN for image generation with different modalities?
Answer: Some challenges may include optimizing the parameters for each branch, ensuring convergence of the system, and balancing the generation of diverse images with visual consistency.


Original Abstract Submitted

An image generation system implements a multi-branch GAN to generate images that each express visually similar content in a different modality. A generator portion of the multi-branch GAN includes multiple branches that are each tasked with generating one of the different modalities. A discriminator portion of the multi-branch GAN includes multiple fidelity discriminators, one for each of the generator branches, and a consistency discriminator, which constrains the outputs generated by the different generator branches to appear visually similar to one another. During training, outputs from each of the fidelity discriminators and the consistency discriminator are used to compute a non-saturating GAN loss. The non-saturating GAN loss is used to refine parameters of the multi-branch GAN during training until model convergence. The trained multi-branch GAN generates multiple images from a single input, where each of the multiple images depicts visually similar content expressed in a different modality.
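
For reference, the non-saturating GAN loss named in the abstract is the standard formulation sketched below (written in its softplus form). How the per-discriminator terms are weighted and combined is not stated in the abstract, so the equal-weight sum is an assumption, and the dummy logit tensors merely stand in for the outputs of the fidelity and consistency discriminators.

```python
import torch
import torch.nn.functional as F


def non_saturating_g_loss(fake_logits):
    # Generator term: -log sigmoid(D(G(z))) == softplus(-D(G(z)))
    return F.softplus(-fake_logits).mean()


def discriminator_loss(real_logits, fake_logits):
    # Paired discriminator term: -log sigmoid(D(x)) - log(1 - sigmoid(D(G(z))))
    return F.softplus(-real_logits).mean() + F.softplus(fake_logits).mean()


# Dummy logits standing in for the outputs of three fidelity discriminators
# and the consistency discriminator.
fidelity_logits = [torch.randn(8, 1, 4, 4) for _ in range(3)]
consistency_logits = torch.randn(8, 1, 4, 4)

# Generator objective: sum of the non-saturating terms from every fidelity
# discriminator plus the consistency discriminator (equal weighting assumed).
g_loss = sum(non_saturating_g_loss(l) for l in fidelity_logits)
g_loss = g_loss + non_saturating_g_loss(consistency_logits)
```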