MODALITY SPECIFIC LEARNABLE ATTENTION FOR MULTI-CONDITIONED DIFFUSION MODELS

Organization Name

Inventor(s)

Aashish Kumar Misraa of Santa Clara CA US

Ajinkya Gorakhnath Kale of San Jose CA US

This abstract first appeared for US patent application 20250117972 titled 'MODALITY SPECIFIC LEARNABLE ATTENTION FOR MULTI-CONDITIONED DIFFUSION MODELS'.

Original Abstract Submitted

A method, apparatus, non-transitory computer readable medium, and system for image generation include encoding a text prompt to obtain a text embedding. An image prompt is encoded to obtain an image embedding. Cross-attention is performed on the text embedding and then on the image embedding to obtain a text attention output and an image attention output, respectively. A synthesized image is generated based on the text attention output and the image attention output.
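
The sequence described in the abstract (cross-attention on the text embedding, then on the image embedding) can be sketched roughly as below. This is only an illustration under stated assumptions, not the implementation claimed in the application: the abstract does not specify the projection weights, the dimensions, or how the two attention outputs are combined. The separate weight sets per modality (text_w, image_w), the residual additions, the random embeddings standing in for real encoders, and every function name are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context, w_q, w_k, w_v):
    """Scaled dot-product attention: queries attend to a conditioning context."""
    q = matmul(queries, w_q)
    k = matmul(context, w_k)
    v = matmul(context, w_v)
    scale = math.sqrt(len(q[0]))
    weights = [
        softmax([sum(a * b for a, b in zip(q_row, k_row)) / scale for k_row in k])
        for q_row in q
    ]
    return matmul(weights, v)


def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def rand_matrix(rows, cols, rng):
    """Random matrix used as a stand-in for learned weights or encoder outputs."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def conditioned_block(latent, text_emb, image_emb, text_w, image_w):
    """Attend to the text embedding first, then to the image embedding.

    Assumption: each modality has its own (w_q, w_k, w_v) projections and each
    attention output is added back to the hidden state as a residual.
    """
    text_out = cross_attention(latent, text_emb, *text_w)
    hidden = add(latent, text_out)
    image_out = cross_attention(hidden, image_emb, *image_w)
    return add(hidden, image_out), text_out, image_out


rng = random.Random(0)
dim = 4                                  # hypothetical feature width
latent = rand_matrix(6, dim, rng)        # 6 latent tokens of the image being generated
text_emb = rand_matrix(3, dim, rng)      # stand-in for an encoded text prompt
image_emb = rand_matrix(5, dim, rng)     # stand-in for an encoded image prompt
text_w = [rand_matrix(dim, dim, rng) for _ in range(3)]
image_w = [rand_matrix(dim, dim, rng) for _ in range(3)]

hidden, text_out, image_out = conditioned_block(
    latent, text_emb, image_emb, text_w, image_w
)
print(len(hidden), len(hidden[0]))
```

In a full diffusion model a block like this would sit inside the denoising network and run at every denoising step; the sketch shows a single pass only, and the generation of the synthesized image from the two attention outputs is not modeled here.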