HARDWARE-AWARE EFFICIENT ARCHITECTURES FOR TEXT-TO-IMAGE DIFFUSION MODELS

Organization Name

Qualcomm Incorporated

Inventor(s)

Shubhankar Mangesh Borse of San Diego, CA, US

Risheek Garrepalli of San Diego, CA, US

Qiqi Hou of San Diego, CA, US

Jisoo Jeong of San Diego, CA, US

Shreya Kadambi of San Diego, CA, US

Munawar Hayat of San Diego, CA, US

Fatih Murat Porikli of San Diego, CA, US

This abstract first appeared in US patent application 20250131606, titled 'HARDWARE-AWARE EFFICIENT ARCHITECTURES FOR TEXT-TO-IMAGE DIFFUSION MODELS'.

Original Abstract Submitted

A processor-implemented method includes receiving a text-semantic input at a first stage of a neural network, the first stage including a first convolutional block and no attention layers. The method receives, at a second stage, a first output from the first stage. The second stage comprises a first downsampling block including a first attention layer and a second convolutional block. The method receives, at a third stage, a second output from the second stage. The third stage comprises a first upsampling block including a second attention layer and a first set of convolutional blocks. The method receives, at a fourth stage, the first output from the first stage and a third output from the third stage. The fourth stage comprises a second upsampling block including no attention layers and a second set of convolutional blocks. The method generates an image at the fourth stage based on the text-semantic input.
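The abstract specifies how the four stages are wired together but not their sizes. The Python sketch below traces only that wiring: which stage's output feeds which stage, and which stages contain attention. Everything numeric is a hypothetical assumption made for illustration, not a detail from the application. This includes the channel counts, the 4-channel 64x64 latent, the 0.5x/2x spatial scaling, the 1.0x scale given to the fourth stage, and the channel-wise concatenation that merges the skip connection. The sketch propagates feature-map shapes rather than computing real convolutions or attention. It also does not model how the text-semantic input conditions the network, since the abstract does not say.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Shape = Tuple[int, int, int]  # (channels, height, width) of a feature map


@dataclass(frozen=True)
class StageSpec:
    """Hypothetical description of one stage; all sizes are illustrative."""
    name: str
    conv_blocks: int       # convolutional blocks in the stage
    attention_layers: int  # 0 means the stage contains no attention layers
    scale: float           # assumed spatial scale (0.5 = downsample, 2.0 = upsample)
    out_channels: int      # assumed output channel count


# Assumed configuration mirroring the four stages in the abstract.
# Stage 4 is an "upsampling" (decoder-side) block; it is given scale 1.0 here
# (an assumption) so that its two inputs line up with the stage-1 skip.
STAGES: Dict[str, StageSpec] = {
    "stage1": StageSpec("stage1", conv_blocks=1, attention_layers=0, scale=1.0, out_channels=64),
    "stage2": StageSpec("stage2", conv_blocks=1, attention_layers=1, scale=0.5, out_channels=128),
    "stage3": StageSpec("stage3", conv_blocks=2, attention_layers=1, scale=2.0, out_channels=64),
    "stage4": StageSpec("stage4", conv_blocks=2, attention_layers=0, scale=1.0, out_channels=4),
}


def run_stage(spec: StageSpec, x: Shape) -> Shape:
    """Return a stage's output shape; real blocks would also transform the values."""
    _, h, w = x
    return (spec.out_channels, int(h * spec.scale), int(w * spec.scale))


def forward(latent: Shape) -> Tuple[Shape, List[Tuple[str, str]]]:
    """Trace data flow through the four stages, recording (source, destination) edges."""
    edges: List[Tuple[str, str]] = []

    out1 = run_stage(STAGES["stage1"], latent)   # convolution only, no attention
    edges.append(("stage1", "stage2"))
    out2 = run_stage(STAGES["stage2"], out1)     # downsampling block with attention
    edges.append(("stage2", "stage3"))
    out3 = run_stage(STAGES["stage3"], out2)     # upsampling block with attention

    # Stage 4 receives both the first output (skip connection) and the third output.
    if out1[1:] != out3[1:]:
        raise ValueError(f"skip {out1} and decoder {out3} differ spatially")
    edges.append(("stage1", "stage4"))
    edges.append(("stage3", "stage4"))
    merged = (out1[0] + out3[0], out1[1], out1[2])  # channel-wise concatenation (assumed)
    out4 = run_stage(STAGES["stage4"], merged)      # upsampling block, no attention
    return out4, edges


def attention_stages() -> List[str]:
    """Names of the stages that contain at least one attention layer."""
    return [name for name, spec in STAGES.items() if spec.attention_layers > 0]


if __name__ == "__main__":
    out, edges = forward((4, 64, 64))
    print(out)
    print(edges)
    print(attention_stages())
```

Under these assumptions, `forward((4, 64, 64))` returns `(4, 64, 64)`, and `attention_stages()` returns `['stage2', 'stage3']`. Attention is therefore confined to the two inner stages, while the first and fourth stages, which operate at the input resolution in this sketch, use convolutions only. An input with odd spatial size such as `(4, 63, 63)` raises `ValueError`, because the downsample/upsample round trip no longer matches the skip connection.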