Google LLC (20240212246). IMAGE MANIPULATION BY TEXT INSTRUCTION simplified abstract

From WikiPatents

IMAGE MANIPULATION BY TEXT INSTRUCTION

Organization Name

Google LLC

Inventor(s)

Tianhao Zhang of Sunnyvale CA (US)

Weilong Yang of Fremont CA (US)

Honglak Lee of Ann Arbor MI (US)

Hung-Yu Tseng of Merced CA (US)

Irfan Aziz Essa of Atlanta GA (US)

Lu Jiang of Mountain View CA (US)

IMAGE MANIPULATION BY TEXT INSTRUCTION - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240212246 titled 'IMAGE MANIPULATION BY TEXT INSTRUCTION'.

The abstract describes a method for generating an output image from an input image and an input text instruction using a neural network. The neural network includes an image encoder, an image decoder, and an instruction attention network.

  • The method involves receiving the input image and text instruction, extracting features from the input image using the image encoder, generating spatial and modification features from the text instruction using the instruction attention network, and creating an edited image feature from the input image feature, spatial feature, and modification feature.
  • The output image is then generated from the edited image feature using the image decoder.
  • Because the text instruction specifies both the location and the nature of the edit, the approach is aimed at localized, text-driven editing of visual content.
  • The method could be applied in fields such as graphic design, photo editing, and digital art creation.
  • It addresses the problem of translating a text instruction into the corresponding visual edit on an image.
  • Potential benefits include greater efficiency, accuracy, and creative flexibility in image editing tasks.
  • Commercial Applications: Potential uses include image editing software, online design tools, and automated graphic design systems.
  • Questions about the technology:
   1. How does this method improve upon traditional image editing techniques?
   2. What are the potential limitations of using text instructions for image editing tasks?


Original Abstract Submitted

A method for generating an output image from an input image and an input text instruction that specifies a location and a modification of an edit applied to the input image using a neural network is described. The neural network includes an image encoder, an image decoder, and an instruction attention network. The method includes receiving the input image and the input text instruction; extracting, from the input image, an input image feature that represents features of the input image using the image encoder; generating a spatial feature and a modification feature from the input text instruction using the instruction attention network; generating an edited image feature from the input image feature, the spatial feature and the modification feature; and generating the output image from the edited image feature using the image decoder.
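
The data flow described in the abstract can be illustrated with a minimal NumPy sketch. It is not the implementation from the application: the abstract names the three components and the order of the steps but does not say how each one is built or how the features are combined. Everything below is therefore an assumption made for illustration, including the function names, the sizes `C`, `PATCH`, and `EMBED`, the two-head attention over token embeddings, and the sigmoid-mask fusion rule. The weights are random and untrained.

```python
import numpy as np

C, PATCH, EMBED = 8, 4, 16  # hypothetical sizes: channels, downsampling, token width


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


def image_encoder(image, w_enc):
    # Toy encoder (assumption): average-pool PATCH x PATCH blocks, then project
    # RGB to C channels. Height and width must be divisible by PATCH.
    h, w, ch = image.shape
    pooled = image.reshape(h // PATCH, PATCH, w // PATCH, PATCH, ch).mean(axis=(1, 3))
    return pooled @ w_enc  # shape (h/PATCH, w/PATCH, C)


def instruction_attention(tokens, p):
    # Two attention heads over the token embeddings (assumption): one pools the
    # words that say where to edit, the other the words that say what to change.
    where = softmax(tokens @ p["q_where"]) @ tokens  # (EMBED,)
    how = softmax(tokens @ p["q_how"]) @ tokens      # (EMBED,)
    return where @ p["w_where"], how @ p["w_how"]    # spatial (C,), modification (C,)


def edit_feature(feat, spatial, modification):
    # Hypothetical fusion rule: a sigmoid mask scores each location against the
    # spatial feature and gates how much of the modification is added there.
    mask = 1.0 / (1.0 + np.exp(-(feat @ spatial)))   # (h/PATCH, w/PATCH)
    return feat + mask[..., None] * modification


def image_decoder(feat, w_dec):
    # Toy decoder (assumption): project back to RGB, then nearest-neighbour upsample.
    rgb = feat @ w_dec
    return rgb.repeat(PATCH, axis=0).repeat(PATCH, axis=1)


def edit_image(image, tokens, p):
    # The four steps in the order the abstract lists them.
    feat = image_encoder(image, p["w_enc"])
    spatial, modification = instruction_attention(tokens, p)
    edited = edit_feature(feat, spatial, modification)
    return image_decoder(edited, p["w_dec"])


rng = np.random.default_rng(0)
params = {
    "w_enc": rng.normal(size=(3, C)),
    "q_where": rng.normal(size=EMBED),
    "q_how": rng.normal(size=EMBED),
    "w_where": rng.normal(size=(EMBED, C)),
    "w_how": rng.normal(size=(EMBED, C)),
    "w_dec": rng.normal(size=(C, 3)),
}
image = rng.random((32, 32, 3))        # stand-in for the input image
tokens = rng.normal(size=(5, EMBED))   # stand-in for an embedded 5-word instruction
output = edit_image(image, tokens, params)
print(output.shape)  # (32, 32, 3)
```

Because the weights are random, the output is not a meaningful edit. The sketch only shows how an input image feature, a spatial feature, and a modification feature could be combined into an edited image feature and decoded back to the input resolution. A real system would presumably learn all of these components from data.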