18400629. IMAGE MANIPULATION BY TEXT INSTRUCTION simplified abstract (Google LLC)

From WikiPatents

IMAGE MANIPULATION BY TEXT INSTRUCTION

Organization Name

Google LLC

Inventor(s)

Tianhao Zhang of Sunnyvale CA (US)

Weilong Yang of Fremont CA (US)

Honglak Lee of Ann Arbor MI (US)

Hung-Yu Tseng of Merced CA (US)

Irfan Aziz Essa of Atlanta GA (US)

Lu Jiang of Mountain View CA (US)

IMAGE MANIPULATION BY TEXT INSTRUCTION - A simplified explanation of the abstract

This abstract first appeared for US patent application 18400629, titled 'IMAGE MANIPULATION BY TEXT INSTRUCTION'.

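The abstract describes a method that uses a neural network to generate an output image from an input image and an input text instruction. The instruction specifies both where the edit applies (a location) and what the edit is (a modification).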

  • The neural network includes an image encoder, an image decoder, and an instruction attention network.
  • The method receives the input image and the text instruction, extracts an input image feature with the image encoder, generates a spatial feature and a modification feature from the text instruction with the instruction attention network, and combines all three into an edited image feature.
  • The output image is then generated from the edited image feature using the image decoder.
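
The data flow in the steps above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the abstract does not say how the three features are combined, so the mask-weighted additive blend in `edit_feature` is an assumption, as are all function names and the toy encoder, attention, and decoder stand-ins (real components would be trained networks).

```python
from typing import Callable, List, Tuple

Vector = List[float]
FeatureMap = List[List[Vector]]   # H x W grid of C-dimensional feature vectors
Mask = List[List[float]]          # H x W grid of weights in [0, 1]


def edit_feature(image_feat: FeatureMap, spatial_feat: Mask,
                 modification_feat: Vector) -> FeatureMap:
    """Combine the three features into an edited image feature.

    ASSUMPTION: the modification is added to each location's feature vector,
    scaled by the spatial weight at that location. The abstract only states
    that the edited feature is generated from these three inputs.
    """
    edited = []
    for feat_row, mask_row in zip(image_feat, spatial_feat):
        edited.append([
            [v + m * d for v, d in zip(vec, modification_feat)]
            for vec, m in zip(feat_row, mask_row)
        ])
    return edited


def manipulate_image(image, instruction: str,
                     image_encoder: Callable,
                     instruction_attention: Callable[[str], Tuple[Mask, Vector]],
                     image_decoder: Callable):
    """Run the four steps in the order the abstract lists them."""
    image_feat = image_encoder(image)                                  # extract image feature
    spatial_feat, modification_feat = instruction_attention(instruction)  # "where" and "what"
    edited_feat = edit_feature(image_feat, spatial_feat, modification_feat)
    return image_decoder(edited_feat)                                  # decode output image


# Hypothetical toy stand-ins so the sketch runs end to end.
def toy_encoder(image):
    # Wrap each pixel value in a 1-dimensional feature vector.
    return [[[pixel] for pixel in row] for row in image]


def toy_instruction_attention(instruction):
    # Ignores the text and returns a fixed result: edit only the top-left
    # location, and brighten it by 0.5.
    return [[1.0, 0.0], [0.0, 0.0]], [0.5]


def toy_decoder(feat):
    # Unwrap each 1-dimensional feature vector back into a pixel value.
    return [[vec[0] for vec in row] for row in feat]


result = manipulate_image([[0.0, 0.0], [0.0, 0.0]], "brighten the top left",
                          toy_encoder, toy_instruction_attention, toy_decoder)
print(result)  # [[0.5, 0.0], [0.0, 0.0]]
```

Passing the three components in as callables mirrors the abstract's separation of image encoder, instruction attention network, and image decoder, so any one of them can be swapped without touching the rest of the flow.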

Potential Applications:

  • Image editing software
  • Augmented reality applications
  • Virtual reality applications

Problems Solved:

  • Streamlining the image editing process
  • Enhancing user experience in editing images
  • Improving the efficiency of generating edited images

Benefits:

  • Faster image editing
  • More precise modifications
  • Enhanced creativity in image editing

Commercial Applications:

Title: Advanced Image Editing Technology for Enhanced User Experience

This technology can be used in various industries such as graphic design, advertising, and entertainment for efficient and creative image editing processes.

Questions about the technology:

  1. How does this technology improve the image editing process compared to traditional methods?
  2. What are the potential limitations of using a neural network for generating edited images?


Original Abstract Submitted

A method for generating an output image from an input image and an input text instruction that specifies a location and a modification of an edit applied to the input image using a neural network is described. The neural network includes an image encoder, an image decoder, and an instruction attention network. The method includes receiving the input image and the input text instruction; extracting, from the input image, an input image feature that represents features of the input image using the image encoder; generating a spatial feature and a modification feature from the input text instruction using the instruction attention network; generating an edited image feature from the input image feature, the spatial feature and the modification feature; and generating the output image from the edited image feature using the image decoder.
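
One way to picture how a single instruction can yield two separate features is attention pooling over the instruction's word embeddings, with one query for "where" and another for "what". The sketch below is only an illustration under that assumption: the abstract does not disclose the internals of the instruction attention network, and the names `attend`, `instruction_attention`, `where_query`, and `how_query` are hypothetical. How a pooled spatial vector would be mapped onto locations in the image feature is outside this sketch.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, token_embeddings: List[Vector]) -> Vector:
    """Dot-product attention: weight each token by its similarity to the
    query, then return the weighted average of the token embeddings."""
    scores = [sum(q * t for q, t in zip(query, tok)) for tok in token_embeddings]
    weights = softmax(scores)
    dim = len(token_embeddings[0])
    return [sum(w * tok[i] for w, tok in zip(weights, token_embeddings))
            for i in range(dim)]


def instruction_attention(token_embeddings: List[Vector],
                          where_query: Vector,
                          how_query: Vector) -> Tuple[Vector, Vector]:
    """ASSUMPTION: two queries (learned in a real model, fixed here) pull a
    spatial feature and a modification feature out of the same instruction."""
    spatial_feat = attend(where_query, token_embeddings)
    modification_feat = attend(how_query, token_embeddings)
    return spatial_feat, modification_feat


# Toy example: two tokens with made-up 2-dimensional embeddings.
tokens = [[1.0, 0.0],   # stands in for a location word
          [0.0, 1.0]]   # stands in for a modification word
spatial, modification = instruction_attention(
    tokens, where_query=[10.0, 0.0], how_query=[0.0, 10.0])
print(spatial, modification)
```

With these toy queries, the spatial feature is dominated by the first token's embedding and the modification feature by the second's, which is the separation of "location" and "modification" that the abstract describes at the feature level.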