18228614. PROMPT-TO-PROMPT IMAGE EDITING WITH CROSS-ATTENTION CONTROL simplified abstract (GOOGLE LLC)

From WikiPatents

PROMPT-TO-PROMPT IMAGE EDITING WITH CROSS-ATTENTION CONTROL

Organization Name

GOOGLE LLC

Inventor(s)

Kfir Aberman of San Mateo CA (US)

Amir Hertz of Tel Aviv (IL)

Yael Pritch Knaan of Tel Aviv (IL)

Ron Mokady of Tel Aviv (IL)

Jay Tenenbaum of Tel Aviv (IL)

Daniel Cohen-or of Tel Aviv (IL)

PROMPT-TO-PROMPT IMAGE EDITING WITH CROSS-ATTENTION CONTROL - A simplified explanation of the abstract

This abstract first appeared for US patent application 18228614 titled 'PROMPT-TO-PROMPT IMAGE EDITING WITH CROSS-ATTENTION CONTROL'.

Simplified Explanation

The patent application describes implementations for editing a source image that a large-scale language-image (LLI) model generated from a source natural language (NL) prompt. The source image is edited based on user interface input that indicates an edit to the source NL prompt, optionally without any user interface input specifying a mask in the source image and/or without any other user interface input. The application also applies prompt-to-prompt editing techniques to a source image that is generated based on a real image and that approximates that real image.

  • The patent application focuses on editing source images generated by processing natural language prompts with a large-scale language-image (LLI) model.
  • User interface input is used to indicate edits to the source NL prompt, allowing for easy modification of the generated source image.
  • The application also discusses the possibility of editing source images without any user interface input specifying a mask or any other input.
  • Prompt-to-prompt editing techniques are also applied to a source image that is generated from, and approximates, a real image, so that real images can be edited in the same way.
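The title refers to cross-attention control, but the abstract does not describe the mechanism. The sketch that follows is modeled on the published Prompt-to-Prompt word-swap scheme: the edited prompt is denoised alongside the source prompt and, for the first few denoising steps, reuses the source pass's cross-attention maps (which pixels attend to which prompt tokens), so the edited image keeps the source layout while the swapped word changes the content. It is a toy under stated assumptions: random arrays stand in for the diffusion model's pixel queries and the text encoder's token embeddings, and `w_k`, `w_v`, `tau`, `denoise_pair` and the other names are hypothetical, not taken from the application.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    shifted = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(q, k, v, injected_map=None):
    """Single-head cross-attention: pixel queries q (P, d) attend to token keys k (T, d).

    Returns (output, attention_map). If injected_map is given, it replaces the map
    computed from q and k; this is the "control" step that preserves source layout.
    """
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (P, T): pixel-to-token weights
    if injected_map is not None:
        attn = injected_map
    return attn @ v, attn


def denoise_pair(q_src_steps, q_edit_steps, src_emb, edit_emb, w_k, w_v, tau):
    """Run source and edited prompts side by side over the denoising steps.

    For steps < tau the edited pass reuses the source pass's attention map;
    from step tau onward it uses its own map. Returns the edited pass outputs.
    """
    edit_outputs = []
    for step, (q_src, q_edit) in enumerate(zip(q_src_steps, q_edit_steps)):
        _, src_map = cross_attention(q_src, src_emb @ w_k, src_emb @ w_v)
        inject = src_map if step < tau else None
        out, _ = cross_attention(q_edit, edit_emb @ w_k, edit_emb @ w_v, injected_map=inject)
        edit_outputs.append(out)
    return edit_outputs


# Toy usage: P pixels, T prompt tokens, feature size d, a few denoising steps.
rng = np.random.default_rng(0)
P, T, d, steps, tau = 16, 5, 8, 10, 6
src_emb = rng.normal(size=(T, d))      # stand-in for e.g. "a photo of a cat"
edit_emb = src_emb.copy()
edit_emb[4] = rng.normal(size=d)       # swap the last token, e.g. "cat" -> "dog"
w_k = rng.normal(size=(d, d))          # hypothetical key projection
w_v = rng.normal(size=(d, d))          # hypothetical value projection
q_src = [rng.normal(size=(P, d)) for _ in range(steps)]
q_edit = [rng.normal(size=(P, d)) for _ in range(steps)]
outs = denoise_pair(q_src, q_edit, src_emb, edit_emb, w_k, w_v, tau)
```

Injecting the maps only before step `tau` is the knob that trades layout fidelity against how strongly the new word can reshape the image; the value 6 here is arbitrary.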

Potential Applications

  • Image editing software: The technology can be used in image editing software to allow users to easily modify source images generated from natural language prompts.
  • Content creation: Content creators can utilize this technology to quickly edit and modify images based on their desired changes to the source NL prompt.
  • Virtual reality: The patent application's techniques can be applied in virtual reality applications to enable users to edit and manipulate virtual images based on their input.

Problems Solved

  • Streamlined image editing: The technology addresses complex and time-consuming image editing by letting users change a generated image simply by editing the natural language prompt that produced it.
  • Flexibility in editing: Users can easily modify and customize source images without the need for specifying masks or other input, making the editing process more intuitive and efficient.

Benefits

  • Enhanced user experience: The technology provides a user-friendly interface for editing images, making it accessible to a wider range of users.
  • Time-saving: By eliminating the need for specifying masks or additional input, the technology streamlines the image editing process, saving users time and effort.
  • Real-image editing: The prompt-to-prompt editing techniques extend to real images by editing a source image that is generated from, and approximates, the original real image.


Original Abstract Submitted

Some implementations are directed to editing a source image, where the source image is one generated based on processing a source natural language (NL) prompt using a Large-scale language-image (LLI) model. Those implementations edit the source image based on user interface input that indicates an edit to the source NL prompt, and optionally independent of any user interface input that specifies a mask in the source image and/or independent of any other user interface input. Some implementations of the present disclosure are additionally or alternatively directed to applying prompt-to-prompt editing techniques to editing a source image that is one generated based on a real image, and that approximates the real image.
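The abstract does not say how a source image that approximates a real image is obtained. One approach discussed in the Prompt-to-Prompt literature is deterministic DDIM inversion: run the sampler backwards from the real image to a noise latent, so that sampling forward from that latent reconstructs an approximation of the image, which can then be edited with the same prompt-to-prompt machinery. The sketch assumes a linear beta schedule and uses a hypothetical `toy_eps` function in place of a trained noise-prediction network; it illustrates the inversion idea only and is not presented as the application's method.

```python
import numpy as np


def make_alpha_bars(num_steps=50, beta_start=1e-4, beta_end=0.02):
    """Cumulative alpha-bar values for a linear beta schedule; index 0 is the clean image."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.concatenate([[1.0], np.cumprod(1.0 - betas)])


def ddim_step(x, eps, a_from, a_to):
    """Deterministic DDIM update from noise level a_from to a_to (alpha-bar values)."""
    x0_pred = (x - np.sqrt(1.0 - a_from) * eps) / np.sqrt(a_from)  # predicted clean image
    return np.sqrt(a_to) * x0_pred + np.sqrt(1.0 - a_to) * eps


def invert(x0, eps_model, abar):
    """Map a real image to a noise latent by running DDIM steps toward higher noise."""
    x = x0
    for t in range(len(abar) - 1):
        x = ddim_step(x, eps_model(x, t), abar[t], abar[t + 1])
    return x


def sample(x_t, eps_model, abar):
    """Run DDIM sampling from the latent back to noise level 0."""
    x = x_t
    for t in range(len(abar) - 1, 0, -1):
        x = ddim_step(x, eps_model(x, t), abar[t], abar[t - 1])
    return x


def toy_eps(x, t):
    """Hypothetical stand-in for a trained noise predictor (ignores t)."""
    return 0.1 * x


rng = np.random.default_rng(0)
real_image = rng.normal(size=(4, 4))   # stand-in for a real image
abar = make_alpha_bars()
x_T = invert(real_image, toy_eps, abar)
reconstruction = sample(x_T, toy_eps, abar)
rel_err = np.linalg.norm(reconstruction - real_image) / np.linalg.norm(real_image)
```

The inversion evaluates the noise predictor at the current step while sampling evaluates it at the next one, so the round trip is only approximate; with a real network and guidance the mismatch tends to be larger, which is why the regenerated source image approximates rather than equals the real image.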