Nvidia Corporation (20240135630). IMAGE SYNTHESIS USING DIFFUSION MODELS CREATED FROM SINGLE OR MULTIPLE VIEW IMAGES simplified abstract


IMAGE SYNTHESIS USING DIFFUSION MODELS CREATED FROM SINGLE OR MULTIPLE VIEW IMAGES

Organization Name

Nvidia Corporation

Inventor(s)

Koki Nagano of Playa Vista CA (US)

Eric Ryan Wong Chan of Alameda CA (US)

Tero Tapani Karras of Helsinki (FI)

Shalini De Mello of San Francisco CA (US)

Miika Samuli Aittala of Helsinki (FI)

Matthew Aaron Wong Chan of Los Altos CA (US)

IMAGE SYNTHESIS USING DIFFUSION MODELS CREATED FROM SINGLE OR MULTIPLE VIEW IMAGES - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240135630, titled 'IMAGE SYNTHESIS USING DIFFUSION MODELS CREATED FROM SINGLE OR MULTIPLE VIEW IMAGES'.

Simplified Explanation

The abstract describes a method and system for novel-view image synthesis using generative networks: an encoder-based model infers a 3D representation of an input image, volume rendering produces a feature image from that representation, and a denoiser network processes the feature image together with a noisy image to predict an output image from a novel viewpoint. A minimal sketch of this pipeline appears after the list below.

  • Encoder-based model trained to infer 3D representation of input image
  • Feature image generated using volume rendering techniques
  • Denoiser network processes feature image and noisy image to predict output image
  • Modified noise conditional score network (NCSN) used as denoiser network
  • Multiple input images can be provided, with a different 3D representation generated for each
  • Aggregate feature image generated by sampling 3D representations and applying mean-pooling operation
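
The application does not disclose concrete network architectures, so the following is only a minimal PyTorch sketch of the data flow described above. The class name, layer shapes, channel counts, and the toy depth-slice renderer are all illustrative assumptions, not the patented design:

```python
import torch
import torch.nn as nn

class NovelViewDiffusionSketch(nn.Module):
    """Toy version of the described pipeline: encode an input image
    into a 3D representation, volume-render a feature image, then
    denoise it together with a noisy image. All names and shapes
    here are illustrative assumptions."""

    def __init__(self, feat_ch=16, depth=8, img_ch=3):
        super().__init__()
        self.feat_ch, self.depth = feat_ch, depth
        # Hypothetical encoder: maps the input image to a coarse
        # feature volume with `depth` slices (a stand-in for the
        # inferred 3D representation).
        self.encoder = nn.Sequential(
            nn.Conv2d(img_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_ch * depth, 3, padding=1),
        )
        # Hypothetical denoiser standing in for the modified NCSN:
        # it consumes the feature image concatenated with the noisy
        # image and predicts the clean novel-view image.
        self.denoiser = nn.Sequential(
            nn.Conv2d(feat_ch + img_ch, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, img_ch, 3, padding=1),
        )

    def volume_render(self, volume):
        # Toy renderer: average over depth slices to get a 2D feature
        # image. A real system would ray-march through the 3D
        # representation from the target camera pose.
        b, _, h, w = volume.shape
        return volume.view(b, self.feat_ch, self.depth, h, w).mean(dim=2)

    def forward(self, input_image, noisy_image):
        volume = self.encoder(input_image)          # infer 3D representation
        feat_img = self.volume_render(volume)       # render feature image
        x = torch.cat([feat_img, noisy_image], 1)   # concatenate with noise
        return self.denoiser(x)                     # predict output image

model = NovelViewDiffusionSketch()
out = model(torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64))
print(out.shape)  # torch.Size([1, 3, 64, 64])
```

An NCSN-style denoiser would also be conditioned on the noise level; that conditioning is omitted here for brevity.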

Potential Applications

This technology can be applied in computer graphics, virtual reality, augmented reality, and image editing software.

Problems Solved

This technology solves the problem of generating realistic images from novel viewpoints that remain consistent with one or more input images, which is useful for creating immersive visual experiences and enhancing image editing capabilities.

Benefits

The benefits of this technology include improved image synthesis quality, the ability to generate views of a scene from new viewpoints, and enhanced visual effects in computer-generated imagery.

Potential Commercial Applications

Potential commercial applications of this technology include video game development, movie special effects, virtual try-on applications in e-commerce, and medical imaging software.

Possible Prior Art

One possible prior art in this field is the use of generative adversarial networks (GANs) for image synthesis and manipulation, which have been widely studied in recent years for various applications in computer vision and graphics.

Unanswered Questions

How does this technology compare to existing image synthesis methods using GANs?

This article does not directly compare this technology to existing image synthesis methods using GANs.

What are the limitations of the proposed method in terms of scalability and computational resources?

This article does not address the limitations of the proposed method in terms of scalability and computational resources.


Original Abstract Submitted

A method and system for performing novel image synthesis using generative networks are provided. The encoder-based model is trained to infer a 3D representation of an input image. A feature image is then generated using volume rendering techniques in accordance with the 3D representation. The feature image is then concatenated with a noisy image and processed by a denoiser network to predict an output image from a novel viewpoint that is consistent with the input image. The denoiser network can be a modified noise conditional score network (NCSN). In some embodiments, multiple input images or keyframes can be provided as input, and a different 3D representation is generated for each input image. The feature image is then generated, during volume rendering, by sampling each of the 3D representations and applying a mean-pooling operation to generate an aggregate feature image.
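
The multi-view path described in the abstract can be illustrated in the same hedged style: one 3D representation is built per keyframe, each is sampled during volume rendering, and the per-view feature images are mean-pooled into a single aggregate. The tensor shapes below are assumptions for illustration:

```python
import torch

# Hypothetical per-keyframe feature images, each sampled from its own
# 3D representation during volume rendering: (batch, channels, H, W).
per_view_feats = [torch.randn(1, 16, 64, 64) for _ in range(3)]

# Mean-pooling across views yields the aggregate feature image that is
# then concatenated with the noisy image and denoised, as in the
# single-view case.
aggregate = torch.stack(per_view_feats, dim=0).mean(dim=0)
print(aggregate.shape)  # torch.Size([1, 16, 64, 64])
```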