20240046516. ESTIMATING 3D SCENE REPRESENTATIONS OF IMAGES simplified abstract (Unknown Organization)

From WikiPatents

ESTIMATING 3D SCENE REPRESENTATIONS OF IMAGES

Organization Name

Unknown Organization

Inventor(s)

Titas Anciukevicius of Edinburgh (GB)

ESTIMATING 3D SCENE REPRESENTATIONS OF IMAGES - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240046516 titled 'ESTIMATING 3D SCENE REPRESENTATIONS OF IMAGES'.

Simplified Explanation

Methods and systems are disclosed for estimating a 3D scene representation from one or multiple 2D images. The operations include receiving 2D images representing a real-world environment and generating a 3D scene representation using a machine learning model. The representation explicitly and separately defines the 3D shape and appearance of the background, as well as the 3D position, shape, and appearance of each object in the scene. The machine learning model is trained with an unsupervised approach from a dataset of images and their camera poses, without any manually labeled annotations.

  • The patent application describes a method for creating a 3D representation of a scene using 2D images.
  • The method involves receiving one or multiple 2D images that depict a real-world environment.
  • A machine learning model is used to generate a 3D scene representation of the images.
  • The 3D representation includes the shape and appearance of the background, as well as the position, shape, and appearance of each object in the scene.
  • The machine learning model is trained in an unsupervised approach using a dataset of images and their camera poses, without any manually labeled annotations such as depth maps or segmentation masks.
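
To make the "explicit, separate" structure concrete, here is a minimal Python sketch of how such a scene representation could be laid out. The abstract does not specify any encoding, so every name here (`Background`, `SceneObject`, `SceneRepresentation`, `move_object`) and the use of plain float tuples for shape and appearance are illustrative assumptions, not the applicant's implementation.

```python
from dataclasses import dataclass, replace
from typing import Tuple

Vec3 = Tuple[float, float, float]
Code = Tuple[float, ...]  # hypothetical latent vector; the abstract does not define an encoding


@dataclass(frozen=True)
class Background:
    shape: Code        # 3D shape of the background
    appearance: Code   # appearance of the background


@dataclass(frozen=True)
class SceneObject:
    position: Vec3     # 3D position of the object
    shape: Code        # 3D shape of the object
    appearance: Code   # appearance of the object


@dataclass(frozen=True)
class SceneRepresentation:
    background: Background
    objects: Tuple[SceneObject, ...] = ()


def move_object(scene: SceneRepresentation, index: int, offset: Vec3) -> SceneRepresentation:
    """Return a copy of the scene with one object translated by `offset`."""
    if not 0 <= index < len(scene.objects):
        raise IndexError("object index out of range")
    obj = scene.objects[index]
    new_position = tuple(p + d for p, d in zip(obj.position, offset))
    moved = replace(obj, position=new_position)
    objects = scene.objects[:index] + (moved,) + scene.objects[index + 1:]
    return replace(scene, objects=objects)
```

Because the background and each object are stored separately, a single object can be edited (here, translated) without touching the background or the other objects, which is the kind of manipulation an explicit representation allows.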

Potential applications of this technology:

  • Augmented reality: The 3D scene representation can be used to overlay virtual objects onto real-world environments, enhancing the user's perception and interaction with the surroundings.
  • Virtual reality: The technology can be used to create immersive virtual environments by accurately representing the 3D scene from 2D images.
  • Robotics: The 3D scene representation can aid robots in understanding and navigating real-world environments, enabling them to interact with objects and perform tasks more effectively.

Problems solved by this technology:

  • Accurate 3D scene representation: The technology addresses the challenge of estimating a detailed and accurate 3D representation of a scene from 2D images, including the shape, appearance, and position of objects.
  • Unsupervised learning: By training the machine learning model in an unsupervised approach, the technology eliminates the need for manually labeled annotations, reducing the time and effort required for training.
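
The abstract does not say how the training signal is computed without labels. One common setup for learning from images with known camera poses, shown here purely as an assumption, is to render the estimated scene from the known pose and penalize the per-pixel difference from the observed image. The sketch below covers only that comparison step; the function name `photometric_loss` and the grayscale list-of-rows image format are hypothetical, and the renderer itself is left out.

```python
from typing import Sequence

Image = Sequence[Sequence[float]]  # rows of grayscale pixel intensities (assumed format)


def photometric_loss(rendered: Image, observed: Image) -> float:
    """Mean squared per-pixel difference between two equally sized images."""
    if len(rendered) != len(observed):
        raise ValueError("images must have the same height")
    total = 0.0
    count = 0
    for rendered_row, observed_row in zip(rendered, observed):
        if len(rendered_row) != len(observed_row):
            raise ValueError("images must have the same width")
        for r, o in zip(rendered_row, observed_row):
            total += (r - o) ** 2
            count += 1
    if count == 0:
        raise ValueError("images must contain at least one pixel")
    return total / count
```

In a setup like this, the observed image itself acts as the supervision, so no depth maps, segmentation masks, or object poses have to be annotated; only the images and their camera poses are needed.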

Benefits of this technology:

  • Improved user experience: The accurate 3D scene representation enhances the realism and immersion in augmented and virtual reality applications, providing a more engaging user experience.
  • Automation and efficiency: The technology enables robots and autonomous systems to understand and interact with real-world environments more effectively, leading to improved automation and efficiency in various industries.
  • Cost-effective training: The unsupervised learning approach eliminates the need for manual annotation of training data, reducing the cost and time required for training the machine learning model.


Original Abstract Submitted

Methods and systems are disclosed for performing operations for estimating a 3D scene representation from one or multiple 2D images. The operations include: receiving one or multiple two-dimensional (2D) images representing a real-world environment; and generating, by a machine learning model, a three-dimensional (3D) scene representation of the 2D image, which explicitly (separately) defines the 3D shape and appearance of the background as well as a 3D position, 3D shape and appearance of each object of the scene depicted in the set of images, where the machine learning model has been trained in an unsupervised approach from a dataset of images and their camera poses (e.g. without any manually labelled annotations, such as depth maps, segmentation masks, object poses).