US Patent Application 18144045. OPEN-VOCABULARY OBJECT DETECTION IN IMAGES simplified abstract

From WikiPatents
Jump to navigation Jump to search

OPEN-VOCABULARY OBJECT DETECTION IN IMAGES

Organization Name

Google LLC


Inventor(s)

Matthias Johannes Lorenz Minderer of Zurich (CH)

Alexey Alexeevich Gritsenko of Amsterdam (NL)

Austin Charles Stone of San Francisco CA (US)

Dirk Weissenborn of Berlin (DE)

Alexey Dosovitskiy of Berlin (DE)

Neil Matthew Tinmouth Houlsby of Zürich (CH)

OPEN-VOCABULARY OBJECT DETECTION IN IMAGES - A simplified explanation of the abstract

This abstract first appeared for US patent application 18144045 titled 'OPEN-VOCABULARY OBJECT DETECTION IN IMAGES

Simplified Explanation

The patent application describes a method for object detection using a neural network.

  • The method involves obtaining an image and a set of query embeddings representing different categories of objects.
  • The image and query embeddings are processed using an object detection neural network.
  • The image is processed using an image encoding subnetwork to generate object embeddings.
  • Each object embedding is processed using a localization subnetwork to determine the region of the image where the object is located.
  • The object embeddings and query embeddings are processed using a classification subnetwork to generate a classification score distribution for each object embedding.
  • This method allows for accurate detection and classification of objects in an image.


Original Abstract Submitted

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for object detection. In one aspect, a method comprises: obtaining: (i) an image, and (ii) a set of one or more query embeddings, wherein each query embedding represents a respective category of object; processing the image and the set of query embeddings using an object detection neural network to generate object detection data for the image, comprising: processing the image using an image encoding subnetwork of the object detection neural network to generate a set of object embeddings; processing each object embedding using a localization subnetwork to generate localization data defining a corresponding region of the image; and processing: (i) the set of object embeddings, and (ii) the set of query embeddings, using a classification subnetwork to generate, for each object embedding, a respective classification score distribution over the set of query embeddings.