18449732. IMAGE PROCESSING DEVICE, IMAGE PROCESSING METHOD, AND STORAGE MEDIUM simplified abstract (CANON KABUSHIKI KAISHA)

From WikiPatents

IMAGE PROCESSING DEVICE, IMAGE PROCESSING METHOD, AND STORAGE MEDIUM

Organization Name

CANON KABUSHIKI KAISHA

Inventor(s)

Takato Kimura of Kanagawa (JP)

IMAGE PROCESSING DEVICE, IMAGE PROCESSING METHOD, AND STORAGE MEDIUM - A simplified explanation of the abstract

This abstract first appeared for US patent application 18449732, titled 'IMAGE PROCESSING DEVICE, IMAGE PROCESSING METHOD, AND STORAGE MEDIUM'.

Simplified Explanation

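The patent application describes an image processing device that divides an input image into partial images, converts each partial image into a token (a fixed-dimensional vector), and encodes the token sequence using an attention map that indicates how strongly the tokens are associated with one another. A feature vector is obtained from the encoded sequence, and at least its parameters are adjusted to reduce an attention loss, defined as the error between a target degree of association between partial images and the value indicated by the attention map.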

  • Acquire input image
  • Divide input image into partial images
  • Convert partial images into tokens with fixed-dimensional vectors
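  • Obtain an encoded representation sequence from the token sequence and an attention map indicating the degree of association between tokens
  • Obtain a feature vector from the encoded representation sequence
  • Adjust at least the parameters of the feature vector to reduce the attention loss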
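
The first steps above can be illustrated with a minimal Python sketch. It is not taken from the patent application: the function names, the square non-overlapping patches, the linear projection, and the use of scaled dot products between the tokens themselves (with no separate learned query and key projections) are all assumptions chosen to keep the example small.

```python
import math
import random


def split_into_patches(image, patch_size):
    """Divide a 2D image (list of rows) into a row-major sequence of flattened patches."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be multiples of patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patches.append([image[top + dy][left + dx]
                            for dy in range(patch_size)
                            for dx in range(patch_size)])
    return patches


def embed_patches(patches, weights):
    """Project each flattened patch to a fixed-dimensional token (assumed linear map)."""
    return [[sum(p * w for p, w in zip(patch, weight_row)) for weight_row in weights]
            for patch in patches]


def attention_map(tokens):
    """Row-wise softmax of scaled dot products between every pair of tokens."""
    scale = math.sqrt(len(tokens[0]))
    rows = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        peak = max(scores)  # subtract the maximum for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows


# Hypothetical 4x4 grayscale image, 2x2 patches, 3-dimensional tokens.
image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
patches = split_into_patches(image, patch_size=2)

rng = random.Random(0)  # fixed seed so the illustrative weights are repeatable
weights = [[rng.uniform(-0.1, 0.1) for _ in range(4)] for _ in range(3)]

tokens = embed_patches(patches, weights)
attn = attention_map(tokens)
```

Here `patches` holds four flattened 2x2 partial images, `tokens` holds one 3-dimensional vector per partial image, and each row of `attn` sums to 1, giving the degree of association between one token and all the others.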

Potential Applications

  • Image recognition and classification
  • Object detection in images
  • Video processing and analysis

Problems Solved

  • Improving accuracy of image processing algorithms
  • Enhancing feature extraction from images
  • Reducing attention loss in image analysis

Benefits

  • Increased efficiency in image processing
  • Enhanced performance in image recognition tasks
  • Improved accuracy in object detection applications


Original Abstract Submitted

An image processing device including a processor or circuit configured to: acquire an input image; divide the input image into a plurality of partial images to obtain a partial image sequence; convert the partial image sequence into a token sequence by respectively converting each of the partial images included in the partial image sequence into a token having a fixed-dimensional vector; obtain an encoded representation sequence based on an attention map indicating a degree of association between tokens and the token sequence; obtain a feature vector from the encoded representation sequence; and adjust at least parameters of the feature vector to reduce an attention loss value. The attention loss value corresponds to an error between a target value of the degree of association between the partial images and a value indicated by the attention map.
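
The attention loss described in the abstract can be sketched as follows. The abstract does not say how the error is measured or how parameters are updated, so this example makes several assumptions: the error is a mean squared error, the adjustable parameters are a hypothetical matrix of attention logits standing in for the device's real parameters, and the update is plain gradient descent with finite-difference gradients.

```python
import math


def softmax_rows(logits):
    """Turn each row of logits into an attention row that sums to 1."""
    rows = []
    for row in logits:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows


def attention_loss(attention, target):
    """Mean squared error between the attention map and the target associations (assumed metric)."""
    diffs = [(a - t) ** 2
             for a_row, t_row in zip(attention, target)
             for a, t in zip(a_row, t_row)]
    return sum(diffs) / len(diffs)


def descend(logits, target, learning_rate=1.0, eps=1e-6):
    """Take one gradient-descent step, estimating gradients by finite differences."""
    base = attention_loss(softmax_rows(logits), target)
    updated = [row[:] for row in logits]
    for i, row in enumerate(logits):
        for j in range(len(row)):
            nudged = [r[:] for r in logits]
            nudged[i][j] += eps
            grad = (attention_loss(softmax_rows(nudged), target) - base) / eps
            updated[i][j] = logits[i][j] - learning_rate * grad
    return updated


# Hypothetical target: each of two partial images should attend only to itself.
target = [[1.0, 0.0], [0.0, 1.0]]
logits = [[0.0, 0.0], [0.0, 0.0]]  # start from uniform attention

initial_loss = attention_loss(softmax_rows(logits), target)  # 0.25 for uniform attention
for _ in range(50):
    logits = descend(logits, target)
final_loss = attention_loss(softmax_rows(logits), target)
```

After the updates, `final_loss` is lower than `initial_loss`, meaning the attention map has moved toward the target degrees of association. A practical system would more likely rely on automatic differentiation than on finite differences.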