SYSTEM FOR PROVIDING ENHANCED VISION TRANSFORMER BLOCKS FOR COMPUTER VISION

Organization Name

Micron Technology, Inc.

Inventor(s)

Shakti Nagnath Wadekar of West Lafayette IN (US)

SYSTEM FOR PROVIDING ENHANCED VISION TRANSFORMER BLOCKS FOR COMPUTER VISION - A simplified explanation of the abstract

This abstract first appeared for US patent application 18359786, titled 'SYSTEM FOR PROVIDING ENHANCED VISION TRANSFORMER BLOCKS FOR COMPUTER VISION'.

Simplified Explanation

The patent application describes a system for enhancing mobile vision transformers to perform computer vision tasks such as image classification, segmentation, and object detection. The system includes a local representation block, a global representation block, and a fusion block.

The local representation block applies a depthwise-separable convolutional layer to vectors of an input image. This helps in creating local representation outputs associated with the image.
The global representation block unfolds the local representation outputs, applies vision transformers, and folds the result to generate a global representation output associated with the image.
The fusion block concatenates the local representations with the global representations, applies a point-wise convolution to the concatenation to generate a fusion block output, and fuses input features of the image with the fusion block output to generate an output for computer vision tasks.
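
The three blocks above can be sketched as a single data flow. The NumPy code below is only an illustration under stated assumptions, not the implementation claimed in the application: the 3x3 depthwise plus 1x1 pointwise pairing, the single-head attention standing in for "vision transformers", the patch size, the channel counts, and the use of element-wise addition for the final fusion with the input features are all choices made here for the example. Every function and parameter name is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def depthwise_conv3x3(x, w):
    """x: (C, H, W), w: (C, 3, 3). One 3x3 filter per channel, zero padding."""
    c, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += w[:, i, j][:, None, None] * xp[:, i:i + h, j:j + wd]
    return out

def pointwise_conv(x, w):
    """x: (C_in, H, W), w: (C_out, C_in). A 1x1 convolution mixes channels only."""
    return np.einsum("oc,chw->ohw", w, x)

def unfold(x, p):
    """(C, H, W) -> (p*p, N, C), where N is the number of non-overlapping p x p patches."""
    c, h, w = x.shape
    x = x.reshape(c, h // p, p, w // p, p)        # (C, nh, p, nw, p)
    x = x.transpose(2, 4, 1, 3, 0)                # (p, p, nh, nw, C)
    return x.reshape(p * p, (h // p) * (w // p), c)

def fold(t, p, h, w):
    """Inverse of unfold: (p*p, N, C) -> (C, H, W)."""
    c = t.shape[-1]
    t = t.reshape(p, p, h // p, w // p, c)        # (p, p, nh, nw, C)
    t = t.transpose(4, 2, 0, 3, 1)                # (C, nh, p, nw, p)
    return t.reshape(c, h, w)

def self_attention(t, wq, wk, wv):
    """Simplified stand-in for a transformer layer (assumption): single-head
    attention across the N patches, computed separately for each pixel position."""
    q, k, v = t @ wq, t @ wk, t @ wv              # each (P, N, D)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(t.shape[-1])   # (P, N, N)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return t + attn @ v                           # residual connection

def enhanced_block(x, params, p=2):
    c, h, w = x.shape
    # Local representation block: depthwise-separable convolution.
    local = pointwise_conv(depthwise_conv3x3(x, params["dw"]), params["pw"])
    # Global representation block: unfold, attend, fold.
    tokens = unfold(local, p)
    tokens = self_attention(tokens, params["wq"], params["wk"], params["wv"])
    glob = fold(tokens, p, h, w)
    # Fusion block: concatenate local and global, apply a point-wise convolution,
    # then fuse with the input features (assumed here to be an addition).
    fused = pointwise_conv(np.concatenate([local, glob], axis=0), params["fuse"])
    return x + fused

# Hypothetical sizes: C input channels, D internal channels, H x W feature map.
C, D, H, W = 4, 8, 6, 6
params = {
    "dw": rng.normal(size=(C, 3, 3)),
    "pw": rng.normal(size=(D, C)),
    "wq": rng.normal(size=(D, D)) * 0.1,
    "wk": rng.normal(size=(D, D)) * 0.1,
    "wv": rng.normal(size=(D, D)) * 0.1,
    "fuse": rng.normal(size=(C, 2 * D)),
}
x = rng.normal(size=(C, H, W))
y = enhanced_block(x, params)
print(y.shape)  # (4, 6, 6)
```

The output has the same shape as the input, so in this sketch the block could be stacked or dropped into a larger network without changing the surrounding tensor sizes.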

Potential applications of this technology:

Image classification: The enhanced vision transformer block can be used to classify images into different categories.
Segmentation: It can be used to segment images into different regions or objects.
Object detection: The system can help in detecting and localizing objects within an image.

Problems solved by this technology:

Improved performance: The enhanced block is intended to improve how well mobile vision transformers perform image classification, segmentation, and object detection.
Local and global representation: The system combines local and global representations of an image to provide a more comprehensive understanding of the image.

Benefits of this technology:

Enhanced accuracy: By incorporating both local and global representations, the system aims to improve the accuracy of computer vision tasks.
Efficient computation: A depthwise-separable convolutional layer uses fewer parameters and multiply-accumulate operations than a standard convolution with the same kernel size and channel counts, which suits the limited compute budgets that mobile vision transformers target.
Versatility: The system can be applied to various computer vision tasks, making it a versatile solution for different applications.
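
To make the efficiency point concrete, the short calculation below compares the weight count of a standard convolution with that of a depthwise-separable one. It is a back-of-the-envelope sketch: the channel counts and kernel size are hypothetical values chosen for illustration, biases are ignored, and none of the figures come from the patent application.

```python
def standard_conv_params(c_in, c_out, k):
    # Every output channel has its own k x k filter over every input channel.
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k):
    # Depthwise step: one k x k filter per input channel.
    # Pointwise step: a 1x1 convolution mapping c_in channels to c_out channels.
    return k * k * c_in + c_in * c_out

# Hypothetical layer: 64 input channels, 64 output channels, 3x3 kernel.
c_in, c_out, k = 64, 64, 3
std = standard_conv_params(c_in, c_out, k)
sep = depthwise_separable_params(c_in, c_out, k)
print(std, sep, round(std / sep, 2))  # 36864 4672 7.89
```

For these example sizes the depthwise-separable layer needs roughly one eighth of the weights of the standard layer.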

Original Abstract Submitted

A system for providing an enhanced vision transformer block for mobile vision transformers to perform computer vision tasks, such as image classification, segmentation, and object detection, is disclosed. A local representation block of the block applies a depthwise-separable convolutional layer to vectors of an input image to facilitate creation of local representation outputs associated with the image. The local representation output is fed into a global representation block, which unfolds the local representation outputs, applies vision transformers, and folds the result to generate a global representation output associated with the image. The global representation output is fed to a fusion block, which concatenates the local representations with the global representations, applies a point-wise convolution to the concatenation to generate a fusion block output, and fuses input features of the image with the fusion block output to generate an output to facilitate performance of a computer vision task.

18359786. SYSTEM FOR PROVIDING ENHANCED VISION TRANSFORMER BLOCKS FOR COMPUTER VISION simplified abstract (Micron Technology, Inc.)
