International Business Machines Corporation (20240096121). TRAINING AND USING A VECTOR ENCODER TO DETERMINE VECTORS FOR SUB-IMAGES OF TEXT IN AN IMAGE SUBJECT TO OPTICAL CHARACTER RECOGNITION simplified abstract

From WikiPatents


Organization Name

International Business Machines Corporation

Inventor(s)

Zhong Fang Yuan of Xi'an (CN)

Tong Liu of Xi'an (CN)

Yi Chen Zhong of Shanghai (CN)

Xiang Yu Yang of Xi'an (CN)

Guan Chao Li of Shanghai (CN)

TRAINING AND USING A VECTOR ENCODER TO DETERMINE VECTORS FOR SUB-IMAGES OF TEXT IN AN IMAGE SUBJECT TO OPTICAL CHARACTER RECOGNITION - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240096121, titled 'TRAINING AND USING A VECTOR ENCODER TO DETERMINE VECTORS FOR SUB-IMAGES OF TEXT IN AN IMAGE SUBJECT TO OPTICAL CHARACTER RECOGNITION'.

Simplified Explanation

The abstract describes a computer program product, system, and method for training and using a vector encoder to determine vectors for sub-images of text in an image that is to be subjected to optical character recognition (OCR). The vector encoder is trained to encode images representing text into vectors in a vector space, such that vectors of images representing similar text have a high degree of cohesion in the vector space, while vectors of images representing dissimilar text have a low degree of cohesion. The input image is processed to determine sub-images that bound the text it contains, and these sub-images are input to the vector encoder, which outputs a sub-image vector for each one. The vector encoder also generates a search vector for the search text, and OCR is then applied to at least one region of the input image that includes the sub-images whose vectors match the search vector.

  • The trained vector encoder encodes images of text into vectors in a vector space.
  • Vectors of images representing similar text have high cohesion in the vector space.
  • Vectors of images representing dissimilar text have low cohesion in the vector space.
  • The input image is processed to determine sub-images that bound its text.
  • The sub-images are input to the vector encoder, which outputs sub-image vectors.
  • The vector encoder generates a search vector for the search text.
  • Optical character recognition is applied to at least one region of the input image that includes sub-images whose vectors match the search vector.
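The matching and region-selection steps listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: it assumes the encoder's outputs are already available as plain float vectors, treats "matching" as a cosine similarity at or above a hypothetical `threshold`, and takes the region passed to OCR to be the bounding box enclosing all matching sub-images. The names `SubImage`, `cosine_similarity`, and `select_ocr_region` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# Hypothetical bounding box: (left, top, right, bottom) in pixel coordinates.
Box = Tuple[int, int, int, int]


@dataclass
class SubImage:
    """A sub-image that bounds text, paired with the vector the encoder produced for it."""
    box: Box
    vector: Sequence[float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_ocr_region(
    sub_images: List[SubImage],
    search_vector: Sequence[float],
    threshold: float = 0.9,  # assumed cut-off for a "match"
) -> Optional[Box]:
    """Return the box enclosing every sub-image whose vector matches the search vector.

    Returns None when nothing matches, so OCR can be skipped for this image.
    """
    matches = [
        s.box for s in sub_images
        if cosine_similarity(s.vector, search_vector) >= threshold
    ]
    if not matches:
        return None
    left = min(b[0] for b in matches)
    top = min(b[1] for b in matches)
    right = max(b[2] for b in matches)
    bottom = max(b[3] for b in matches)
    return (left, top, right, bottom)


if __name__ == "__main__":
    # Toy vectors standing in for real encoder output (assumption).
    subs = [
        SubImage(box=(10, 10, 60, 30), vector=[0.9, 0.1, 0.0]),
        SubImage(box=(70, 12, 120, 32), vector=[0.0, 1.0, 0.0]),
        SubImage(box=(15, 50, 65, 70), vector=[1.0, 0.0, 0.05]),
    ]
    search = [1.0, 0.0, 0.0]  # assumed encoding of the search text
    print(select_ocr_region(subs, search))
```

Running the script prints `(10, 10, 65, 70)`: the first and third sub-images match, so their boxes are merged into one region for OCR. Merging into a single enclosing box is only one design choice; OCR could equally be run on each matching sub-image separately, which avoids recognizing unrelated text lying between distant matches.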

Potential Applications

This technology can be applied in document scanning, text recognition in images, and automated data extraction from images.

Problems Solved

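This technology addresses the problem of locating and extracting specific text within an image. Because sub-image vectors are compared with a search vector before recognition, OCR can be limited to the regions likely to contain the search text, and because images of similar text are encoded with high cohesion, matching may tolerate variations in text appearance that can trouble traditional OCR methods.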

Benefits

Potential benefits include more targeted text recognition, since OCR is applied only to the regions whose sub-image vectors match the search vector; potentially faster data extraction from images; and greater automation in document processing tasks.

Potential Commercial Applications

Potential commercial applications of this technology include document management systems, automated data entry software, and image-based text search engines.

Possible Prior Art

One possible piece of prior art is the use of convolutional neural networks for image recognition and text extraction; this technology differs in that it trains a vector encoder specifically for images of text and applies OCR based on vector similarity.

Unanswered Questions

How does this technology compare to existing OCR methods in terms of accuracy and efficiency?

This article does not provide a direct comparison between this technology and existing OCR methods.

What are the potential limitations or challenges in implementing this technology on a large scale?

The article does not address the potential limitations or challenges in implementing this technology on a large scale.


Original Abstract Submitted

Provided are a computer program product, system, and method for training and using a vector encoder to determine vectors for sub-images of text in an image to subject to optical character recognition. A vector encoder is trained to encode images representing text into vectors in a vector space. Vectors of images representing similar text have a high degree of cohesion in the vector space. Vectors of images representing dissimilar text have a low degree of cohesion in the vector space. An input image is processed to determine sub-images of the input image that bound text represented in the input image. The sub-images are inputted to the vector encoder to output sub-image vectors. The vector encoder generates a search vector for search text. Optical character recognition is applied to at least one region of the input image including the sub-images having sub-image vectors matching the search vector.
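The abstract does not say how the encoder is trained to give similar text high cohesion and dissimilar text low cohesion. One common way to obtain that property, swapped in here purely as an illustration, is a margin-based contrastive loss over pairs of text images. The sketch assumes that "high cohesion" means a small Euclidean distance between vectors; the function names and the `margin` value are hypothetical, and the encoder and optimizer are omitted because they depend on the training framework.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(
    vec_a: Sequence[float],
    vec_b: Sequence[float],
    same_text: bool,
    margin: float = 1.0,  # assumed minimum separation for dissimilar text
) -> float:
    """Margin-based contrastive loss for one pair of encoded text images.

    Pairs showing the same text are penalized by their squared distance,
    which pulls them together. Pairs showing different text are penalized
    only while they are closer than `margin`, which pushes them apart.
    """
    d = euclidean_distance(vec_a, vec_b)
    if same_text:
        return d ** 2
    return max(0.0, margin - d) ** 2


if __name__ == "__main__":
    anchor = [0.0, 0.0]
    nearby = [0.3, 0.4]  # distance 0.5 from anchor
    far = [3.0, 4.0]     # distance 5.0 from anchor
    print(contrastive_loss(anchor, nearby, same_text=True))   # same text, small penalty
    print(contrastive_loss(anchor, nearby, same_text=False))  # different text inside margin
    print(contrastive_loss(anchor, far, same_text=False))     # different text beyond margin: 0.0
```

The first two calls each give a loss of about 0.25, and the third gives 0.0. Summed over many same-text and different-text pairs and minimized with respect to the encoder's parameters, a loss of this kind pulls vectors of matching text together and pushes non-matching ones at least `margin` apart, which is the arrangement a similarity-based matching step relies on. Triplet loss is a common alternative with the same goal.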