17932639. TRAINING AND USING A VECTOR ENCODER TO DETERMINE VECTORS FOR SUB-IMAGES OF TEXT IN AN IMAGE SUBJECT TO OPTICAL CHARACTER RECOGNITION simplified abstract (International Business Machines Corporation)

From WikiPatents

TRAINING AND USING A VECTOR ENCODER TO DETERMINE VECTORS FOR SUB-IMAGES OF TEXT IN AN IMAGE SUBJECT TO OPTICAL CHARACTER RECOGNITION

Organization Name

International Business Machines Corporation

Inventor(s)

Zhong Fang Yuan of Xi'an (CN)

Tong Liu of Xi'an (CN)

Yi Chen Zhong of Shanghai (CN)

Xiang Yu Yang of Xi'an (CN)

Guan Chao Li of Shanghai (CN)

TRAINING AND USING A VECTOR ENCODER TO DETERMINE VECTORS FOR SUB-IMAGES OF TEXT IN AN IMAGE SUBJECT TO OPTICAL CHARACTER RECOGNITION - A simplified explanation of the abstract

This abstract first appeared for US patent application 17932639, titled 'TRAINING AND USING A VECTOR ENCODER TO DETERMINE VECTORS FOR SUB-IMAGES OF TEXT IN AN IMAGE SUBJECT TO OPTICAL CHARACTER RECOGNITION'.

Simplified Explanation

The abstract describes a computer program product, system, and method for training and using a vector encoder to determine vectors for sub-images of text in an image that is subject to optical character recognition (OCR). The vector encoder is trained to encode images representing text into vectors in a vector space, such that vectors of images representing similar text have a high degree of cohesion and vectors of images representing dissimilar text have a low degree of cohesion. An input image is processed to determine the sub-images that bound the text it contains, and the vector encoder outputs a sub-image vector for each of them. The vector encoder also generates a search vector for search text, and optical character recognition is applied to at least one region of the input image that includes the sub-images whose vectors match the search vector.

  • Vector encoder trained to encode text images into vectors in a vector space
  • Sub-images of input image with text identified and processed to generate sub-image vectors
  • Search vector created for search text
  • Optical character recognition applied to regions of input image with matching sub-image vectors
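The matching step summarized above can be illustrated with a minimal sketch. The abstract does not say how a sub-image vector is judged to "match" the search vector, so the cosine-similarity test, the 0.8 threshold, the function names, and the toy vectors below are all assumptions made for illustration, not the method claimed in the application.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def select_regions_for_ocr(sub_images, search_vector, threshold=0.8):
    """Return the bounding boxes whose vectors match the search vector.

    sub_images is a list of (bounding_box, vector) pairs. The threshold is a
    hypothetical matching criterion; the abstract does not define one.
    """
    return [
        box
        for box, vector in sub_images
        if cosine_similarity(vector, search_vector) >= threshold
    ]

# Toy data standing in for encoder output: (left, top, right, bottom) boxes
# paired with made-up 3-dimensional sub-image vectors.
sub_images = [
    ((10, 10, 120, 30), [0.9, 0.1, 0.0]),
    ((10, 50, 120, 70), [0.0, 1.0, 0.0]),
    ((10, 90, 120, 110), [0.8, 0.2, 0.1]),
]
search_vector = [1.0, 0.0, 0.0]  # made-up vector for the search text

# Only these regions would be handed to an OCR engine.
regions = select_regions_for_ocr(sub_images, search_vector)
```

In this sketch only the first and third boxes pass the threshold, so an OCR engine would be run on those two regions and the second would be skipped.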

Potential Applications

This technology can be applied in:

  • Document scanning and digitization
  • Image search engines
  • Text extraction from images

Problems Solved

This technology helps in:

  • Efficiently identifying and extracting text from images
  • Improving accuracy of optical character recognition
  • Enhancing search capabilities for text within images

Benefits

The benefits of this technology include:

  • Faster and more accurate text extraction from images
  • Improved search functionality for text within images
  • Enhanced document digitization processes

Potential Commercial Applications

This technology can be utilized in various commercial applications such as:

  • Document management systems
  • Image editing software
  • Automated data entry systems

Possible Prior Art

Possible prior art for this technology includes the use of neural networks for image recognition and text extraction.

What are the limitations of the technology described in the patent application?

The limitations of the technology described in the patent application include:

  • Dependency on the accuracy of the vector encoder for text image encoding
  • Sensitivity to variations in text fonts and styles

How does this technology compare to existing optical character recognition systems?

This technology differs from existing optical character recognition systems by using a vector encoder to determine vectors for sub-images of text in an image. Optical character recognition is then applied to the regions whose sub-image vectors match a search vector, which may improve the accuracy and efficiency of text extraction from images.


Original Abstract Submitted

Provided are a computer program product, system, and method for training and using a vector encoder to determine vectors for sub-images of text in an image to subject to optical character recognition. A vector encoder is trained to encode images representing text into vectors in a vector space. Vectors of images representing similar text have a high degree of cohesion in the vector space. Vectors of images representing dissimilar text have a low degree of cohesion in the vector space. An input image is processed to determine sub-images of the input image that bound text represented in the input image. The sub-images are inputted to the vector encoder to output sub-image vectors. The vector encoder generates a search vector for search text. Optical character recognition is applied to at least one region of the input image including the sub-images having sub-image vectors matching the search vector.
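The cohesion property in the abstract (vectors of similar text close together, vectors of dissimilar text far apart) is the kind of property a metric-learning objective produces. The abstract does not name a loss function, so the triplet margin loss below is a swapped-in, commonly used technique shown only to make the idea concrete; the function names, the margin of 1.0, and the example vectors are assumptions.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Zero when the positive is closer to the anchor than the negative by at least `margin`.

    anchor and positive stand for vectors of images showing similar text;
    negative stands for a vector of an image showing dissimilar text.
    """
    gap = euclidean_distance(anchor, positive) - euclidean_distance(anchor, negative)
    return max(0.0, gap + margin)

# Made-up 2-dimensional vectors for illustration.
anchor = [0.0, 0.0]
similar = [0.0, 1.0]      # distance 1 from the anchor
dissimilar = [3.0, 4.0]   # distance 5 from the anchor

# Similar text is already much closer than dissimilar text: no penalty.
well_separated = triplet_margin_loss(anchor, similar, dissimilar)

# Roles swapped to mimic a poorly trained encoder: large penalty.
poorly_separated = triplet_margin_loss(anchor, dissimilar, similar)
```

Minimizing a loss of this kind over many triplets would push an encoder toward the high-cohesion and low-cohesion behavior the abstract describes, though the application may rely on a different training objective.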