18514772. AUTOMATED CATEGORIZATION AND PROCESSING OF DOCUMENT IMAGES OF VARYING DEGREES OF QUALITY simplified abstract (Bank of America Corporation)
Contents
- 1 AUTOMATED CATEGORIZATION AND PROCESSING OF DOCUMENT IMAGES OF VARYING DEGREES OF QUALITY
- 1.1 Organization Name
- 1.2 Inventor(s)
- 1.3 AUTOMATED CATEGORIZATION AND PROCESSING OF DOCUMENT IMAGES OF VARYING DEGREES OF QUALITY - A simplified explanation of the abstract
- 1.4 Simplified Explanation
- 1.5 Potential Applications
- 1.6 Problems Solved
- 1.7 Benefits
- 1.8 Potential Commercial Applications
- 1.9 Possible Prior Art
- 1.10 How does the machine learning algorithm improve the accuracy of text classification?
- 1.11 What types of categories can the text be classified into?
- 1.12 Original Abstract Submitted
AUTOMATED CATEGORIZATION AND PROCESSING OF DOCUMENT IMAGES OF VARYING DEGREES OF QUALITY
Organization Name
Bank of America Corporation
Inventor(s)
Aftab Khan of Richardson TX (US)
AUTOMATED CATEGORIZATION AND PROCESSING OF DOCUMENT IMAGES OF VARYING DEGREES OF QUALITY - A simplified explanation of the abstract
This abstract first appeared for US patent application 18514772, titled 'AUTOMATED CATEGORIZATION AND PROCESSING OF DOCUMENT IMAGES OF VARYING DEGREES OF QUALITY'.
Simplified Explanation
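The patent application describes an apparatus that uses a machine learning algorithm to classify text extracted from images of pages. The processor converts a page image into text, splits the text into whitespace-delimited tokens, and checks each token against a stored dictionary: tokens that match dictionary words are valid, and the rest are invalid. It then calculates a score from the ratio of valid tokens to total tokens. If the score is above a threshold, the processor classifies the text into a category and stores the image and/or text in a database according to that category.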
- Memory and processor in apparatus
- Dictionary and machine learning algorithm stored in memory
- Image of a page converted into text by processor
- Identification of tokens within the text
- Identification of invalid tokens by removing tokens that match dictionary words
- Calculation of score based on ratio of valid tokens to total tokens
- Classification of text into a category if score is above threshold
- Storage of image and/or text in a database according to category
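The tokenization and scoring steps listed above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names, the toy dictionary, and the 0.7 threshold are hypothetical, and the abstract does not say whether dictionary lookup ignores case or punctuation. The sketch also treats the start and end of the text as token boundaries, whereas the abstract defines a token as characters preceded and followed by whitespace.

```python
def tokenize(text: str) -> list[str]:
    """Split text into whitespace-delimited tokens.

    Assumption: the start and end of the text count as boundaries.
    """
    return text.split()


def quality_score(text: str, dictionary: set[str]) -> float:
    """Return the ratio of valid tokens (dictionary words) to all tokens."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0  # assumption: an empty page scores zero
    # Assumption: lookup is case-insensitive and ignores surrounding punctuation.
    valid = [t for t in tokens if t.strip(".,;:!?\"'()").lower() in dictionary]
    return len(valid) / len(tokens)


# Toy dictionary and threshold, both hypothetical.
DICTIONARY = {"please", "pay", "the", "amount", "due", "by", "friday"}
THRESHOLD = 0.7

clean = "Please pay the amount due by Friday."
noisy = "Pl3ase p@y the am0unt du3 by Fr1day."  # simulated poor-quality OCR output

for page in (clean, noisy):
    score = quality_score(page, DICTIONARY)
    decision = "classify" if score > THRESHOLD else "skip"
    print(f"{score:.2f} {decision}")
```

In this sketch the clean page passes the threshold, while the noisy page, where only two of seven tokens match the dictionary, does not.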
Potential Applications
This technology can be applied in document processing, automated data entry, and content categorization tasks.
Problems Solved
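This technology addresses the problem of classifying text extracted from page images of varying quality. The score acts as a check on the extracted text before classification, and text that passes is stored in a database according to its category.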
Benefits
The benefits of this technology include improved accuracy in text classification, automated data processing, and enhanced organization of information.
Potential Commercial Applications
The technology can be used in document management systems, data entry software, and content management platforms to streamline processes and improve efficiency.
Possible Prior Art
Possible prior art includes optical character recognition (OCR) software that converts images of text into editable text files.
Unanswered Questions
How does the machine learning algorithm improve the accuracy of text classification?
The abstract states only that the machine learning algorithm is trained to classify text; it does not describe the model or its training data. In general, a trained classifier learns patterns and features from labeled examples, which can let it pick up subtle differences between categories and classify more accurately.
What types of categories can the text be classified into?
The abstract does not name the categories. They would depend on the training data provided to the machine learning algorithm, and could range from specific topics or themes to broader groupings such as document type.
Original Abstract Submitted
An apparatus includes a memory and a processor. The memory stores a dictionary and a machine learning algorithm trained to classify text. The processor receives an image of a page, converts the image into a set of text, and identifies a plurality of tokens within the text. Each token includes one or more contiguous characters that are both preceded and followed by whitespace within the text. The processor identifies invalid tokens by removing tokens of the plurality of tokens that correspond to words of the dictionary. The processor calculates, based on a ratio of a total number of valid tokens to a total number of tokens, a score. In response to determining that the score is greater than a threshold, the processor applies the machine learning algorithm to classify the text into a category and stores the image and/or text in a database according to the category.
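The final step of the abstract, classifying only when the score exceeds the threshold and then storing by category, might look like the following Python sketch. Everything named here is an assumption for illustration: `keyword_classifier` is a hypothetical stand-in for the trained machine learning algorithm, the `documents` table and its columns are invented, and an in-memory SQLite database stands in for whatever database the apparatus uses. The abstract does not say what happens to pages that fail the threshold, so the sketch simply leaves them unstored.

```python
import sqlite3
from typing import Callable, Optional


def keyword_classifier(text: str) -> str:
    """Hypothetical stand-in for the trained model: a single keyword rule."""
    return "invoice" if "invoice" in text.lower() else "other"


def classify_and_store(
    conn: sqlite3.Connection,
    text: str,
    score: float,
    threshold: float,
    classify: Callable[[str], str] = keyword_classifier,
) -> Optional[str]:
    """Classify and store the text only if the score is greater than the threshold."""
    if score <= threshold:
        return None  # assumption: low-scoring pages are not classified or stored
    category = classify(text)
    conn.execute(
        "INSERT INTO documents (category, body) VALUES (?, ?)", (category, text)
    )
    conn.commit()
    return category


# In-memory database with a hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (category TEXT, body TEXT)")

print(classify_and_store(conn, "Invoice: amount due by Friday", score=0.9, threshold=0.7))
print(classify_and_store(conn, "1nv0ice: am0unt du3", score=0.2, threshold=0.7))
print(conn.execute("SELECT category, COUNT(*) FROM documents GROUP BY category").fetchall())
```

Passing the classifier in as a parameter keeps the threshold gate and the storage step independent of whichever model is actually trained.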