18599667. TEXT EXTRACTION USING OPTICAL CHARACTER RECOGNITION simplified abstract (Capital One Services, LLC)
TEXT EXTRACTION USING OPTICAL CHARACTER RECOGNITION
Organization Name
Capital One Services, LLC
Inventor(s)
Chris Demchalk of Frisco TX (US)
Ryan M. Parker of Dallas TX (US)
Lokesh Vijay Kumar of Frisco TX (US)
Brian Fromknecht of Richardson TX (US)
TEXT EXTRACTION USING OPTICAL CHARACTER RECOGNITION - A simplified explanation of the abstract
This abstract first appeared for US patent application 18599667 titled 'TEXT EXTRACTION USING OPTICAL CHARACTER RECOGNITION'.
- Simplified Explanation: This patent application describes systems and methods that run different OCR tools on a document, compare quality metrics to select the higher-quality extracted text, check the selected text against a minimum-quality threshold, and then save it. Error correction can also be applied to the selected text.
- Key Features and Innovation:
  - Use of multiple OCR tools to extract text from a document
  - Comparison of metrics to select high-quality text
  - Application of error correction to improve text quality
  - Threshold comparison to ensure minimum quality before saving the text
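The selection and threshold steps listed above can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say which metric is used, so a single confidence score in the range 0 to 1 stands in for it, and the `OcrResult` type, the `select_text` function, the tool labels, and the sample strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class OcrResult:
    tool: str          # hypothetical label for the OCR tool that produced the text
    text: str          # text extracted by that tool
    confidence: float  # assumed quality metric in [0.0, 1.0]; not named in the abstract


def select_text(results: Sequence[OcrResult], threshold: float) -> Optional[OcrResult]:
    """Pick the highest-scoring result, or None if it falls below the threshold."""
    if not results:
        return None
    best = max(results, key=lambda r: r.confidence)
    # Only text that meets the minimum quality bar is eligible to be saved.
    return best if best.confidence >= threshold else None


# Two made-up versions of the same line, as two different tools might return it.
candidates = [
    OcrResult("tool_a", "Invoice total: $1,204.50", 0.91),
    OcrResult("tool_b", "lnvoice tota1: $1,2O4.5O", 0.62),
]
chosen = select_text(candidates, threshold=0.80)
```

Returning `None` when the best candidate misses the threshold leaves it to the caller to decide what happens next (for example, flagging the document for manual review), which the abstract does not specify.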
- Potential Applications:
  - Document digitization
  - Data entry automation
  - Text extraction for analysis or translation purposes
- Problems Solved:
  - Ensures high-quality text extraction from documents
  - Reduces manual effort in text extraction and error correction
  - Improves accuracy and efficiency in digitizing documents
- Benefits:
  - Saves time and effort in extracting text from documents
  - Increases accuracy and reliability of extracted text
  - Enables automated processing of document content
- Commercial Applications:
"Text Extraction and Error Correction System for Document Digitization and Data Entry Automation"
- Questions about Text Extraction and Error Correction System:
1. How does the system compare different versions of extracted text to select the highest quality?
2. What types of errors specific to OCR tools or document contents can the error correction address?
Original Abstract Submitted
Provided herein are systems and methods for extracting text from a document. Different optical character recognition (OCR) tools are used to extract different versions of the text in the document. Metrics evaluating the quality of the extracted text are compared to identify and select higher quality extracted text. A selected portion of text is compared to a threshold to ensure minimal quality. The selected portion of text is then saved. Error correction can be applied to the selected portion of text based on errors specific to the OCR tools or the document contents.
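The abstract does not say which errors the correction step targets. As one hedged illustration of a correction "specific to the OCR tools", the sketch below repairs letter-for-digit confusions (such as `O` read for `0`) inside tokens that already contain a digit. The confusion table `DIGIT_FIXES` and the function `fix_numeric_tokens` are assumptions made for this example and are not taken from the application.

```python
import re

# Assumed confusion table: letters that an OCR tool might emit in place of digits.
DIGIT_FIXES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "S": "5"})

# A run of digits/confusable letters, optionally grouped by '.' or ',',
# that is not glued to any other letter or digit on either side.
NUMERIC_TOKEN = re.compile(
    r"(?<![A-Za-z0-9])[0-9OolIS]+(?:[.,][0-9OolIS]+)*(?![A-Za-z0-9])"
)


def fix_numeric_tokens(text: str) -> str:
    """Replace confusable letters with digits, but only in digit-bearing tokens."""
    def repair(match: re.Match) -> str:
        token = match.group(0)
        # Leave ordinary words such as "Solo" untouched: they contain no digit.
        if any(ch.isdigit() for ch in token):
            return token.translate(DIGIT_FIXES)
        return token

    return NUMERIC_TOKEN.sub(repair, text)


corrected = fix_numeric_tokens("Total: $1,2O4.5O")
```

Restricting the substitution to tokens that already contain a digit is a deliberately conservative choice: it avoids rewriting real words, at the cost of missing numbers in which every digit was misread.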