20240020999. SMART OPTICAL CHARACTER RECOGNITION TRAINER simplified abstract (Innovative Computing & Applied Technology LLC)

SMART OPTICAL CHARACTER RECOGNITION TRAINER

Organization Name

Innovative Computing & Applied Technology LLC

Inventor(s)

Radu Stoicescu of Braselton GA (US)

Jesse Osborne of Burtonsville MD (US)

SMART OPTICAL CHARACTER RECOGNITION TRAINER - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240020999, titled 'SMART OPTICAL CHARACTER RECOGNITION TRAINER'.

Simplified Explanation

The abstract describes a smart optical character recognition (SOCR) trainer: software that automates quality control using unsupervised machine-learning techniques. It analyzes, classifies, and optimizes textual data extracted from images or PDF documents. The SOCR trainer can be embedded into data processing workflows, where it runs automated tests to determine whether the extracted data is trustworthy. If deficiencies are detected, it performs optimizations, re-extracts the text, and repeats quality assurance testing until the output meets the desired specifications. It also produces audit files that record the provenance of the text and the differences between the original and optimized document text.

  • The patent application describes a software tool called the SOCR trainer that automates quality control in optical character recognition.
  • The SOCR trainer uses unsupervised machine-learning techniques to analyze, classify, and optimize textual data extracted from images or PDF documents.
  • It can be embedded into various data processing workflows such as data pipelines, ETL processes, and data versioning repositories.
  • The SOCR trainer performs automated tests on the quality of images and extracted textual data to determine if the extraction is trustworthy.
  • If deficiencies are detected, the SOCR trainer analyzes document parameters, performs conditional optimizations, re-extracts text, and repeats quality assurance testing.
  • The SOCR trainer produces audit files that record the provenance and differences between original and optimized document text.
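The test, optimize, re-extract, and re-test cycle described above can be sketched as a simple loop. This is only an illustration under stated assumptions, not the method claimed in the application: the names run_qa_loop, toy_extract, toy_score, and toy_optimize are hypothetical, the "image" is a plain dictionary standing in for a real document, and the quality score is a toy metric rather than the unsupervised machine-learning tests the abstract mentions.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class QAResult:
    text: str                 # final extracted text
    score: float              # final quality score in [0, 1]
    passed: bool              # True if the score met the threshold
    rounds: int               # number of optimization rounds performed
    history: List[float] = field(default_factory=list)  # score after each extraction


def run_qa_loop(
    image: Any,
    extract: Callable[[Any], str],
    score_text: Callable[[str], float],
    optimize: Callable[[Any], Any],
    threshold: float = 0.9,
    max_rounds: int = 3,
) -> QAResult:
    """Extract text, test it, and re-optimize until it passes or rounds run out."""
    text = extract(image)
    score = score_text(text)
    history = [score]
    rounds = 0
    while score < threshold and rounds < max_rounds:
        image = optimize(image)      # conditional optimization step
        text = extract(image)        # re-perform text extraction
        score = score_text(text)     # repeat QA testing
        history.append(score)
        rounds += 1
    return QAResult(text, score, score >= threshold, rounds, history)


# --- Toy stand-ins (assumptions for illustration only) ---

def toy_extract(image: dict) -> str:
    # Pretend OCR: each unit of "noise" corrupts one leading character.
    n = image["noise"]
    return "#" * n + image["truth"][n:]


def toy_score(text: str) -> float:
    # Fraction of characters that are not the corruption marker '#'.
    return sum(c != "#" for c in text) / len(text) if text else 0.0


def toy_optimize(image: dict) -> dict:
    # Pretend image clean-up: remove two units of noise per round.
    return {**image, "noise": max(0, image["noise"] - 2)}


if __name__ == "__main__":
    doc = {"truth": "INVOICE 1234", "noise": 4}
    result = run_qa_loop(doc, toy_extract, toy_score, toy_optimize, threshold=0.95)
    print(result.passed, result.rounds, result.text)
```

Capping the loop with max_rounds is a design choice of this sketch: it guarantees termination when a document cannot be improved enough to pass, in which case the result is returned with passed set to False so a workflow can route it for manual review.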

Potential applications of this technology:

  • Quality control in document digitization processes
  • Automation of data extraction from images or PDF documents
  • Integration into data processing workflows for improved efficiency and accuracy

Problems solved by this technology:

  • Ensures the trustworthiness of extracted textual data from images or PDF documents
  • Automates the optimization and quality control process, reducing manual effort and potential errors

Benefits of this technology:

  • Improved accuracy and reliability of optical character recognition
  • Increased efficiency in data processing workflows
  • Reduction in manual effort and potential errors in quality control processes


Original Abstract Submitted

A smart optical character recognition (SOCR) trainer comprises software developed for automating quality control (QC) using unsupervised machine-learning techniques to analyze, classify, and optimize textual data extracted from an image or PDF document. SOCR trainer serves as a ‘data treatment’ utility service that can be embedded into data processing workflows (e.g., data pipelines, ETL processes, data versioning repositories, etc.). SOCR trainer performs a series of automated tests on the quality of images and their respective extracted textual data to determine if the extraction is trustworthy. If deficiencies are detected, SOCR trainer will analyze certain parameters of the document, perform conditional optimizations, re-perform text extraction, and repeat QA testing until the output meets desired specifications. SOCR trainer will produce audit files recording the provenance and differences between original documents and enhanced optimized document text.
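
The abstract does not specify the format of the audit files. As one possible illustration, the sketch below builds an audit record containing content hashes (for provenance) and a line-level diff between the original and optimized text, using only the Python standard library. The function name build_audit_record and all field names are hypothetical.

```python
import difflib
import hashlib
import json
from datetime import datetime, timezone


def build_audit_record(doc_id: str, original_text: str, optimized_text: str) -> dict:
    """Return a JSON-serializable record of what changed between two extractions."""
    # Line-level unified diff; lineterm="" avoids trailing newlines in each entry.
    diff = list(
        difflib.unified_diff(
            original_text.splitlines(),
            optimized_text.splitlines(),
            fromfile="original",
            tofile="optimized",
            lineterm="",
        )
    )
    return {
        "document_id": doc_id,  # hypothetical identifier supplied by the caller
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "original_sha256": hashlib.sha256(original_text.encode("utf-8")).hexdigest(),
        "optimized_sha256": hashlib.sha256(optimized_text.encode("utf-8")).hexdigest(),
        "changed": original_text != optimized_text,
        "diff": diff,
    }


if __name__ == "__main__":
    record = build_audit_record(
        "doc-1",
        "lnvoice 1234\nTotal: 10",
        "Invoice 1234\nTotal: 10",
    )
    print(json.dumps(record, indent=2))
```

Storing hashes alongside the diff lets a later reader confirm which exact versions of the text the record refers to, without keeping full copies of both in the audit file.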