17891635. Leveraging Machine Learning Models to Identify Missing or Incorrect Labels in Training or Testing Data simplified abstract (GOOGLE LLC)

From WikiPatents

Leveraging Machine Learning Models to Identify Missing or Incorrect Labels in Training or Testing Data

Organization Name

GOOGLE LLC

Inventor(s)

James Bradley Wendt of San Francisco CA (US)

Sandeep Tata of San Francisco CA (US)

Lauro Ivo Beltrao Colaco Costa of Kirkland WA (US)

Emmanouil Koukoumidis of Kirkland WA (US)

Leveraging Machine Learning Models to Identify Missing or Incorrect Labels in Training or Testing Data - A simplified explanation of the abstract

This abstract first appeared for US patent application 17891635, titled 'Leveraging Machine Learning Models to Identify Missing or Incorrect Labels in Training or Testing Data'.

Simplified Explanation

    • Explanation:

- Machine-learning models tend to over-label documents, while human labelers tend to under-label them.
- The solution has both a machine-learning model and a human label the same document, then sends the document to a parser that determines the discrepancies between the two sets of labels.
- The discrepancies are presented to a human, who reviews them and decides whether the labels identified by the model are valid labels.
- This feedback is given to the machine-learning model to improve its confidence calculations.
- A confidence threshold applied to those calculations determines whether an identified label is presented.
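The flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method claimed in the application: the `Label` record, the matching rule (same field and value), the function names `find_discrepancies`, `select_for_review`, and `collect_feedback`, and the 0.5 threshold are all hypothetical, and the abstract does not say how the feedback is used to update the model.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Label:
    """Hypothetical label record; the abstract does not define a schema."""
    field: str                # e.g. "total"
    value: str                # text the label refers to
    confidence: float = 1.0   # human labels default to full confidence


def find_discrepancies(
    model_labels: List[Label], human_labels: List[Label]
) -> Tuple[List[Label], List[Label]]:
    """Compare the two label sets (the "parser" step).

    Assumption: two labels match when field and value are equal.
    Returns (labels only the model produced, labels only the human produced).
    """
    human_keys = {(l.field, l.value) for l in human_labels}
    model_keys = {(l.field, l.value) for l in model_labels}
    model_only = [l for l in model_labels if (l.field, l.value) not in human_keys]
    human_only = [l for l in human_labels if (l.field, l.value) not in model_keys]
    return model_only, human_only


def select_for_review(model_only: List[Label], threshold: float) -> List[Label]:
    """Keep only model-identified labels whose confidence meets the threshold."""
    return [l for l in model_only if l.confidence >= threshold]


def collect_feedback(
    review_queue: List[Label], decisions: List[bool]
) -> List[Tuple[Label, bool]]:
    """Pair each reviewed label with the reviewer's accept/reject decision.

    The pairs could later be fed back to the model; how that happens is
    not specified in the abstract, so it is left out of this sketch.
    """
    if len(review_queue) != len(decisions):
        raise ValueError("one decision is required per reviewed label")
    return list(zip(review_queue, decisions))


if __name__ == "__main__":
    model = [
        Label("date", "2021-03-01", 0.95),
        Label("total", "$40", 0.90),
        Label("due_date", "2021-04-01", 0.30),
    ]
    human = [Label("date", "2021-03-01")]

    model_only, _ = find_discrepancies(model, human)
    queue = select_for_review(model_only, threshold=0.5)
    print([l.field for l in queue])  # prints ['total']
```

In this example the human labeler missed two labels the model found. Only `total` reaches the reviewer, because the low-confidence `due_date` label falls below the threshold and is not presented.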

    • Potential Applications:

- Document classification
- Sentiment analysis
- Image recognition

    • Problems Solved:

- Over-labeling by machine-learning models
- Under-labeling by human labelers
- Inconsistencies in labeling accuracy

    • Benefits:

- Improved accuracy in labeling
- Efficient use of machine-learning models and human input
- Enhanced performance of document parsing and classification systems


Original Abstract Submitted

Labels are often over labeled by machine-learning models and under labeled by human labelers. A solution to the over and under labeling problem is to have both a machine-learning model and a human label a document, then send the document to a parser to determine the discrepancies. The discrepancies are then presented to a human to review and decide whether the machine-learning model identified labels are labels. The feedback is then given to the machine-learning model for further improvement in its confidence calculations which via a confidence threshold determine if the identified labels are presented.