17444367. SYSTEM AND METHOD FOR IMPROVING NAMED ENTITY RECOGNITION simplified abstract (Samsung Electronics Co., Ltd.)

From WikiPatents

SYSTEM AND METHOD FOR IMPROVING NAMED ENTITY RECOGNITION

Organization Name

Samsung Electronics Co., Ltd.

Inventor(s)

Divya Neelagiri of Dublin CA (US)

Taeyeon Ki of Milpitas CA (US)

Vijendra Raj Apsingekar of San Jose CA (US)

SYSTEM AND METHOD FOR IMPROVING NAMED ENTITY RECOGNITION - A simplified explanation of the abstract

This abstract first appeared for US patent application 17444367, titled 'SYSTEM AND METHOD FOR IMPROVING NAMED ENTITY RECOGNITION'.

Simplified Explanation

The patent application describes a method for training a set of teacher models. Each teacher model is trained to transcribe unlabeled audio samples and to predict a pseudo labeled dataset containing multiple labels. At least some of the audio samples contain named entity (NE) audio data, and at least some of the labels are transcribed NE labels corresponding to that audio. At least some of the transcribed NE labels are then corrected using user-specific NE textual data. Finally, the set of teacher models is retrained on the pseudo labeled dataset from a selected teacher model, namely the one that predicts its pseudo labeled dataset more accurately than the other teacher models in the set.

  • Each teacher model in a set is trained to transcribe unlabeled audio samples and predict a pseudo labeled dataset.
  • Each pseudo labeled dataset has multiple labels.
  • At least some of the unlabeled audio samples contain named entity (NE) audio data.
  • At least some of the labels are transcribed NE labels corresponding to the NE audio data.
  • User-specific NE textual data is used to correct at least some of the transcribed NE labels.
  • The set of teacher models is retrained on the pseudo labeled dataset from the teacher model that predicts its dataset more accurately than the others.
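
The abstract does not say how the NE correction step works. As a minimal sketch, assume the user-specific NE textual data is a list of single-word names (for example contact names, which is an assumption here) and that fuzzy string matching is an acceptable stand-in for the correction. The function name `correct_ne_labels` and the `cutoff` value are hypothetical and do not come from the application.

```python
from difflib import get_close_matches


def correct_ne_labels(transcript, user_entities, cutoff=0.8):
    """Replace tokens that closely match a user-specific entity name.

    Illustrative only: the patent abstract does not specify this algorithm.
    """
    # Map lowercase form -> canonical spelling so matching ignores case.
    canonical = {name.lower(): name for name in user_entities}
    candidates = list(canonical)

    corrected = []
    for token in transcript.split():
        # Find at most one entity whose similarity ratio is >= cutoff.
        match = get_close_matches(token.lower(), candidates, n=1, cutoff=cutoff)
        corrected.append(canonical[match[0]] if match else token)
    return " ".join(corrected)


# A misspelled name in a pseudo label is snapped to the user's spelling.
fixed = correct_ne_labels("call neelagiry now", ["Neelagiri"])
```

This sketch handles only one-word entities and can mis-correct ordinary words that happen to resemble a name, so a real system would need a more careful matching strategy.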

Potential Applications

  • Speech recognition systems
  • Natural language processing applications
  • Transcription services
  • Language learning tools

Problems Solved

  • Lack of labeled audio data for training speech recognition models
  • Difficulty in transcribing named entities accurately in audio samples
  • Need for user-specific corrections in transcribed labels

Benefits

  • Improved accuracy of transcribing audio samples
  • Efficient training of speech recognition models
  • Customizable transcription based on user-specific data


Original Abstract Submitted

A method includes training a set of teacher models. Training the set of teacher models includes, for each individual teacher model of the set of teacher models, training the individual teacher model to transcribe unlabeled audio samples and predict a pseudo labeled dataset having multiple labels. At least some of the unlabeled audio samples contain named entity (NE) audio data. At least some of the labels include transcribed NE labels corresponding to the NE audio data. The method also includes correcting at least some of the transcribed NE labels using user-specific NE textual data. The method further includes retraining the set of teacher models based on the pseudo labeled dataset from a selected one of the teacher models, where the selected one of the teacher models predicts the pseudo labeled dataset more accurately than other teacher models of the set of teacher models.
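
The abstract does not state how accuracy is compared across teacher models. One possible reading, sketched below, assumes a small set of reference transcripts is available and scores each teacher's pseudo labels by word error rate (WER). The function names, the teacher names, and the choice of WER are all assumptions made for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = curr
    return prev[-1] / max(len(ref), 1)


def select_best_teacher(pseudo_labels_by_teacher, references):
    """Return the name of the teacher with the lowest mean WER."""
    def mean_wer(name):
        hypotheses = pseudo_labels_by_teacher[name]
        scores = [word_error_rate(r, h) for r, h in zip(references, hypotheses)]
        return sum(scores) / len(scores)

    return min(pseudo_labels_by_teacher, key=mean_wer)


# Hypothetical data: two teachers labeling the same two audio samples.
references = ["call john smith", "play jazz"]
pseudo_labels = {
    "teacher_a": ["call jon smith", "play jazz"],
    "teacher_b": ["call john smith", "play jazz"],
}
best = select_best_teacher(pseudo_labels, references)
# The pseudo labeled dataset that every teacher would be retrained on.
retraining_labels = pseudo_labels[best]
```

Under these assumptions, `retraining_labels` is the dataset that the whole set of teacher models would be retrained on in the next round.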