SYSTEM AND METHOD FOR IMPROVING NAMED ENTITY RECOGNITION

Organization Name

Samsung Electronics Co., Ltd.

Inventor(s)

Vijendra Raj Apsingekar of San Jose CA (US)

SYSTEM AND METHOD FOR IMPROVING NAMED ENTITY RECOGNITION - A simplified explanation of the abstract

This abstract first appeared for US patent application 17444367, titled 'SYSTEM AND METHOD FOR IMPROVING NAMED ENTITY RECOGNITION'.

Simplified Explanation

The patent application describes a method for training a set of teacher models to transcribe audio samples and predict pseudo labeled datasets. Each teacher model is trained to transcribe unlabeled audio samples and generate a pseudo labeled dataset with multiple labels. Some of the audio samples contain named entity (NE) audio data, and some of the labels are transcribed NE labels corresponding to that data. The method then corrects at least some of the transcribed NE labels using user-specific NE textual data. Finally, the set of teacher models is retrained on the pseudo labeled dataset from the selected teacher model, namely the one that predicts its pseudo labeled dataset more accurately than the other teacher models.

The method involves training teacher models to transcribe audio samples and predict pseudo labeled datasets.
Unlabeled audio samples are used to train the teacher models.
The teacher models generate pseudo labeled datasets with multiple labels.
Some of the audio samples contain named entity (NE) data.
The labels include transcribed NE labels corresponding to the NE data.
User-specific NE textual data is used to correct the transcribed NE labels.
The teacher models are retrained based on the pseudo labeled dataset from the most accurate teacher model.
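The NE correction step can be illustrated with a minimal Python sketch. The application does not say how transcribed NE labels are matched against user-specific NE textual data, so this example assumes fuzzy string matching with the standard library's `difflib.get_close_matches` against a user's list of entities (for example, contact names). The `(token, is_named_entity)` label format, the function name `correct_ne_labels`, and the 0.75 similarity cutoff are all hypothetical, and each named entity is treated as a single token for simplicity.

```python
import difflib
from typing import List, Sequence, Tuple

# Hypothetical pseudo label format: a list of (token, is_named_entity) pairs.
PseudoLabel = List[Tuple[str, bool]]


def correct_ne_labels(label: PseudoLabel,
                      user_entities: Sequence[str],
                      cutoff: float = 0.75) -> PseudoLabel:
    """Replace each transcribed NE token with the closest user-specific entity, if similar enough.

    Assumption: correction is done by fuzzy string matching; the patent does not specify this.
    """
    # Match case-insensitively, but emit the user's own spelling of the entity.
    lookup = {entity.lower(): entity for entity in user_entities}
    corrected: PseudoLabel = []
    for token, is_ne in label:
        if is_ne:
            match = difflib.get_close_matches(token.lower(), list(lookup), n=1, cutoff=cutoff)
            if match:
                token = lookup[match[0]]
        # Tokens not tagged as named entities pass through unchanged.
        corrected.append((token, is_ne))
    return corrected
```

For example, if a teacher transcribes a contact's name as "vijender" and "Vijendra" appears in the user's entity list, the label is corrected to "Vijendra"; tokens that are not tagged as named entities, or that have no sufficiently close match, are left as transcribed.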

Potential Applications

Speech recognition systems
Natural language processing applications
Transcription services
Language learning tools

Problems Solved

Lack of labeled audio data for training speech recognition models
Difficulty in transcribing named entities accurately in audio samples
Need for user-specific corrections in transcribed labels

Benefits

Improved accuracy of transcribing audio samples
Efficient training of speech recognition models
Customizable transcription based on user-specific data

Original Abstract Submitted

A method includes training a set of teacher models. Training the set of teacher models includes, for each individual teacher model of the set of teacher models, training the individual teacher model to transcribe unlabeled audio samples and predict a pseudo labeled dataset having multiple labels. At least some of the unlabeled audio samples contain named entity (NE) audio data. At least some of the labels include transcribed NE labels corresponding to the NE audio data. The method also includes correcting at least some of the transcribed NE labels using user-specific NE textual data. The method further includes retraining the set of teacher models based on the pseudo labeled dataset from a selected one of the teacher models, where the selected one of the teacher models predicts the pseudo labeled dataset more accurately than other teacher models of the set of teacher models.
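The selection and retraining steps in the abstract can be sketched in Python as follows. The abstract only says that the selected teacher predicts the pseudo labeled dataset more accurately than the others; this sketch assumes accuracy is measured as mean word error rate (WER) on a small labeled validation set. Teachers are modeled as plain callables from an audio identifier to a transcript, and the `correct` and `retrain` callbacks stand in for the NE correction and model training steps. All names here (`select_best_teacher`, `pseudo_label_round`, and so on) are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: a teacher maps an audio identifier to a transcript.
Transcriber = Callable[[str], str]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,             # deletion
                          curr[j - 1] + 1,         # insertion
                          prev[j - 1] + (r != h))  # substitution (0 if the words match)
        prev = curr
    return prev[-1] / max(len(ref), 1)


def select_best_teacher(teachers: Dict[str, Transcriber],
                        validation: Sequence[Tuple[str, str]]) -> str:
    """Return the name of the teacher with the lowest mean WER on (audio, reference) pairs."""
    def mean_wer(model: Transcriber) -> float:
        return sum(word_error_rate(ref, model(audio)) for audio, ref in validation) / len(validation)
    return min(teachers, key=lambda name: mean_wer(teachers[name]))


def pseudo_label_round(teachers: Dict[str, Transcriber],
                       validation: Sequence[Tuple[str, str]],
                       unlabeled_audio: Sequence[str],
                       correct: Callable[[str], str],
                       retrain: Callable[[str, List[Tuple[str, str]]], None]
                       ) -> Tuple[str, List[Tuple[str, str]]]:
    """One round: pick the best teacher, pseudo-label, correct NE labels, retrain every teacher."""
    best = select_best_teacher(teachers, validation)
    dataset = [(audio, correct(teachers[best](audio))) for audio in unlabeled_audio]
    for name in teachers:
        retrain(name, dataset)  # every teacher is retrained on the selected teacher's labels
    return best, dataset
```

Retraining every teacher on the single selected pseudo labeled dataset, rather than each teacher on its own output, follows the abstract's wording; in a real system `retrain` would fine-tune each speech recognition model.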

17444367. SYSTEM AND METHOD FOR IMPROVING NAMED ENTITY RECOGNITION simplified abstract (Samsung Electronics Co., Ltd.)
