20240038216. LANGUAGE IDENTIFICATION CLASSIFIER TRAINED USING ENCODED AUDIO FROM ENCODER OF PRE-TRAINED SPEECH-TO-TEXT SYSTEM simplified abstract (INTERNATIONAL BUSINESS MACHINES CORPORATION)

LANGUAGE IDENTIFICATION CLASSIFIER TRAINED USING ENCODED AUDIO FROM ENCODER OF PRE-TRAINED SPEECH-TO-TEXT SYSTEM

Organization Name

INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor(s)

Zvi Kons of Yoqneam Ilit (IL)

LANGUAGE IDENTIFICATION CLASSIFIER TRAINED USING ENCODED AUDIO FROM ENCODER OF PRE-TRAINED SPEECH-TO-TEXT SYSTEM - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240038216, titled 'LANGUAGE IDENTIFICATION CLASSIFIER TRAINED USING ENCODED AUDIO FROM ENCODER OF PRE-TRAINED SPEECH-TO-TEXT SYSTEM'.

Simplified Explanation

The abstract describes a system whose processor receives encoded audio from the encoder of a pre-trained speech-to-text (STT) model. The processor also trains a language identification (LID) classifier, using training samples labeled by language, to detect the language of the encoded audio.

  • The system includes a processor that receives encoded audio from the encoder of a pre-trained STT model.
  • The processor also trains a language identification (LID) classifier.
  • The LID classifier is trained using training samples labeled by language.
  • The purpose of the LID classifier is to detect the language of the encoded audio.
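
The points above can be illustrated with a minimal sketch. It is not the method claimed in the application, whose abstract does not specify an encoder architecture, a pooling step, or a classifier type. Everything named here is an assumption made for illustration: `fake_encoder` is a hypothetical stand-in for a frozen pre-trained STT encoder and emits synthetic frames, `mean_pool` is one possible way to summarize an utterance, and the classifier is a plain softmax regression trained on samples labeled by language.

```python
import math
import random

DIM = 4                      # hypothetical size of one encoded audio frame
LANGS = ["en", "es", "he"]   # illustrative label set, not from the application


def fake_encoder(lang_id, rng, num_frames=20):
    """Hypothetical stand-in for a frozen STT encoder: returns synthetic frames."""
    frames = []
    for _ in range(num_frames):
        frame = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
        frame[lang_id] += 2.0  # synthetic language-dependent shift
        frames.append(frame)
    return frames


def mean_pool(frames):
    """Average the frames into one utterance-level vector (an assumed design choice)."""
    n = len(frames)
    return [sum(f[d] for f in frames) / n for d in range(DIM)]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def logits(weights, biases, x):
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def train_lid(features, labels, num_langs, epochs=50, lr=0.5):
    """Train a softmax-regression LID classifier with per-sample gradient steps."""
    weights = [[0.0] * DIM for _ in range(num_langs)]
    biases = [0.0] * num_langs
    for _ in range(epochs):
        for x, y in zip(features, labels):
            probs = softmax(logits(weights, biases, x))
            for k in range(num_langs):
                grad = probs[k] - (1.0 if k == y else 0.0)  # cross-entropy gradient
                for d in range(DIM):
                    weights[k][d] -= lr * grad * x[d]
                biases[k] -= lr * grad
    return weights, biases


def predict(weights, biases, x):
    scores = logits(weights, biases, x)
    return max(range(len(scores)), key=scores.__getitem__)


rng = random.Random(0)
labels = [i % len(LANGS) for i in range(60)]               # language labels
features = [mean_pool(fake_encoder(y, rng)) for y in labels]  # encoded audio
weights, biases = train_lid(features, labels, len(LANGS))
accuracy = sum(predict(weights, biases, x) == y
               for x, y in zip(features, labels)) / len(labels)
print(f"training accuracy: {accuracy:.2f}")
```

In this sketch only the classifier's weights are updated, and the encoder is never modified. That mirrors the abstract's description of a pre-trained encoder supplying the encoded audio, although the abstract itself does not say whether the encoder is kept frozen.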

Potential Applications:

  • Speech recognition systems: The technology can be applied in speech recognition systems to accurately identify the language being spoken.
  • Multilingual transcription services: The system can be used in transcription services to automatically determine the language of the audio being transcribed.
  • Language-specific content filtering: It can be utilized in content filtering systems to identify and filter content based on the language it is in.

Problems Solved by this Technology:

  • Language detection accuracy: The system aims to improve the accuracy of language detection in encoded audio, which can be challenging due to variations in accents, dialects, and speech patterns.
  • Efficient language identification: Because the classifier works on audio already encoded by a pre-trained STT encoder, language identification may not require a separate acoustic model built only for that task.
  • Adaptability to new languages: The system can be trained with labeled samples to detect new languages, allowing it to adapt to a wide range of languages.

Benefits of this Technology:

  • Enhanced speech-to-text accuracy: Once the language is identified, the speech-to-text conversion can be handled in a way suited to that language, which may improve transcription accuracy.
  • Automation of language identification: The technology automates the language identification process, reducing the need for manual intervention.
  • Scalability and versatility: The system can be trained to detect multiple languages, making it scalable and versatile for various applications.


Original Abstract Submitted

An example system includes a processor to receive encoded audio from an encoder of a pre-trained speech-to-text (STT) model. The processor is to further train a language identification (LID) classifier to detect a language of the encoded audio using training samples labeled by language.