Samsung Electronics Co., Ltd. (20240304179). EFFICIENT ADAPTATION OF SPOKEN LANGUAGE UNDERSTANDING BASED ON AUTOMATIC SPEECH RECOGNITION USING MULTI-TASK LEARNING simplified abstract

EFFICIENT ADAPTATION OF SPOKEN LANGUAGE UNDERSTANDING BASED ON AUTOMATIC SPEECH RECOGNITION USING MULTI-TASK LEARNING

Organization Name

Samsung Electronics Co., Ltd.

Inventor(s)

Euisung Kim of Mountain View CA (US)

Aditya Jajodia of Sunnyvale CA (US)

Cindy Sushen Tseng of Santa Clara CA (US)

Divya Neelagiri of Dublin CA (US)

Taeyeon Ki of Milpitas CA (US)

Vijendra Raj Apsingekar of San Jose CA (US)

EFFICIENT ADAPTATION OF SPOKEN LANGUAGE UNDERSTANDING BASED ON AUTOMATIC SPEECH RECOGNITION USING MULTI-TASK LEARNING - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240304179 titled 'EFFICIENT ADAPTATION OF SPOKEN LANGUAGE UNDERSTANDING BASED ON AUTOMATIC SPEECH RECOGNITION USING MULTI-TASK LEARNING'.

Simplified Explanation

The patent application describes a method for spoken language understanding built on an automatic speech recognition (ASR) model. For each token of an input utterance, the model generates an acoustic representation, determines a text representation of the token, combines the text and acoustic representations into a joint representation, and determines a semantic label from that joint representation. A minimal sketch of this pipeline follows the list below.

  • The method uses automatic speech recognition to understand spoken language.
  • It generates acoustic representations of input utterances.
  • It determines text representations of tokens.
  • It combines text and acoustic representations to generate joint representations.
  • It determines semantic labels for tokens.
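The sketch below is one possible, assumption-laden reading of this pipeline in PyTorch-style Python: the GRU and linear modules, the layer sizes, and the single forward pass are placeholders for whatever architectures the application actually uses.

```python
import torch
import torch.nn as nn

class AsrSluSketch(nn.Module):
    """Hypothetical ASR-based SLU pipeline: encoder -> decoder -> fusion -> SLU decoder."""
    def __init__(self, feat_dim=80, hidden=256, vocab=1000, num_labels=32):
        super().__init__()
        self.asr_encoder = nn.GRU(feat_dim, hidden, batch_first=True)  # acoustic representations
        self.asr_decoder = nn.GRU(hidden, hidden, batch_first=True)    # text representations
        self.text_head = nn.Linear(hidden, vocab)                      # token logits
        self.fusion = nn.Linear(2 * hidden, hidden)                    # joint representations
        self.slu_decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.label_head = nn.Linear(hidden, num_labels)                # semantic label logits

    def forward(self, acoustic_features):
        acoustic_rep, _ = self.asr_encoder(acoustic_features)          # (B, T, H)
        text_rep, _ = self.asr_decoder(acoustic_rep)                   # (B, T, H)
        joint_rep = torch.tanh(self.fusion(torch.cat([text_rep, acoustic_rep], dim=-1)))
        slu_rep, _ = self.slu_decoder(joint_rep)
        return self.text_head(text_rep), self.label_head(slu_rep)

# Example: one utterance of 40 frames of 80-dimensional acoustic features.
model = AsrSluSketch()
token_logits, label_logits = model(torch.randn(1, 40, 80))
```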

Key Features and Innovation

  • Utilizes automatic speech recognition for spoken language understanding.
  • Generates acoustic representations of input utterances.
  • Combines text and acoustic representations for a more accurate understanding.
  • Determines semantic labels for tokens based on joint representations.
  • Improves the accuracy and efficiency of spoken language understanding.

Potential Applications

This technology can be applied in:

  • Virtual assistants
  • Transcription services
  • Language translation tools
  • Voice-controlled devices
  • Customer service automation

Problems Solved

  • Enhances the accuracy of speech recognition.
  • Improves the understanding of spoken language.
  • Streamlines communication processes.
  • Enables more efficient interaction with technology.
  • Facilitates the development of advanced language processing systems.

Benefits

  • Enhanced accuracy in speech recognition.
  • Improved efficiency in understanding spoken language.
  • Streamlined communication processes.
  • Better interaction with technology.
  • Facilitation of advanced language processing systems.

Commercial Applications

Automatic Speech Recognition for Enhanced Language Understanding

This technology can be commercially used in:

  • Developing virtual assistants for various industries.
  • Enhancing transcription services for businesses.
  • Improving language translation tools for global communication.
  • Implementing voice-controlled devices for smart homes and offices.
  • Automating customer service processes for better user experience.

Questions about Automatic Speech Recognition

How does automatic speech recognition improve spoken language understanding?

The model first generates acoustic representations of the speech and uses them to determine text representations of the tokens; it then fuses the text and acoustic representations into joint representations, from which semantic labels are decoded, leading to a more accurate understanding of spoken language.

What are the key components of the automatic speech recognition-based spoken language understanding model?

The key components include a shared ASR encoder, an ASR decoder, a fusion model, and an SLU decoder, which work together to process input utterances and generate semantic labels for tokens.
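As one way to picture how these components cooperate per token, including the conditioning on previous tokens and previous semantic labels mentioned in the abstract, here is a stand-in sketch; the function signature and the lambda components are hypothetical and exist only so the loop runs end to end.

```python
from typing import Callable, List

def decode_utterance(
    audio_frames: List[List[float]],
    asr_encoder: Callable,
    asr_decoder: Callable,
    fusion: Callable,
    slu_decoder: Callable,
):
    """Hypothetical per-token loop: each step sees the previous tokens and labels."""
    tokens: List[str] = []
    labels: List[str] = []
    acoustic_reps = asr_encoder(audio_frames)            # one acoustic representation per token
    for acoustic_rep in acoustic_reps:
        text_rep = asr_decoder(acoustic_rep, tokens)     # conditioned on previous tokens
        joint_rep = fusion(text_rep, acoustic_rep)       # combine text + acoustic
        label = slu_decoder(joint_rep, labels)           # conditioned on previous labels
        tokens.append(text_rep)
        labels.append(label)
    return tokens, labels

# Trivial stand-in components so the sketch runs end to end.
tokens, labels = decode_utterance(
    audio_frames=[[0.1, 0.2], [0.3, 0.4]],
    asr_encoder=lambda frames: frames,
    asr_decoder=lambda rep, prev: f"tok{len(prev)}",
    fusion=lambda text, acoustic: (text, acoustic),
    slu_decoder=lambda joint, prev: f"label{len(prev)}",
)
```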


Original Abstract Submitted

A method includes receiving, by an automatic speech recognition (ASR)-based spoken language understanding (SLU) model, an input utterance using an audio input device. The method also includes, for each token of the input utterance: generating, using a shared ASR encoder of the ASR-based SLU model, an acoustic representation of acoustic features of the token (the shared ASR encoder including a first adapter layer); determining, using an ASR decoder of the ASR-based SLU model, a text representation of the token using the acoustic representation and any previous tokens (the ASR decoder including a second adapter layer); combining, using a fusion model of the ASR-based SLU model, the text representation and the acoustic representation to generate a joint representation; and determining, using an SLU decoder of the ASR-based SLU model, a semantic label associated with the token based on the joint representation and any previous semantic labels.
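The abstract also places a first adapter layer in the shared ASR encoder and a second adapter layer in the ASR decoder, which suggests parameter-efficient adaptation of a pretrained ASR model. The sketch below shows a generic bottleneck adapter trained under a multi-task (ASR + SLU) loss; the adapter sizes, placement, loss weights, and stand-in backbone are assumptions, not the patented design.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Small residual bottleneck inserted into a frozen encoder/decoder layer (hypothetical)."""
    def __init__(self, dim=256, bottleneck=32):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))    # residual adaptation

# Freeze a stand-in pretrained layer; only the adapter parameters would be trained.
backbone = nn.Linear(256, 256)                          # placeholder for a pretrained ASR layer
for p in backbone.parameters():
    p.requires_grad = False
adapter = Adapter()

x = torch.randn(4, 10, 256)                             # batch of hidden states
adapted = adapter(backbone(x))

# Multi-task objective: a weighted sum of ASR and SLU losses (weights assumed).
asr_loss, slu_loss = torch.tensor(1.0), torch.tensor(2.0)   # placeholder loss values
total_loss = 0.5 * asr_loss + 0.5 * slu_loss
```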