17526350. GLOBAL NEURAL TRANSDUCER MODELS LEVERAGING SUB-TASK NETWORKS simplified abstract (International Business Machines Corporation)

From WikiPatents


Organization Name

International Business Machines Corporation

Inventor(s)

Takashi Fukuda of Tokyo (JP)

Samuel Thomas of White Plains, NY (US)

GLOBAL NEURAL TRANSDUCER MODELS LEVERAGING SUB-TASK NETWORKS - A simplified explanation of the abstract

This abstract first appeared for US patent application 17526350 titled 'GLOBAL NEURAL TRANSDUCER MODELS LEVERAGING SUB-TASK NETWORKS'.

Simplified Explanation

The patent application describes a computer-implemented method for training a neural transducer for speech recognition. The method begins by initializing the neural transducer with its three standard components: a prediction network, an encoder network, and a joint network.

  • The prediction network is expanded into multiple prediction-net branches, each dedicated to a specific sub-task within speech recognition.
  • The entire neural transducer is then trained using training data sets for all of the specific sub-tasks, so the shared components and every branch learn together.
  • The trained neural transducer is obtained by fusing the multiple prediction-net branches into a single model.
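The steps above can be sketched as a toy model. This is a minimal numpy illustration, not the patent's implementation: the dimensions, the linear/tanh layers, the number of branches, and the choice of averaging as the fusion rule are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, EMB, ENC, HID = 16, 8, 8, 8  # toy sizes, chosen arbitrarily

def prediction_branch(seed):
    """One prediction-net branch: a toy projection of label embeddings."""
    r = np.random.default_rng(seed)
    W = r.standard_normal((EMB, HID)) * 0.1
    def run(label_emb):
        return np.tanh(label_emb @ W)   # (T_label, HID)
    return run

# Expand the prediction network into one branch per sub-task
# (the sub-task split itself is hypothetical here).
branches = [prediction_branch(s) for s in range(3)]

def encoder(features):
    """Toy encoder network: projection of acoustic features."""
    W = rng.standard_normal((ENC, HID)) * 0.1
    return np.tanh(features @ W)        # (T_audio, HID)

def joint(enc_out, pred_out):
    """Joint network: combine encoder and prediction states, score the vocab."""
    Wj = rng.standard_normal((HID, VOCAB)) * 0.1
    z = enc_out[:, None, :] + pred_out[None, :, :]   # (T_audio, T_label, HID)
    return z @ Wj                                    # logits over the vocabulary

# Fuse the branches -- averaging their outputs is one simple fusion choice.
feats = rng.standard_normal((5, ENC))     # 5 audio frames
labels = rng.standard_normal((4, EMB))    # 4 label embeddings
pred_fused = np.mean([b(labels) for b in branches], axis=0)
logits = joint(encoder(feats), pred_fused)
print(logits.shape)   # (5, 4, 16): audio frames x label positions x vocab
```

The joint network's output grid over audio frames and label positions is the standard neural-transducer (RNN-T) structure; only the multi-branch prediction network and the fusion step reflect the method described here.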

Potential Applications

  • Speech recognition systems for various applications such as virtual assistants, transcription services, and voice-controlled devices.
  • Language translation services that can convert spoken words into written text in different languages.
  • Accessibility tools for individuals with speech impairments, enabling them to communicate more effectively.

Problems Solved

  • Enhances the accuracy and performance of speech recognition systems by training the neural transducer on multiple specific sub-tasks.
  • Addresses the challenge of handling different aspects of speech recognition, such as phoneme recognition, language modeling, and acoustic modeling, within a single system.
  • Provides a more efficient and effective method for training neural transducers for speech recognition.

Benefits

  • Improved accuracy and reliability of speech recognition systems, leading to better user experiences.
  • Increased flexibility and adaptability of the neural transducer by training it on various specific sub-tasks.
  • Simplified training process: fusing the prediction-net branches yields one model, reducing the need for separate training and deployment of a model per sub-task.


Original Abstract Submitted

A computer-implemented method for training a neural transducer for speech recognition is provided. The method includes initializing the neural transducer having a prediction network and an encoder network and a joint network. The method further includes expanding the prediction network by changing the prediction network to a plurality of prediction-net branches. Each of the prediction-net branches is a prediction network for a respective specific sub-task from among a plurality of specific sub-tasks. The method also includes training, by a hardware processor, an entirety of the neural transducer by using training data sets for all of the plurality of specific sub-tasks. The method additionally includes obtaining a trained neural transducer by fusing the plurality of prediction-net branches.
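The training schedule in the abstract (train the entire transducer on data sets for all sub-tasks, then fuse the branches) can be sketched with a deliberately simplified toy model. Everything here is an assumption for illustration: the sub-task names, the linear model standing in for the transducer, and averaging the branch weights as the fusion step.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 4  # toy feature dimension

# Hypothetical per-sub-task training data: (features, target) pairs.
subtask_data = {
    "subtask_a": [(rng.standard_normal(D), 0.0) for _ in range(8)],
    "subtask_b": [(rng.standard_normal(D), 1.0) for _ in range(8)],
}

shared_w = np.zeros(D)                             # stand-in for shared encoder/joint parameters
branch_w = {k: np.zeros(D) for k in subtask_data}  # one prediction-net branch per sub-task

lr = 0.05
for epoch in range(50):
    # Train the *entirety* of the model using data sets for all sub-tasks.
    for task, data in subtask_data.items():
        for x, y in data:
            pred = x @ (shared_w + branch_w[task])
            err = pred - y
            shared_w -= lr * err * x         # shared parameters see every sub-task
            branch_w[task] -= lr * err * x   # each branch sees only its own sub-task

# Obtain a single trained model by fusing the branches (simple averaging here).
fused_w = shared_w + np.mean(list(branch_w.values()), axis=0)
print(fused_w.shape)   # (4,)
```

The point of the sketch is the data flow, not the model: shared parameters are updated by every sub-task's data while each branch specializes, and fusion collapses the branches into one set of weights for inference.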