Microsoft Technology Licensing, LLC (20240135919): FAST AND EFFICIENT TEXT ONLY ADAPTATION FOR FACTORIZED NEURAL TRANSDUCER - Simplified Abstract


FAST AND EFFICIENT TEXT ONLY ADAPTATION FOR FACTORIZED NEURAL TRANSDUCER

Organization Name

Microsoft Technology Licensing, LLC

Inventor(s)

Rui Zhao of Bellevue, WA (US)

Jian Xue of Bellevue, WA (US)

Sarangarajan Parthasarathy of Mountain View, CA (US)

Jinyu Li of Bellevue, WA (US)

FAST AND EFFICIENT TEXT ONLY ADAPTATION FOR FACTORIZED NEURAL TRANSDUCER - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240135919, titled 'FAST AND EFFICIENT TEXT ONLY ADAPTATION FOR FACTORIZED NEURAL TRANSDUCER'.

Simplified Explanation

The abstract describes systems and methods for accessing a factorized neural transducer that uses separate predictors for blank tokens and vocabulary tokens in automatic speech recognition.

  • The transducer is split into a first set of layers for predicting blank tokens and a second set of layers for predicting vocabulary tokens.
  • The second set of layers contains a language model whose vocabulary predictor is separate from the blank predictor.
  • The vocabulary predictor output is combined with the encoder output to predict each vocabulary token.
  • Selective modifications are applied only to the second set of layers, not the first, to improve the transducer's accuracy; a sketch of this split architecture follows this list.
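
The factorization described in the list above can be illustrated with a short PyTorch sketch. This is a minimal, hypothetical reading of the abstract, not the patent's implementation: the class name FactorizedTransducerSketch, the layer types, and the dimensions are assumptions, and the way the blank and vocabulary scores are combined is simplified for clarity.

```python
# Hypothetical sketch of a factorized neural transducer: a shared encoder,
# a first set of layers for the blank token, and a second set of layers
# (an internal language model) for vocabulary tokens. Illustrative only.
import torch
import torch.nn as nn


class FactorizedTransducerSketch(nn.Module):
    def __init__(self, vocab_size=1000, feat_dim=80, hidden_dim=256):
        super().__init__()
        # Shared acoustic encoder over audio features.
        self.encoder = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        # First set of layers: predictor dedicated to the blank token.
        self.blank_predictor = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.blank_joint = nn.Linear(2 * hidden_dim, 1)
        # Second set of layers: a language model acting as a vocabulary
        # predictor that is separate from the blank predictor.
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        self.vocab_predictor = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.vocab_out = nn.Linear(hidden_dim, vocab_size)
        self.enc_out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, feats, prev_tokens):
        enc, _ = self.encoder(feats)            # (B, T, H)
        emb = self.embed(prev_tokens)           # (B, U, H)
        blank_h, _ = self.blank_predictor(emb)  # (B, U, H)
        vocab_h, _ = self.vocab_predictor(emb)  # (B, U, H)
        # Blank score: combine encoder and blank-predictor states.
        T, U = enc.size(1), blank_h.size(1)
        blank_logit = self.blank_joint(torch.cat(
            [enc.unsqueeze(2).expand(-1, -1, U, -1),
             blank_h.unsqueeze(1).expand(-1, T, -1, -1)], dim=-1))  # (B, T, U, 1)
        # Vocabulary score: encoder output plus vocabulary-predictor (LM) output.
        vocab_logit = self.enc_out(enc).unsqueeze(2) + self.vocab_out(vocab_h).unsqueeze(1)  # (B, T, U, V)
        return blank_logit, vocab_logit


# Usage with random inputs, just to show the shapes involved.
model = FactorizedTransducerSketch()
feats = torch.randn(2, 50, 80)                 # 2 utterances, 50 audio frames
prev_tokens = torch.randint(0, 1000, (2, 10))  # 10 previously emitted tokens
blank_logit, vocab_logit = model(feats, prev_tokens)
```

Keeping the blank path and the vocabulary path in separate parameter groups is what makes the selective modification in the last bullet possible.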

Potential Applications

This technology can be applied in automatic speech recognition systems, language translation tools, and voice-activated devices.

Problems Solved

This technology improves the accuracy and efficiency of automatic speech recognition by separating blank-token prediction from vocabulary-token prediction, so the vocabulary-predicting layers can be modified or adapted independently of the rest of the model (see the sketch below).
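
Under this reading of the abstract and the application's title, text-only adaptation could look like the following sketch: freeze the encoder and the blank-predicting layers, and update only the language-model branch with a next-token loss on plain text. It reuses the hypothetical FactorizedTransducerSketch from the earlier example; the function name, loss, and training loop are assumptions, not the patent's method.

```python
# Hedged sketch of text-only adaptation: only the second set of layers
# (the internal language model) is trained, on text alone, while the
# encoder and blank predictor stay frozen. Illustrative only.
import torch
import torch.nn as nn


def adapt_on_text(model, text_batches, epochs=1, lr=1e-4):
    # Freeze everything, then unfreeze only the vocabulary-predictor (LM) branch.
    for p in model.parameters():
        p.requires_grad = False
    lm_modules = [model.embed, model.vocab_predictor, model.vocab_out]
    lm_params = [p for m in lm_modules for p in m.parameters()]
    for p in lm_params:
        p.requires_grad = True

    optimizer = torch.optim.Adam(lm_params, lr=lr)
    criterion = nn.CrossEntropyLoss()

    for _ in range(epochs):
        for tokens in text_batches:              # tokens: (B, U) integer ids
            inputs, targets = tokens[:, :-1], tokens[:, 1:]
            emb = model.embed(inputs)
            vocab_h, _ = model.vocab_predictor(emb)
            logits = model.vocab_out(vocab_h)    # (B, U-1, V)
            loss = criterion(logits.reshape(-1, logits.size(-1)),
                             targets.reshape(-1))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model


# Usage with random token ids standing in for adaptation text.
model = FactorizedTransducerSketch()
text_batches = [torch.randint(0, 1000, (4, 12)) for _ in range(3)]
adapt_on_text(model, text_batches)
```

Because no audio is needed for this step, adapting the model to a new domain reduces to ordinary language-model fine-tuning on text, which is what makes the approach fast and efficient.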

Benefits

The system provides more accurate and reliable speech recognition results, leading to a better user experience and increased productivity across applications.

Potential Commercial Applications

This technology can be used in virtual assistants, transcription services, language learning platforms, and customer service automation tools.

Possible Prior Art

Prior art may include existing neural transducer systems for speech recognition, language modeling techniques, and neural network architectures for natural language processing.

Unanswered Questions

How does this technology compare to existing speech recognition systems in terms of accuracy and efficiency?

This article does not provide a direct comparison with existing speech recognition systems to evaluate the performance improvements offered by the factorized neural transducer.

What are the specific modifications applied to the second set of layers to enhance the accuracy of the neural transducer?

The article mentions selective modifications but does not detail the specific changes made to the second set of layers for improving the system's performance.


Original Abstract Submitted

systems and methods are provided for accessing a factorized neural transducer comprising a first set of layers for predicting blank tokens and a second set of layers for predicting vocabulary tokens, the second set of layers comprising a language model that includes a vocabulary predictor which is a separate predictor from the blank predictor, wherein a vocabulary predictor output from the vocabulary predictor and the encoder output are used for predicting a vocabulary token. the second set of layers is selectively modified to facilitate an improvement in an accuracy of the factorized neural transducer in performing automatic speech recognition, the selectively modifying comprising applying a particular modification to the second set of layers while refraining from applying the particular modification to the first set of layers.