17983660. FAST AND EFFICIENT TEXT ONLY ADAPTATION FOR FACTORIZED NEURAL TRANSDUCER simplified abstract (Microsoft Technology Licensing, LLC)

From WikiPatents
Revision as of 06:34, 8 May 2024 by Wikipatents (talk | contribs) (Creating a new page)

FAST AND EFFICIENT TEXT ONLY ADAPTATION FOR FACTORIZED NEURAL TRANSDUCER

Organization Name

Microsoft Technology Licensing, LLC

Inventor(s)

Rui Zhao of Bellevue WA (US)

Jian Xue of Bellevue WA (US)

Sarangarajan Parthasarathy of Mountain View CA (US)

Jinyu Li of Bellevue WA (US)

FAST AND EFFICIENT TEXT ONLY ADAPTATION FOR FACTORIZED NEURAL TRANSDUCER - A simplified explanation of the abstract

This abstract first appeared for US patent application 17983660, titled 'FAST AND EFFICIENT TEXT ONLY ADAPTATION FOR FACTORIZED NEURAL TRANSDUCER'.

Simplified Explanation

The abstract describes systems and methods for accessing a factorized neural transducer used in automatic speech recognition. The transducer uses one predictor for blank tokens and a separate predictor for vocabulary tokens.

  • The system includes a first set of layers for predicting blank tokens and a second set of layers for predicting vocabulary tokens.
  • The second set of layers includes a language model with a vocabulary predictor that is distinct from the blank predictor.
  • The vocabulary predictor output and encoder output are used together to predict a vocabulary token.
  • The second set of layers can be selectively modified to improve the accuracy of the neural transducer, without modifying the first set of layers.
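
The structure in the bullets above can be illustrated with a minimal NumPy sketch of one decoding step. It is not taken from the patent application: the layer sizes, the parameter names (`embed`, `w_enc`, `w_pred`, `W_out`, `W_enc`), the single-layer predictors, and the additive way the encoder output and vocabulary predictor output are combined are all assumptions made for illustration.

```python
import numpy as np

ENC_DIM, HID, VOCAB = 8, 6, 5  # hypothetical sizes, not from the abstract


def init_params(rng):
    """Create two separate parameter sets (all names are illustrative)."""
    return {
        # First set of layers: used only to predict the blank token.
        "blank": {
            "embed": rng.normal(size=(VOCAB, HID)),
            "w_enc": rng.normal(size=ENC_DIM),
            "w_pred": rng.normal(size=HID),
        },
        # Second set of layers: a small language model (vocabulary predictor)
        # plus a projection of the encoder output onto the vocabulary.
        "vocab": {
            "embed": rng.normal(size=(VOCAB, HID)),
            "W_out": rng.normal(size=(HID, VOCAB)),
            "W_enc": rng.normal(size=(ENC_DIM, VOCAB)),
        },
    }


def log_softmax(x):
    x = x - x.max()
    return x - np.log(np.exp(x).sum())


def step(params, enc_frame, prev_token):
    """Score blank and vocabulary tokens for one encoder frame."""
    b = params["blank"]
    blank_hidden = np.tanh(b["embed"][prev_token])
    # Blank score from the encoder output and the blank predictor (assumed form).
    blank_logit = enc_frame @ b["w_enc"] + blank_hidden @ b["w_pred"]

    v = params["vocab"]
    lm_hidden = np.tanh(v["embed"][prev_token])
    # Vocabulary predictor output: log-probabilities over the next token.
    lm_log_probs = log_softmax(lm_hidden @ v["W_out"])
    # Vocabulary score uses the encoder output and the predictor output together.
    vocab_logits = enc_frame @ v["W_enc"] + lm_log_probs

    # Joint distribution over [blank, token_0, ..., token_{VOCAB-1}].
    return log_softmax(np.concatenate(([blank_logit], vocab_logits)))


rng = np.random.default_rng(0)
params = init_params(rng)
log_probs = step(params, rng.normal(size=ENC_DIM), prev_token=2)
print(log_probs.shape)  # (6,): one blank entry plus VOCAB vocabulary entries
```

Keeping the two parameter sets in separate dictionaries mirrors the point of the last bullet: the `"vocab"` entries can be changed without touching the `"blank"` entries.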

Potential Applications

This technology can be applied in various fields such as:

  • Automatic speech recognition systems
  • Natural language processing applications
  • Voice-controlled devices and virtual assistants

Problems Solved

This technology helps in addressing the following issues:

  • Improving accuracy in predicting vocabulary tokens
  • Enhancing performance of automatic speech recognition systems
  • Streamlining the process of language modeling

Benefits

The benefits of this technology include:

  • Increased accuracy in predicting vocabulary tokens
  • Enhanced efficiency in automatic speech recognition tasks
  • Improved user experience in voice-controlled applications

Potential Commercial Applications

This technology has potential commercial applications in:

  • Speech-to-text transcription services
  • Virtual assistants and chatbots
  • Language translation tools

Possible Prior Art

Possible prior art includes neural transducers that use separate predictors for different types of tokens. The selective modification of one set of layers to improve accuracy, while leaving the other set unchanged, may be the novel aspect of this patent application.

Unanswered Questions

How does the selective modification of the second set of layers improve the accuracy of the neural transducer?

The abstract mentions that the second set of layers is selectively modified to enhance the accuracy of the system, but it does not provide details on the specific modifications or the underlying mechanism of improvement.

What are the potential limitations or challenges in implementing the proposed modifications to the neural transducer?

While the abstract highlights the benefits of the selective modifications, it does not discuss any potential drawbacks, limitations, or challenges that may arise during the implementation of these modifications.


Original Abstract Submitted

Systems and methods are provided for accessing a factorized neural transducer comprising a first set of layers for predicting blank tokens and a second set of layers for predicting vocabulary tokens, the second set of layers comprising a language model that includes a vocabulary predictor which is a separate predictor from the blank predictor, wherein a vocabulary predictor output from the vocabulary predictor and the encoder output are used for predicting a vocabulary token. The second set of layers is selectively modified to facilitate an improvement in an accuracy of the factorized neural transducer in performing automatic speech recognition, the selectively modifying comprising applying a particular modification to the second set of layers while refraining from applying the particular modification to the first set of layers.
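
One way to picture "applying a particular modification to the second set of layers while refraining from applying the particular modification to the first set of layers" is the toy sketch below. The abstract does not say what the modification is, so the choice here (a few gradient steps on text-only token sequences, applied to the output matrix of a small language model) is an assumption, as are all names (`adapt_on_text`, `lm_loss`, `W_out`, `embed`) and sizes.

```python
import numpy as np


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


def lm_loss(vocab, tokens):
    """Average next-token cross-entropy of the toy language model."""
    losses = []
    for prev, nxt in zip(tokens[:-1], tokens[1:]):
        h = np.tanh(vocab["embed"][prev])
        losses.append(-np.log(softmax(h @ vocab["W_out"])[nxt]))
    return float(np.mean(losses))


def adapt_on_text(params, tokens, lr=0.2, epochs=20):
    """Update only the vocabulary-prediction layers using text-only data."""
    vocab = {k: v.copy() for k, v in params["vocab"].items()}
    for _ in range(epochs):
        grad = np.zeros_like(vocab["W_out"])
        for prev, nxt in zip(tokens[:-1], tokens[1:]):
            h = np.tanh(vocab["embed"][prev])
            p = softmax(h @ vocab["W_out"])
            p[nxt] -= 1.0  # gradient of cross-entropy w.r.t. the logits
            grad += np.outer(h, p)
        vocab["W_out"] -= lr * grad / (len(tokens) - 1)
    # The blank-prediction layers are passed through untouched.
    return {"blank": params["blank"], "vocab": vocab}


rng = np.random.default_rng(0)
VOCAB, HID = 5, 6  # hypothetical sizes
params = {
    "blank": {"w_pred": rng.normal(size=HID)},
    "vocab": {
        "embed": rng.normal(size=(VOCAB, HID)),
        "W_out": rng.normal(size=(HID, VOCAB)),
    },
}
text = [0, 3, 1, 3, 1, 4, 0, 3]  # made-up token ids standing in for text data

loss_before = lm_loss(params["vocab"], text)
adapted = adapt_on_text(params, text)
loss_after = lm_loss(adapted["vocab"], text)
print(loss_after < loss_before)
```

Because the update needs only token sequences, no audio is involved in this sketch, which is consistent with the "text only adaptation" named in the title. How the actual application performs the adaptation is not stated in the abstract.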