US Patent Application 18347842. TIED AND REDUCED RNN-T simplified abstract


TIED AND REDUCED RNN-T

Organization Name

Google LLC


Inventor(s)

Rami Botros of Mountain View, CA (US)

Tara Sainath of Jersey City, NJ (US)

TIED AND REDUCED RNN-T - A simplified explanation of the abstract

This abstract first appeared for US patent application 18347842, titled 'TIED AND REDUCED RNN-T'.

Simplified Explanation

The abstract describes a Recurrent Neural Network Transducer (RNN-T) model used for speech recognition. Here are the key points:

  • The RNN-T model includes a prediction network and a joint network.
  • At each time step after the initial one, the prediction network receives the sequence of previously emitted non-blank symbols.
  • Using a shared embedding matrix, it generates an embedding for each non-blank symbol.
  • It assigns a position vector to each non-blank symbol and weights each embedding in proportion to its similarity to that position vector.
  • The prediction network combines the weighted embeddings into a single embedding vector at each time step (sketched in code after this list).
  • The joint network receives this embedding vector and generates a probability distribution over possible speech recognition hypotheses.
  • This tied and reduced design is intended to improve speech recognition accuracy and efficiency.

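To make the data flow concrete, below is a minimal Python/NumPy sketch of one time step of such a prediction network and joint network. It is an illustration under assumptions, not the patented implementation: the array sizes, the dot-product/softmax similarity weighting, and the stand-in joint projection are all hypothetical, and the audio encoder features that a full RNN-T joint network would also consume are omitted.

```python
# Minimal sketch (illustrative assumptions, not the patented implementation):
# one time step of a "tied" prediction network that collapses the recent
# non-blank symbols into a single embedding vector, plus a toy joint network.
import numpy as np

rng = np.random.default_rng(0)

vocab_size, embed_dim, context_len, num_labels = 100, 64, 4, 101  # assumed sizes

# Shared ("tied") embedding matrix: one lookup table reused for every position.
shared_embedding = rng.normal(size=(vocab_size, embed_dim))
# One position vector per context slot (assumption: same dimension as embeddings).
position_vectors = rng.normal(size=(context_len, embed_dim))
# Stand-in projection for the joint network (the real one also takes encoder output).
joint_weights = rng.normal(size=(embed_dim, num_labels))


def prediction_network(non_blank_symbols: list[int]) -> np.ndarray:
    """Embed each recent non-blank symbol, weight it by its similarity to the
    position vector assigned to its slot, and combine into one vector."""
    embeddings = shared_embedding[non_blank_symbols]            # (context_len, embed_dim)
    similarity = np.sum(embeddings * position_vectors, axis=1)  # dot-product similarity (assumed)
    weights = np.exp(similarity - similarity.max())
    weights /= weights.sum()                                    # normalize the weights
    return weights @ embeddings                                 # single embedding vector


def joint_network(pred_embedding: np.ndarray) -> np.ndarray:
    """Map the prediction-network output to a probability distribution over labels."""
    logits = pred_embedding @ joint_weights
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()


# Usage: ids of the last 4 non-blank symbols emitted so far.
probs = joint_network(prediction_network([12, 7, 42, 3]))
print(probs.shape, probs.sum())  # (101,) 1.0
```

The point of the sketch is the shape of the computation: a single shared embedding table is reused for every context position, and the per-position similarity weighting reduces the recent non-blank symbols to one vector that is handed to the joint network.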

Original Abstract Submitted

A RNN-T model includes a prediction network configured to, at each of a plurality of time steps subsequent to an initial time step, receive a sequence of non-blank symbols. For each non-blank symbol the prediction network is also configured to generate, using a shared embedding matrix, an embedding of the corresponding non-blank symbol, assign a respective position vector to the corresponding non-blank symbol, and weight the embedding proportional to a similarity between the embedding and the respective position vector. The prediction network is also configured to generate a single embedding vector at the corresponding time step. The RNN-T model also includes a joint network configured to, at each of the plurality of time steps subsequent to the initial time step, receive the single embedding vector generated as output from the prediction network at the corresponding time step and generate a probability distribution over possible speech recognition hypotheses.