18096946. ATTENTION-BASED DECODER-ONLY SEQUENCE TRANSDUCTION NEURAL NETWORKS simplified abstract (Google LLC)

From WikiPatents

ATTENTION-BASED DECODER-ONLY SEQUENCE TRANSDUCTION NEURAL NETWORKS

Organization Name

Google LLC

Inventor(s)

Noam M. Shazeer of Palo Alto CA (US)

Lukasz Mieczyslaw Kaiser of San Francisco CA (US)

Etienne Pot of Palo Alto CA (US)

Mohammad Saleh of Santa Clara CA (US)

Ben Goodrich of San Francisco CA (US)

Peter J. Liu of Santa Clara CA (US)

Ryan Sepassi of Beverly Hills CA (US)

ATTENTION-BASED DECODER-ONLY SEQUENCE TRANSDUCTION NEURAL NETWORKS - A simplified explanation of the abstract

This abstract first appeared for US patent application 18096946, titled 'ATTENTION-BASED DECODER-ONLY SEQUENCE TRANSDUCTION NEURAL NETWORKS'.

Simplified Explanation

The patent application describes methods, systems, and apparatus for generating an output sequence from an input sequence using a self-attention decoder neural network. Here is a simplified explanation of the abstract:

  • The invention involves generating an output sequence based on an input sequence.
  • At each generation time step, a combined sequence is created by appending the already generated output tokens to the input sequence.
  • The combined sequence is then processed using a self-attention decoder neural network.
  • The neural network generates a time step output, which represents a score distribution over a set of possible output tokens.
  • An output token is selected from the set of possible output tokens based on the time step output and becomes the next token in the output sequence.
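The steps above can be sketched as a simple greedy decoding loop. This is a hypothetical illustration, not the patented implementation: `decoder_scores` stands in for the self-attention decoder neural network, and `make_copy_decoder` is an invented toy scorer (it just copies the input) so the loop can run end to end.

```python
def greedy_decode(input_tokens, decoder_scores, eos_token, max_len=20):
    """Generate an output sequence one token at a time, as described above."""
    output_tokens = []
    for _ in range(max_len):
        # Step 1: combined sequence = input sequence followed by the
        # output tokens already generated at this generation time step.
        combined = input_tokens + output_tokens
        # Step 2: the decoder produces a score distribution over the
        # set of possible output tokens (here: a dict of token -> score).
        scores = decoder_scores(combined)
        # Step 3: select an output token using the time step output
        # (greedy selection: take the highest-scoring token).
        next_token = max(scores, key=scores.get)
        if next_token == eos_token:
            break
        output_tokens.append(next_token)
    return output_tokens


def make_copy_decoder(input_tokens, eos_token):
    """Toy stand-in for the decoder network: scores favor copying the
    input sequence token by token, then emitting the end-of-sequence token."""
    vocab = set(input_tokens) | {eos_token}

    def decoder_scores(combined):
        pos = len(combined) - len(input_tokens)  # outputs generated so far
        target = input_tokens[pos] if pos < len(input_tokens) else eos_token
        return {tok: (1.0 if tok == target else 0.0) for tok in vocab}

    return decoder_scores
```

With the toy scorer, `greedy_decode(["a", "b", "c"], make_copy_decoder(["a", "b", "c"], "<eos>"), "<eos>")` reproduces the input, illustrating how the combined sequence grows by one token per generation time step. In practice the score distribution would come from the self-attention decoder, and selection could also be done by sampling rather than taking the argmax.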

Potential applications of this technology:

  • Natural language processing: Generating coherent and context-aware sentences or paragraphs based on input text.
  • Machine translation: Generating accurate translations from one language to another.
  • Speech recognition: Generating accurate transcriptions of spoken language.
  • Chatbots and virtual assistants: Generating human-like responses to user queries or commands.

Problems solved by this technology:

  • Generating accurate and contextually relevant output sequences based on input sequences.
  • Handling variable-length input sequences and generating corresponding output sequences.
  • Improving the performance and efficiency of sequence generation tasks.

Benefits of this technology:

  • Improved accuracy and coherence in generating output sequences.
  • Ability to handle complex and diverse input sequences.
  • Faster and more efficient generation of output sequences.
  • Potential for applications in various fields, including natural language processing and machine translation.


Original Abstract Submitted

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating an output sequence from an input sequence. One of the methods includes, at each of a plurality of generation time steps: generating a combined sequence for the generation time step that includes the input sequence followed by the output tokens that have already been generated as of the generation time step; processing the combined sequence using a self-attention decoder neural network to generate a time step output that defines a score distribution over a set of possible output tokens; and selecting, using the time step output, an output token from the set of possible output tokens as the next output token in the output sequence.