18404014. ATTENTION-BASED DECODER-ONLY SEQUENCE TRANSDUCTION NEURAL NETWORKS simplified abstract (Google LLC)

Organization Name

Google LLC

Inventor(s)

Noam M. Shazeer of Palo Alto CA (US)

Lukasz Mieczyslaw Kaiser of San Francisco CA (US)

Etienne Pot of Palo Alto CA (US)

Mohammad Saleh of Santa Clara CA (US)

Ben David Goodrich of San Francisco CA (US)

Peter J. Liu of Santa Clara CA (US)

Ryan Sepassi of Beverly Hills CA (US)

ATTENTION-BASED DECODER-ONLY SEQUENCE TRANSDUCTION NEURAL NETWORKS - A simplified explanation of the abstract

This abstract first appeared for US patent application 18404014, titled 'ATTENTION-BASED DECODER-ONLY SEQUENCE TRANSDUCTION NEURAL NETWORKS'.

The patent application describes methods, systems, and apparatus for generating an output sequence from an input sequence using a self-attention decoder neural network.

  • At each generation time step, a combined sequence is formed that consists of the input sequence followed by the output tokens generated so far.
  • The combined sequence is processed by the self-attention decoder neural network to produce a score distribution over the set of possible output tokens.
  • An output token is selected using that score distribution to be the next token in the output sequence (a sketch of this loop follows below).
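
The generation loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: `score_next_token` is a hypothetical stand-in for the self-attention decoder network (here it returns deterministic toy scores), and greedy argmax is just one possible way to select a token from the score distribution.

```python
import numpy as np

VOCAB_SIZE = 100  # toy vocabulary; token 0 serves as end-of-sequence
EOS = 0

def score_next_token(combined_sequence):
    """Stand-in for the self-attention decoder network (assumption: the
    real model maps the combined sequence to a score distribution over
    the vocabulary). Here: toy scores seeded by sequence length so the
    example is deterministic."""
    rng = np.random.default_rng(len(combined_sequence))
    scores = rng.random(VOCAB_SIZE)
    return scores / scores.sum()  # normalized score distribution

def generate(input_sequence, max_steps=20):
    output_tokens = []
    for _ in range(max_steps):
        # 1. Combined sequence: the input followed by tokens generated so far.
        combined = list(input_sequence) + output_tokens
        # 2. Score distribution over possible output tokens.
        scores = score_next_token(combined)
        # 3. Select the next output token (greedy argmax here; sampling
        #    from the distribution is an equally valid selection rule).
        next_token = int(np.argmax(scores))
        if next_token == EOS:
            break
        output_tokens.append(next_token)
    return output_tokens

print(generate([5, 17, 42]))
```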

Potential Applications

  • Natural language processing
  • Machine translation
  • Text generation tasks

Problems Solved

  • Efficient generation of output sequences from input sequences
  • Improved performance in sequence-to-sequence tasks

Benefits

  • Enhanced accuracy in generating output sequences
  • Increased efficiency in processing input sequences

Commercial Applications

  • AI-powered chatbots
  • Language translation services
  • Content generation tools

Questions About the Technology

  1. How does the self-attention decoder neural network improve sequence generation?
  2. What are the key advantages of using this method in natural language processing tasks?


Original Abstract Submitted

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating an output sequence from an input sequence. One of the methods includes, at each of a plurality of generation time steps: generating a combined sequence for the generation time step that includes the input sequence followed by the output tokens that have already been generated as of the generation time step; processing the combined sequence using a self-attention decoder neural network to generate a time step output that defines a score distribution over a set of possible output tokens; and selecting, using the time step output, an output token from the set of possible output tokens as the next output token in the output sequence.
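
For readers unfamiliar with the "self-attention decoder" named in the abstract, the sketch below shows the core masked self-attention computation in plain NumPy: every position in the combined sequence attends only to itself and earlier positions, which is what lets a single decoder stack process the input and the partial output together. This is a single-head sketch under illustrative assumptions (toy shapes, random weights, no feed-forward layers or training), not the patented architecture.

```python
import numpy as np

def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head masked self-attention over a combined sequence.

    x: (seq_len, d_model) embeddings of the input tokens followed by
       the output tokens generated so far.
    The causal mask ensures position i attends only to positions <= i,
    so no position can see tokens that come after it.
    """
    seq_len, d_model = x.shape
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(d_model)            # (seq_len, seq_len)
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)       # block attention to the future
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True) # softmax over allowed positions
    return weights @ v                             # (seq_len, d_model)

# Toy usage: a 6-token combined sequence with model width 8.
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 8))
w_q, w_k, w_v = [rng.normal(size=(8, 8)) for _ in range(3)]
print(causal_self_attention(x, w_q, w_k, w_v).shape)  # (6, 8)
```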