18450839. METHOD AND APPARATUS WITH TRANSFORMER MODEL TRAINING simplified abstract (SAMSUNG ELECTRONICS CO., LTD.)

METHOD AND APPARATUS WITH TRANSFORMER MODEL TRAINING

Organization Name

SAMSUNG ELECTRONICS CO., LTD.

Inventor(s)

Jung Ho Ahn of Seoul (KR)

Sun Jung Lee of Seoul (KR)

Jae Wan Choi of Seoul (KR)

METHOD AND APPARATUS WITH TRANSFORMER MODEL TRAINING - A simplified explanation of the abstract

This abstract first appeared for US patent application 18450839, titled 'METHOD AND APPARATUS WITH TRANSFORMER MODEL TRAINING'.

Simplified Explanation

The patent application describes a device and method for training a transformer model by optimizing how resources are allocated to pairs of layers in the model.

  • Processors execute instructions to train a transformer model with multiple encoders and decoders.
  • Batches of training data are divided into micro-batches, and layer pairs are selected for the micro-batches.
  • A processing order of the layer pairs is determined to optimize resource allocation.
  • Resources are allocated to the layer pairs based on the determined resource information and processing order (see the sketch after this list).
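
A minimal Python sketch of these steps is given below. It is an illustration only, assuming plain-Python stand-ins for the model's layers and devices; names such as split_into_micro_batches, select_layer_pairs, and allocate_resources are hypothetical and are not taken from the application.

  from typing import List, Tuple

  def split_into_micro_batches(batch: List, num_micro_batches: int) -> List[List]:
      """Divide one batch of training samples into micro-batches."""
      size = max(1, len(batch) // num_micro_batches)
      return [batch[i:i + size] for i in range(0, len(batch), size)]

  def select_layer_pairs(num_encoders: int, num_decoders: int) -> List[Tuple[str, str]]:
      """Pair encoder layers with decoder layers (a 1:1 pairing is assumed here)."""
      return [(f"encoder_{i}", f"decoder_{i}")
              for i in range(min(num_encoders, num_decoders))]

  def processing_order(micro_batches: List[List], layer_pairs: List[Tuple[str, str]]):
      """Order the (micro-batch, layer-pair) steps; one plausible interleaving."""
      return [(mb_idx, pair_idx)
              for pair_idx in range(len(layer_pairs))
              for mb_idx in range(len(micro_batches))]

  def allocate_resources(order, num_devices: int):
      """Assign each step to a device round-robin, following the processing order."""
      return {step: step_idx % num_devices for step_idx, step in enumerate(order)}

  batch = list(range(32))                                   # 32 training samples
  micro_batches = split_into_micro_batches(batch, num_micro_batches=4)
  pairs = select_layer_pairs(num_encoders=6, num_decoders=6)
  order = processing_order(micro_batches, pairs)
  allocation = allocate_resources(order, num_devices=2)
  print(len(micro_batches), len(pairs), len(order), allocation[(0, 0)])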

Potential Applications

This technology could be applied in fields where transformer models are commonly used, such as natural language processing, machine translation, and speech recognition.

Problems Solved

This technology addresses the challenge of efficiently training transformer models by optimizing the allocation of resources to different layers, leading to improved performance and faster training times.

Benefits

The optimized allocation of resources can result in faster training times, improved model accuracy, and more efficient use of computational resources.

Potential Commercial Applications

This technology could be valuable for companies developing AI applications that rely on transformer models, such as chatbots, language translation services, and voice assistants.

Possible Prior Art

Prior art may include research papers or patents related to optimizing resource allocation in neural networks or transformer models.

Unanswered Questions

How does this technology compare to existing methods for training transformer models?

This article does not provide a direct comparison to existing methods for training transformer models, so it is unclear how this technology differs in terms of performance, efficiency, or complexity.

What are the specific resource allocation strategies used in this technology?

The article mentions optimizing resource allocation, but it does not detail the specific strategies or algorithms used to allocate resources to different layers in the transformer model.


Original Abstract Submitted

A device including processors configured to execute instructions and memories storing the instructions, which when executed by the processors configure the processors to perform an operation for training a transformer model having a plurality of encoders and a plurality of decoders by configuring the processors to divide the batches of training data into a plurality of micro-batches, select layer pairs for the plurality of micro-batches, assemble a processing order of the layer pairs, determine resource information to be allocated to the layer pairs, and allocate resources to the layer pairs based on the determined resource information to be allocated to the layer pairs, dependent on the processing order of the layer pairs.
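
The last two operations named in the abstract, determining resource information for the layer pairs and allocating resources dependent on the processing order, can be pictured with a short, hedged Python sketch. The cost model, pool size, and function names below are assumptions made for illustration and are not specified by the application.

  def determine_resource_info(layer_pairs, units_per_pair=4):
      """Assumed per-pair requirement (e.g., compute units); illustrative only."""
      return {pair: units_per_pair for pair in layer_pairs}

  def allocate_in_order(processing_order, resource_info, pool=16):
      """Grant resources to (micro-batch, layer-pair) steps in the given order."""
      allocation, remaining = {}, pool
      for micro_batch_idx, pair in processing_order:
          granted = min(resource_info[pair], remaining)
          allocation[(micro_batch_idx, pair)] = granted
          remaining -= granted
      return allocation

  layer_pairs = [("encoder_0", "decoder_0"), ("encoder_1", "decoder_1")]
  order = [(mb, pair) for pair in layer_pairs for mb in range(3)]  # 3 micro-batches
  info = determine_resource_info(layer_pairs)
  print(allocate_in_order(order, info))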