International Business Machines Corporation (20240104418). GRAPHICS PROCESSING UNIT TRAINING JOB ALLOCATION simplified abstract

GRAPHICS PROCESSING UNIT TRAINING JOB ALLOCATION

Organization Name

International Business Machines Corporation

Inventor(s)

Lin Dong of Beijing (CN)

Jun Feng Liu of Ontario (CA)

GRAPHICS PROCESSING UNIT TRAINING JOB ALLOCATION - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240104418 titled 'GRAPHICS PROCESSING UNIT TRAINING JOB ALLOCATION'.

Simplified Explanation

The abstract describes a computer-implemented method for training machine learning models across multiple GPU resources: the system determines the available memory in each GPU, builds a cost model that assigns an efficiency cost to each candidate packing pattern, and packs training jobs together onto GPUs according to the lowest-cost pattern.

  • Efficient training of machine learning models across multiple GPU resources
  • Determining the amount of available memory in each GPU resource
  • Packing a new training job into GPU resources alongside other training jobs
  • Selecting the packing pattern with the lowest efficiency cost under a cost model (see the sketch after this list)
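
For illustration only, here is a minimal Python sketch of this kind of allocator. Every name in it (Job, GPU, packing_cost, allocate) is hypothetical, the cost model (memory left idle after packing) is a stand-in, and only patterns of one or two co-located jobs are enumerated; the patent does not disclose an implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Job:
    name: str
    mem_required: int  # GPU memory the job needs, in MiB (hypothetical unit)

@dataclass
class GPU:
    name: str
    mem_free: int  # currently available GPU memory, in MiB
    jobs: list = field(default_factory=list)

def packing_cost(gpu: GPU, pattern: tuple) -> float:
    """Stand-in cost model: the efficiency cost of a packing pattern is
    the memory left idle on the GPU; patterns that do not fit cost
    infinity and are therefore never chosen."""
    needed = sum(job.mem_required for job in pattern)
    if needed > gpu.mem_free:
        return float("inf")
    return gpu.mem_free - needed

def allocate(first_job: Job, pending: list, gpus: list) -> GPU:
    """Pack first_job, alone or with one pending job, onto the GPU whose
    packing pattern has the lowest efficiency cost."""
    best = None  # (cost, gpu, pattern)
    for gpu in gpus:
        # Candidate packing patterns: the new job alone, or the new job
        # co-located with one of the pending jobs.
        patterns = [(first_job,)] + [(first_job, other) for other in pending]
        for pattern in patterns:
            cost = packing_cost(gpu, pattern)
            if best is None or cost < best[0]:
                best = (cost, gpu, pattern)
    cost, gpu, pattern = best
    if cost == float("inf"):
        raise RuntimeError("no GPU has enough free memory for the job")
    gpu.jobs.extend(pattern)  # load the chosen jobs onto the winning GPU
    gpu.mem_free -= sum(job.mem_required for job in pattern)
    return gpu

if __name__ == "__main__":
    gpus = [GPU("gpu0", 16384), GPU("gpu1", 8192)]
    chosen = allocate(Job("job-a", 6000), [Job("job-b", 9000)], gpus)
    print(chosen.name, [job.name for job in chosen.jobs], chosen.mem_free)
    # -> gpu0 ['job-a', 'job-b'] 1384: packing both jobs on gpu0 leaves
    #    the least memory idle, so that pattern has the lowest cost.
```

A real scheduler would likely enumerate richer packing patterns and fold factors such as compute contention into the cost, but the selection rule, picking the pattern with the lowest efficiency cost, matches the abstract.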

Potential Applications

This technology can be applied in fields such as image recognition, natural language processing, and autonomous driving, where efficiently training large machine learning models is crucial.

Problems Solved

This technology addresses the challenge of efficiently utilizing multiple GPU resources for training machine learning models, optimizing memory usage, and improving overall training performance.

Benefits

The benefits of this technology include faster training times, improved model accuracy, reduced costs associated with training, and increased efficiency in utilizing GPU resources.

Potential Commercial Applications

Potential commercial applications of this technology include cloud computing services, AI development platforms, and industries requiring large-scale machine learning model training such as healthcare and finance.

Possible Prior Art

One possible piece of prior art is the use of parallel processing techniques to improve efficiency and reduce training times when training machine learning models. Another is the optimization of memory usage in GPU resources for machine learning tasks.

Unanswered Questions

How does this technology compare to existing methods in terms of training speed and model accuracy?

The article does not provide a direct comparison with existing methods in terms of training speed and model accuracy.

What are the limitations or constraints of this technology in real-world applications?

The article does not discuss the potential limitations or constraints of implementing this technology in real-world applications.


Original Abstract Submitted

A computer-implemented method for training a machine learning model includes receiving a first training job at a processing device of a computer system having a plurality of graphics processing unit (GPU) resources, the first training job being part of a set of training jobs, and determining an amount of available memory in each GPU resource of the plurality of GPU resources. The method also includes loading the training job into one or more GPU resources with at least one second training job. The loading includes determining a cost model indicating an efficiency cost of each of a plurality of packing patterns, and packing the first training job and the second training job into the one or more GPU resources according to a packing pattern associated with a lowest efficiency cost.
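
As a purely assumed illustration of the first claimed step, determining the amount of available memory in each GPU resource, the snippet below queries per-GPU free memory through pynvml, the Python bindings for NVIDIA's NVML. The patent does not name any library or vendor, so this choice is an assumption.

```python
import pynvml  # Python bindings for NVIDIA's NVML: pip install nvidia-ml-py

def available_memory_per_gpu() -> dict[int, int]:
    """Return the free memory, in bytes, of each visible GPU.

    One possible realization of the claim's "determining an amount of
    available memory in each GPU resource"; the patent does not
    prescribe NVML or any other API.
    """
    pynvml.nvmlInit()
    try:
        free = {}
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            info = pynvml.nvmlDeviceGetMemoryInfo(handle)
            free[i] = info.free
        return free
    finally:
        pynvml.nvmlShutdown()
```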