18961400. TRANSPARENT PRE-EMPTION AND MIGRATION FOR PLANET-SCALE COMPUTER (MICROSOFT TECHNOLOGY LICENSING, LLC)

From WikiPatents

Organization Name

MICROSOFT TECHNOLOGY LICENSING, LLC

Inventor(s)

Muthian Sivathanu of Chennai IN

Srinidhi Viswanatha of Bangalore IN

Dharma Kiritkumar Shukla of Bellevue WA US

Nipun Kwatra of Bangalore IN

Ramachandran Ramjee of Bengaluru IN

Rimma Vladimirovna Nehme of Bellevue WA US

Pankaj Sharma of Redmond WA US

Bhalakumaaran Erode Ranganathan of Bellevue WA US

Vaibhav Sharma of Seattle WA US

This abstract first appeared for US patent application 18961400, titled 'TRANSPARENT PRE-EMPTION AND MIGRATION FOR PLANET-SCALE COMPUTER'.

Original Abstract Submitted

The disclosure herein describes platform-level checkpointing for deep learning (DL) jobs. Checkpointing captures two kinds of state data: (i) GPU state (device state) and (ii) CPU state (host state). The GPU state includes GPU data (e.g., model parameters and optimizer state) located in GPU memory, and GPU context (e.g., the default stream on the GPU and the handles created by libraries such as DNN and BLAS). Because the checkpointing is domain-aware, only a fraction of the GPU memory is copied: the “active” memory, which contains useful data such as model parameters. To capture this useful data, memory management is controlled so that the active parts of memory can be identified. To restore the destination GPU to the same context/state, a mechanism captures state-changing events on the original GPU and replays them on the destination GPU.
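
The two ideas in the abstract, copying only active memory and replaying context-changing events, can be illustrated with a small simulation. This is a minimal sketch and not the method claimed in the application: `SimulatedGPU`, `checkpoint`, `restore`, and `create_handle` are hypothetical names, device memory is modelled as a plain byte buffer, and a "handle" is just a dictionary entry. No real GPU API is used.

```python
class SimulatedGPU:
    """Toy device (hypothetical): a flat byte buffer plus a table of library handles."""

    def __init__(self, capacity):
        self.memory = bytearray(capacity)
        self.active = {}      # offset -> size for every live allocation
        self.next_free = 0    # bump pointer; a real allocator would reuse freed space
        self.context = {}     # handle name -> settings
        self.event_log = []   # state-changing calls, in the order they happened

    def alloc(self, size):
        if self.next_free + size > len(self.memory):
            raise MemoryError("out of simulated device memory")
        offset = self.next_free
        self.next_free += size
        self.active[offset] = size
        return offset

    def free(self, offset):
        # Freed regions drop out of the active set and are skipped at checkpoint time.
        del self.active[offset]

    def write(self, offset, data):
        self.memory[offset:offset + len(data)] = data

    def read(self, offset):
        return bytes(self.memory[offset:offset + self.active[offset]])

    def create_handle(self, name, **settings):
        # Apply the state change and record it so it can be replayed elsewhere.
        self.context[name] = dict(settings)
        self.event_log.append(("create_handle", name, dict(settings)))


def checkpoint(gpu):
    """Copy only the active regions, plus the log of context-changing events."""
    regions = {off: bytes(gpu.memory[off:off + size])
               for off, size in gpu.active.items()}
    return {"regions": regions,
            "next_free": gpu.next_free,
            "events": list(gpu.event_log)}


def restore(snapshot, capacity):
    """Build a destination device: replay the events, then copy the data back."""
    gpu = SimulatedGPU(capacity)
    for kind, name, settings in snapshot["events"]:
        if kind == "create_handle":
            gpu.create_handle(name, **settings)
    for off, data in snapshot["regions"].items():
        gpu.active[off] = len(data)
        gpu.write(off, data)
    gpu.next_free = snapshot["next_free"]
    return gpu


# Usage: 516 bytes were allocated, but only the 4 live bytes are checkpointed.
src = SimulatedGPU(1024)
params = src.alloc(4)
src.write(params, b"\x01\x02\x03\x04")
scratch = src.alloc(512)
src.free(scratch)
src.create_handle("blas", math_mode="default")

snap = checkpoint(src)
dst = restore(snap, 1024)
print(sum(len(d) for d in snap["regions"].values()))  # 4
```

The sketch keeps allocations at the same offsets on the destination so that addresses held by the job stay valid; whether and how the disclosed system preserves addresses is not stated in the abstract.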
