US Patent Application 18222946. IN-NETWORK COLLECTIVE OPERATIONS simplified abstract

From WikiPatents

IN-NETWORK COLLECTIVE OPERATIONS

Organization Name

Intel Corporation


Inventor(s)

Vivek Kashyap of Portland OR (US)

Amedeo Sapio of San Jose CA (US)

IN-NETWORK COLLECTIVE OPERATIONS - A simplified explanation of the abstract

This abstract first appeared for US patent application 18222946 titled 'IN-NETWORK COLLECTIVE OPERATIONS'.

Simplified Explanation

This patent application describes a switch that handles the packet communications of a collective operation used to train machine learning (ML) models.

  • A reliable transport protocol is used for communications from at least one worker node of the collective operation to the switch.
  • As part of that reliable transport, the switch stores packet receipt state on a per-packet basis for the communications it receives from the worker nodes.
  • The switch uses a non-reliable transport protocol for its communications to the device that aggregates the results.
  • The reliable transport protocol used by the switch is different from the non-reliable transport protocol.
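
The two-leg arrangement in the bullets above can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `Packet`, `SwitchSketch`, `on_worker_packet`, and `forward_unreliable` are hypothetical, as are the sequence-number fields and the idea that receipt state is used to acknowledge packets and to suppress duplicate forwarding. The abstract only states that per-packet receipt state is stored on the worker-to-switch leg and that a different, non-reliable protocol is used toward the aggregating device.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    worker_id: int  # hypothetical: which worker node sent the packet
    seq: int        # hypothetical: per-worker sequence number
    payload: list   # e.g. a slice of gradient values


@dataclass
class SwitchSketch:
    # Per-packet receipt state, keyed by (worker, sequence number).
    received: set = field(default_factory=set)
    # Stand-in for the non-reliable link to the aggregating device.
    forwarded: list = field(default_factory=list)

    def on_worker_packet(self, pkt: Packet) -> tuple:
        """Handle a packet on the reliable worker-to-switch leg and return an ACK."""
        key = (pkt.worker_id, pkt.seq)
        if key not in self.received:
            # First time this packet is seen: record it, then pass it on.
            self.received.add(key)
            self.forward_unreliable(pkt)
        # Assumption: every arrival is acknowledged, so a retransmitted
        # duplicate is ACKed again but not forwarded a second time.
        return ("ACK", pkt.worker_id, pkt.seq)

    def forward_unreliable(self, pkt: Packet) -> None:
        """Fire-and-forget send: no ACK is awaited and no retry state is kept."""
        self.forwarded.append(pkt)
```

In this sketch, calling `on_worker_packet` twice with the same `(worker_id, seq)` returns an acknowledgement both times but appends to `forwarded` only once, which is one plausible use of the stored receipt state.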


Original Abstract Submitted

Examples described herein relate to a switch comprising circuitry configured to for packet communications associated with a collective operation to train machine learning (ML) models: utilize a reliable transport protocol for communications from at least one worker node of the collective operation to a switch, wherein the utilize a reliable transport protocol for communications from at least one worker node of the collective operation to the switch comprises store packet receipt state for per-packet communications from the at least one worker node of the collective operation to the switch and utilize a non-reliable transport protocol by the switch to a device that is to perform aggregation of results, wherein the reliable transport protocol comprises a different protocol than that of the non-reliable transport protocol.