US Patent Application 18447891. DISTRIBUTED FAULT-TOLERANCE VIA DISAGGREGATED MEMORY BOARDS simplified abstract

DISTRIBUTED FAULT-TOLERANCE VIA DISAGGREGATED MEMORY BOARDS

Organization Name

Huawei Technologies Co., Ltd.

Inventor(s)

Norbert Egi of Santa Clara, CA (US)

Meng Wang of Chicago, IL (US)

DISTRIBUTED FAULT-TOLERANCE VIA DISAGGREGATED MEMORY BOARDS - A simplified explanation of the abstract

This abstract first appeared for US patent application 18447891 titled 'DISTRIBUTED FAULT-TOLERANCE VIA DISAGGREGATED MEMORY BOARDS'.

Simplified Explanation

This patent application describes a method performed by a computing system with multiple compute nodes and a memory node that is separate from them. The compute nodes execute a task, and the memory node recurrently receives snapshots from them. Each snapshot includes an instance of a task database.

  • When all received snapshots match, the method sets a current checkpoint by storing the corresponding task database instance.
  • When unmatching snapshots are detected, the method rolls the task database back to a previous checkpoint.
  • As part of the rollback, the memory node distributes a correct checkpoint task database instance to at least one compute node.
  • The method is intended to keep the task database consistent across the compute nodes of the computing system.
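
The checkpoint and rollback logic described above can be illustrated with a minimal Python sketch. It is not the implementation from the patent application: the `MemoryNode` class, the `process_round` method, and the `digest` helper are hypothetical names. The sketch also assumes that a task database is a JSON-serializable dictionary, that snapshots are compared by hashing, and that the checkpoint is sent to every compute node on rollback (the abstract only requires at least one).

```python
import copy
import hashlib
import json


def digest(task_db):
    """Hash a task database instance so snapshots can be compared cheaply."""
    canonical = json.dumps(task_db, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


class MemoryNode:
    """Hypothetical memory node that keeps the last agreed checkpoint."""

    def __init__(self, initial_db):
        self.checkpoint = copy.deepcopy(initial_db)

    def process_round(self, snapshots):
        """Handle one round of snapshots, keyed by compute node id.

        Returns ("checkpoint", {}) when all snapshots match, or
        ("rollback", {node_id: task_db}) with the copies to distribute.
        """
        if not snapshots:
            raise ValueError("at least one snapshot is required")
        digests = {digest(db) for db in snapshots.values()}
        if len(digests) == 1:
            # All compute nodes agree: store this instance as the current checkpoint.
            self.checkpoint = copy.deepcopy(next(iter(snapshots.values())))
            return "checkpoint", {}
        # Mismatch: keep the previous checkpoint and send it back out.
        # Sending to every node is an assumption of this sketch.
        return "rollback", {node: copy.deepcopy(self.checkpoint) for node in snapshots}


node = MemoryNode({"step": 0})
print(node.process_round({"a": {"step": 1}, "b": {"step": 1}}))
print(node.process_round({"a": {"step": 2}, "b": {"step": 99}}))
```

In this sketch the first round stores `{"step": 1}` as the checkpoint because both snapshots match. The second round detects a mismatch and returns that stored checkpoint for redistribution. The sketch does not try to decide which node was faulty.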


Original Abstract Submitted

A method performed by a computing system that includes multiple compute nodes and a memory node separate from the multiple compute nodes. The method comprises executing a task using the multiple compute nodes; recurrently receiving snapshots at the memory node from the multiple compute nodes, each snapshot including an instance of a task database; setting a current checkpoint by storing a task database instance corresponding to the current checkpoint when all received snapshots match; and rolling back the task database to a previous checkpoint when detecting unmatching snapshots received from the multiple compute nodes, including the memory node distributing a correct checkpoint task database instance to at least one compute node of the multiple compute nodes.