US Patent Application 18447891. DISTRIBUTED FAULT-TOLERANCE VIA DISAGGREGATED MEMORY BOARDS simplified abstract
DISTRIBUTED FAULT-TOLERANCE VIA DISAGGREGATED MEMORY BOARDS
Organization Name
Huawei Technologies Co., Ltd.
Inventor(s)
- Norbert Egi of Santa Clara CA (US)
- Meng Wang of Chicago IL (US)
DISTRIBUTED FAULT-TOLERANCE VIA DISAGGREGATED MEMORY BOARDS - A simplified explanation of the abstract
This abstract first appeared for US patent application 18447891, titled 'DISTRIBUTED FAULT-TOLERANCE VIA DISAGGREGATED MEMORY BOARDS'.
Simplified Explanation
This patent application describes a method performed by a computing system that has multiple compute nodes and a separate memory node. The compute nodes execute a task, and the memory node repeatedly receives snapshots from them. Each snapshot contains an instance of a task database.
- When all received snapshots match, the method sets a current checkpoint by storing the corresponding task database instance.
- When mismatched snapshots are detected, the method rolls back the task database to a previous checkpoint.
- As part of the rollback, the memory node distributes a correct checkpoint task database instance to at least one compute node.
- Comparing snapshots this way keeps the task database consistent across the compute nodes: a checkpoint is recorded only when all nodes agree, and any disagreement rolls the task database back to a previously agreed checkpoint.
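The checkpoint-and-rollback cycle above can be sketched in Python. This is a minimal illustration under stated assumptions, not the applicant's implementation: `MemoryNode`, `receive_round`, and the dict-based `TaskDB` are hypothetical names. The sketch also assumes that snapshots arrive in discrete rounds, that "match" means exact equality, and that the initial task database serves as the first checkpoint. On rollback, it sends the most recent stored checkpoint to every node whose state differs from it.

```python
from typing import Dict, List

# Hypothetical model: a task database instance is a dict of key -> value,
# and each round of snapshots is {compute_node_id: task_database_instance}.
TaskDB = Dict[str, int]


class MemoryNode:
    """Minimal sketch of the memory node's checkpoint/rollback decision.

    Assumptions (not from the patent): snapshots arrive in rounds, matching
    means exact equality, and the initial database is the first checkpoint.
    """

    def __init__(self, initial_db: TaskDB) -> None:
        # Stored checkpoints, oldest first; the last entry is the current one.
        self.checkpoints: List[TaskDB] = [dict(initial_db)]

    def receive_round(self, snapshots: Dict[str, TaskDB]) -> Dict[str, TaskDB]:
        """Handle one round of snapshots from the compute nodes.

        Returns {node_id: checkpoint_instance} for every compute node that
        must be rolled back; an empty dict means a new checkpoint was set.
        """
        if not snapshots:
            raise ValueError("expected at least one snapshot")
        instances = list(snapshots.values())
        if all(inst == instances[0] for inst in instances):
            # All snapshots match: store a copy as the current checkpoint.
            self.checkpoints.append(dict(instances[0]))
            return {}
        # Mismatch detected: roll back to the most recent stored checkpoint
        # and send it to each node whose state differs from it. At least one
        # node always qualifies, because snapshots that all equalled the
        # checkpoint would have matched each other.
        correct = self.checkpoints[-1]
        return {
            node: dict(correct)
            for node, inst in snapshots.items()
            if inst != correct
        }


if __name__ == "__main__":
    mem = MemoryNode({"step": 0})
    # Both nodes agree, so a new checkpoint {"step": 1} is stored.
    print(mem.receive_round({"a": {"step": 1}, "b": {"step": 1}}))  # {}
    # The nodes disagree, so both are sent the last checkpoint.
    print(mem.receive_round({"a": {"step": 2}, "b": {"step": 3}}))
    # {'a': {'step': 1}, 'b': {'step': 1}}
```

Sending the checkpoint only to nodes whose snapshot differs from it is one way to read "at least one compute node". The abstract does not say how snapshots are compared or which nodes receive the checkpoint, so both choices here are assumptions.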
Original Abstract Submitted
A method performed by a computing system that includes multiple compute nodes and a memory node separate from the multiple compute nodes. The method comprises executing a task using the multiple compute nodes; recurrently receiving snapshots at the memory node from the multiple compute nodes, each snapshot including an instance of a task database; setting a current checkpoint by storing a task database instance corresponding to the current checkpoint when all received snapshots match; and rolling back the task database to a previous checkpoint when detecting unmatching snapshots received from the multiple compute nodes, including the memory node distributing a correct checkpoint task database instance to at least one compute node of the multiple compute nodes.