Evaluations of domino-free communication-induced checkpointing protocols
Cited by (23)
An efficient validation approach for quasi-synchronous checkpointing oriented to distributed diagnosability
2016, Journal of Systems and Software
Citation Excerpt: Finding a method to construct a consistent snapshot in a ZCF system has been an open problem. The impossibility of designing an optimal ZCF quasi-synchronous checkpointing algorithm has been treated by Tsai et al. (1998). Recently, some algorithms which are ZCF have been developed, for example, the Fully Informed (FI) algorithm of Helary et al. (2000), the Fully Informed aNd Efficient (FINE) algorithm of Luo and Manivannan (2009), the Delayed Communication-Induced Checkpointing (DCFI) algorithm (Simon et al., 2013a) and the Scalable Fully Informed (SF-I) algorithm (Simon et al., 2013b) of Calixto et al.
Theoretical and experimental evaluation of communication-induced checkpointing protocols in F_E and F_Lazy-E families
2011, Performance Evaluation
Citation Excerpt: A consistent global checkpoint (also called a recovery line) is where the whole system can rollback to, in case of a failure. “the HMNR1 protocol must force at least one checkpoint between any two consecutive forced checkpoints taken by protocol CP in a process” [25]. HMNR2 protocol must force at least one checkpoint between any two consecutive forced checkpoints taken by the FINE protocol in a process.
FINE: A Fully Informed aNd Efficient communication-induced checkpointing protocol for distributed systems
2009, Journal of Parallel and Distributed Computing
Quantifying rollback propagation in distributed checkpointing
2004, Journal of Parallel and Distributed Computing
Interval consistency of asynchronous distributed computations
2002, Journal of Computer and System Sciences
Impossibility of scalar clock-based communication-induced checkpointing protocols ensuring the RDT property
2001, Information Processing Letters
1. Tsai and Kuo's work is supported by the National Science Council, Taiwan, ROC, under Grant NSC 87-2213-E-259-007.