Skip Ring Topology in FAST Failure Detection Service

Kobusiński, Jacek; Gorski, Filip; Stempin, Stanisław

doi:10.1007/978-3-540-68111-3_4

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4967)

Included in the following conference series:

International Conference on Parallel Processing and Applied Mathematics

Abstract

This paper addresses the problem of communication among loosely coupled groups of nodes in distributed systems. We propose a novel logical communication topology based on the skip list data structure, and we enhance this structure to make it more resilient to failures. Extensive simulation experiments demonstrate its good self-stabilization characteristics. We present the concept in the context of our failure detection service, where it is used at the local communication level.

This work was supported by France Telecom, under project "Brain" No. 21/06.

Author information

Authors and Affiliations

Institute of Computing Science, Poznań University of Technology, Poland
Jacek Kobusiński, Filip Gorski & Stanisław Stempin

Editor information

Roman Wyrzykowski, Jack Dongarra, Konrad Karczewski, Jerzy Wasniewski

Cite this paper

Kobusiński, J., Gorski, F., Stempin, S. (2008). Skip Ring Topology in FAST Failure Detection Service. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Wasniewski, J. (eds) Parallel Processing and Applied Mathematics. PPAM 2007. Lecture Notes in Computer Science, vol 4967. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68111-3_4

DOI: https://doi.org/10.1007/978-3-540-68111-3_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68105-2
Online ISBN: 978-3-540-68111-3
eBook Packages: Computer Science, Computer Science (R0)
