Skip Ring Topology in FAST Failure Detection Service

  • Conference paper
Parallel Processing and Applied Mathematics (PPAM 2007)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4967)

Abstract

This paper addresses the problem of communication among loosely coupled groups of nodes in distributed systems. We propose a novel logical communication topology based on the skip list data structure and enhance it to make it more resilient to failures. Extensive simulation experiments show its good self-stabilization characteristics. We present this concept in the context of our failure detection service, where it is used at the local communication level.
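As a rough illustration only, the idea of laying skip-list style shortcut links over a ring of nodes can be sketched as below. The abstract does not give the paper's actual construction, so everything here is an assumption: the function name `build_skip_ring`, the promotion probability `p`, the cap `max_level`, and the rule that a node at level i links to the next node clockwise that also reaches level i are all hypothetical choices borrowed from the classic skip list.

```python
import random


def build_skip_ring(node_ids, p=0.5, max_level=4, seed=0):
    """Return {node: [successor at level 0, successor at level 1, ...]}.

    Hypothetical sketch: nodes sit on a ring in sorted order, and each
    node draws a level the way a skip list does (promote with
    probability p, capped at max_level).
    """
    rng = random.Random(seed)
    ring = sorted(node_ids)

    # Skip-list style level draw for every node.
    levels = {}
    for node in ring:
        level = 0
        while level < max_level and rng.random() < p:
            level += 1
        levels[node] = level

    links = {node: [] for node in ring}
    size = len(ring)
    for i, node in enumerate(ring):
        for level in range(levels[node] + 1):
            # Walk clockwise to the next node that also takes part in
            # this level; after a full lap the walk reaches the node
            # itself, so a link is always found.
            for step in range(1, size + 1):
                candidate = ring[(i + step) % size]
                if levels[candidate] >= level:
                    links[node].append(candidate)
                    break
    return links


if __name__ == "__main__":
    for node, successors in build_skip_ring(range(8)).items():
        print(node, successors)
```

In this sketch the level-0 links form the plain ring, while higher levels form progressively sparser rings that can bypass a run of neighbours. That redundancy is one plausible reason such a structure could tolerate node failures, but the enhancement actually used in the paper is not described here.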

This work was supported by France Telecom, under project "Brain" No. 21/06.

Editor information

Roman Wyrzykowski Jack Dongarra Konrad Karczewski Jerzy Wasniewski

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kobusiński, J., Gorski, F., Stempin, S. (2008). Skip Ring Topology in FAST Failure Detection Service. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Wasniewski, J. (eds) Parallel Processing and Applied Mathematics. PPAM 2007. Lecture Notes in Computer Science, vol 4967. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68111-3_4

  • DOI: https://doi.org/10.1007/978-3-540-68111-3_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-68105-2

  • Online ISBN: 978-3-540-68111-3

  • eBook Packages: Computer Science, Computer Science (R0)
