research-article

Local Fast Failover Routing With Low Stretch

Published: 27 April 2018

Abstract

Network failures are frequent and disruptive, and can significantly reduce throughput even in highly connected and regular networks such as datacenters. Many modern networks support some form of local fast failover, which quickly reroutes flows that encounter link failures onto new paths. Employing such mechanisms is known to be non-trivial, however, as conditional failover rules can only depend on local failure information.
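To make the locality constraint concrete, here is a minimal sketch of a static failover table. It is an illustration only, not the algorithm from the paper. It assumes a hypothetical four-node square network (links a-b, a-c, b-d, c-d), a single destination `d`, and rules that depend only on the state of a node's own incident links. The names `PRIORITIES`, `next_hop`, and `forward` are invented for this example.

```python
from typing import Dict, FrozenSet, List, Optional, Set

Link = FrozenSet[str]


def link(u: str, v: str) -> Link:
    """Undirected link between two nodes."""
    return frozenset((u, v))


# Hypothetical pre-installed rules for destination "d": each node lists its
# neighbours in order of preference. The table is fixed before any failure.
PRIORITIES: Dict[str, List[str]] = {
    "a": ["b", "c"],
    "b": ["d", "a"],
    "c": ["d", "a"],
}


def next_hop(node: str, failed: Set[Link]) -> Optional[str]:
    """Pick the first preferred neighbour whose incident link is alive.

    Only links incident to `node` are inspected: this is the local view.
    """
    for nbr in PRIORITIES[node]:
        if link(node, nbr) not in failed:
            return nbr
    return None


def forward(src: str, dst: str, failed: Set[Link], max_hops: int = 16) -> Optional[List[str]]:
    """Follow the local rules hop by hop; return None on a dead end or a loop."""
    path = [src]
    node = src
    while node != dst and len(path) <= max_hops:
        nxt = next_hop(node, failed)
        if nxt is None:
            return None
        path.append(nxt)
        node = nxt
    return path if node == dst else None


print(forward("a", "d", set()))             # no failure: default route via b
print(forward("a", "d", {link("a", "b")}))  # a sees its own failed link, uses c
print(forward("a", "d", {link("b", "d")}))  # a cannot see b-d: packet loops
```

In the third call, node `a` has no knowledge that link b-d is down and keeps forwarding to `b`, which bounces the packet back. This kind of loop is one reason why designing purely local rules is hard, even though a working path via `c` exists.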

Over the last few years, important insights have been gained into how to design failover schemes that provide high resiliency. However, existing approaches have the shortcoming that the resulting failover routes may be unnecessarily long, i.e., they have a large stretch compared to the original route length. This is a serious drawback, as long routes entail higher latencies and introduce additional load, which may cause the rerouted flows to interfere with existing flows and harm throughput.
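The notion of stretch can be illustrated on a small grid. The sketch below is an assumption-laden example rather than the paper's method. It uses a global breadth-first search, which a local failover scheme cannot run, to compute the shortest surviving route. That length is a lower bound on the length of any failover route under the same failures. The abstract does not say whether stretch is measured as a difference or a ratio, so both are printed. The helper names `grid_links` and `hop_distance` are hypothetical.

```python
from collections import deque
from typing import FrozenSet, Optional, Set, Tuple

Node = Tuple[int, int]
Link = FrozenSet[Node]


def grid_links(width: int, height: int) -> Set[Link]:
    """All links of a width x height two-dimensional grid."""
    links: Set[Link] = set()
    for x in range(width):
        for y in range(height):
            if x + 1 < width:
                links.add(frozenset(((x, y), (x + 1, y))))
            if y + 1 < height:
                links.add(frozenset(((x, y), (x, y + 1))))
    return links


def hop_distance(src: Node, dst: Node, links: Set[Link]) -> Optional[int]:
    """Breadth-first search over the given links; None if dst is unreachable."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return dist[node]
        x, y = node
        for nbr in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if frozenset((node, nbr)) in links and nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return None


all_links = grid_links(3, 3)
src, dst = (0, 0), (2, 0)

# Fail the first link of the original two-hop route along the bottom row.
failed = {frozenset(((0, 0), (1, 0)))}

original = hop_distance(src, dst, all_links)
rerouted = hop_distance(src, dst, all_links - failed)

print("original route length:", original)
print("shortest surviving route length:", rerouted)
print("additive stretch:", rerouted - original)
print("multiplicative stretch:", rerouted / original)
```

Here a single failed link forces a detour through the middle row, doubling the route from two hops to four.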

This paper presents the first deterministic local fast failover algorithms with provable guarantees on both resiliency and failover route length, even in the presence of many concurrent failures. We present stretch-optimal failover algorithms for different network topologies, including multi-dimensional grids, hypercubes and Clos networks, which are frequently deployed in HPC clusters and datacenters. We show that the computed failover routes are optimal in the sense that no failover algorithm can provide shorter paths for a given number of link failures.



Published in

ACM SIGCOMM Computer Communication Review, Volume 48, Issue 1
January 2018
80 pages
ISSN: 0146-4833
DOI: 10.1145/3211852

Copyright © 2018 Authors

Publisher

Association for Computing Machinery
New York, NY, United States
