
Local Fast Failover Routing With Low Stretch

Authors:
Klaus-Tycho Foerster (Aalborg University, Denmark)
Yvonne-Anne Pignolet (ABB Corporate Research, Switzerland)
Stefan Schmid (University of Vienna, Austria)
Gilles Tredan (CNRS-LAAS, France)

ACM SIGCOMM Computer Communication Review, Volume 48, Issue 1, January 2018, pp. 35–41. https://doi.org/10.1145/3211852.3211858

Published: 27 April 2018

Abstract

Network failures are frequent and disruptive, and can significantly reduce throughput even in highly connected and regular networks such as datacenters. Many modern networks support some form of local fast failover, which quickly reroutes flows that encounter link failures onto new paths. Employing such mechanisms is known to be non-trivial, however, because conditional failover rules can depend only on local failure information.

Although recent years have brought important insights into how to design failover schemes that provide high resiliency, existing approaches share a shortcoming: the resulting failover routes may be unnecessarily long, i.e., they have a large stretch compared to the original route length. This is a serious drawback, as long routes entail higher latencies and introduce additional load, which may cause the rerouted flows to interfere with existing flows and harm throughput.

This paper presents the first deterministic local fast failover algorithms that provide provable guarantees on both resiliency and failover route length, even in the presence of many concurrent failures. We present stretch-optimal failover algorithms for different network topologies, including multi-dimensional grids, hypercubes, and Clos networks, which are frequently deployed in HPC clusters and datacenters. We show that the computed failover routes are optimal in the sense that no failover algorithm can provide shorter paths for a given number of link failures.

Index Terms

Local Fast Failover Routing With Low Stretch
1. Networks
  1. Network protocols
    1. Network layer protocols
      1. Routing protocols

Published in

ACM SIGCOMM Computer Communication Review Volume 48, Issue 1
January 2018
80 pages
ISSN:0146-4833
DOI:10.1145/3211852

Copyright © 2018 Authors
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 27 April 2018
Author Tags
Fast Reroute
Network Algorithms
Static Resiliency
Qualifiers
- research-article