R2C2: A Network Stack for Rack-scale Computers

Authors:
Paolo Costa, Microsoft Research, Cambridge, United Kingdom
Hitesh Ballani, Microsoft Research, Cambridge, United Kingdom
Kaveh Razavi, VU University Amsterdam, Amsterdam, Netherlands
Ian Kash, Microsoft Research, Cambridge, United Kingdom

SIGCOMM '15: Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication, August 2015, Pages 551–564
https://doi.org/10.1145/2785956.2787492

Published: 17 August 2015

ABSTRACT

Rack-scale computers, comprising a large number of micro-servers connected by a direct-connect topology, are expected to replace servers as the building block in data centers. We focus on the problem of routing and congestion control across the rack's network, and find that high path diversity in rack topologies, in combination with workload diversity across it, means that traditional solutions are inadequate. We introduce R2C2, a network stack for rack-scale computers that provides flexible and efficient routing and congestion control. R2C2 leverages the fact that the scale of rack topologies allows for low-overhead broadcasting to ensure that all nodes in the rack are aware of all network flows. We thus achieve rate-based congestion control without any probing; each node independently determines the sending rate for its flows while respecting the provider's allocation policies. For routing, nodes dynamically choose the routing protocol for each flow in order to maximize overall utility. Through a prototype deployed across a rack emulation platform and a packet-level simulator, we show that R2C2 achieves very low queuing and high throughput for diverse and bursty workloads, and that routing flexibility can provide significant throughput gains.
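
The abstract's central idea, that every node knows every flow and can therefore compute sending rates locally without probing, can be illustrated with a small sketch. The abstract does not say which allocation policy or algorithm R2C2 uses; the version below assumes max-min fairness computed by progressive filling, and the function name `max_min_rates`, the flow and link labels, and the capacities are all hypothetical.

```python
from typing import Dict, List


def max_min_rates(flows: Dict[str, List[str]],
                  capacity: Dict[str, float]) -> Dict[str, float]:
    """Progressive filling (an assumed policy, not taken from the paper).

    flows maps a flow id to the list of links on its path;
    capacity maps a link id to its capacity.
    All active flows are raised equally until some link saturates;
    flows crossing a saturated link are then frozen at their rate.
    """
    rates = {f: 0.0 for f in flows}
    remaining = dict(capacity)
    active = set(flows)
    while active:
        # Count how many active flows cross each link.
        load: Dict[str, int] = {}
        for f in active:
            for link in flows[f]:
                load[link] = load.get(link, 0) + 1
        if not load:  # only flows with empty paths are left
            break
        # The next bottleneck is the link with the smallest equal share.
        share = min(remaining[link] / n for link, n in load.items())
        for f in active:
            rates[f] += share
            for link in flows[f]:
                remaining[link] -= share
        # Freeze every flow that now crosses a saturated link.
        active = {f for f in active
                  if all(remaining[link] > 1e-9 for link in flows[f])}
    return rates


# Hypothetical flow table that every node would hold after the broadcasts.
flow_table = {"A": ["l1", "l2"], "B": ["l1"], "C": ["l2"]}
link_capacity = {"l1": 10.0, "l2": 4.0}
rates = max_min_rates(flow_table, link_capacity)
# rates == {"A": 2.0, "B": 8.0, "C": 2.0}
```

Because the function is deterministic, any two nodes that hold the same flow table and capacities arrive at the same rates without exchanging further messages. In this toy input, flow B picks up the capacity on `l1` that flow A cannot use because A is limited by `l2`.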

Index Terms

1. Networks

Published in
SIGCOMM '15: Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication
August 2015, 684 pages
ISBN: 9781450335423
DOI: 10.1145/2785956
General Chairs: Steve Uhlig (Queen Mary University of London, UK), Olaf Maennel (Tallinn U. of Technology in Estonia, Estonia)
Program Chairs: Brad Karp (University College London, UK), Jitendra Padhye (Microsoft, USA)

ACM SIGCOMM Computer Communication Review, Volume 45, Issue 4 (SIGCOMM'15)
October 2015, 659 pages
ISSN: 0146-4833
DOI: 10.1145/2829988
Editors: Konstantina Papagiannaki (Telefonica Research, Barcelona, Spain), Katerina Argyraki (EPFL, Switzerland), Hitesh Ballani (Microsoft Research Cambridge, UK), Fabián Bustamante (Northwestern University, USA), Joseph Camp (SMU, USA), Augustin Chaintreau (Columbia University, USA), Phillipa Gill (Stony Brook University, USA), Marco Mellia (Politecnico di Torino, Italy), Bhaskaran Raman (IIT Bombay, India), Joel Sommers (Colgate University, USA), Aline Carneiro Viana (INRIA, France)
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
cloud computing
congestion control
data center networks
networks
rack-scale computers
rack-scale network stack
route selection
transport protocols
Qualifiers
- research-article
Acceptance Rates
SIGCOMM '15 Paper Acceptance Rate: 40 of 242 submissions, 17%
Overall Acceptance Rate: 554 of 3,547 submissions, 16%