DOI: 10.1145/2785956.2787492 (ACM conference proceedings)

R2C2: A Network Stack for Rack-scale Computers

Published: 17 August 2015

ABSTRACT

Rack-scale computers, comprising a large number of micro-servers connected by a direct-connect topology, are expected to replace servers as the building block in data centers. We focus on the problem of routing and congestion control across the rack's network, and find that high path diversity in rack topologies, in combination with workload diversity across it, means that traditional solutions are inadequate. We introduce R2C2, a network stack for rack-scale computers that provides flexible and efficient routing and congestion control. R2C2 leverages the fact that the scale of rack topologies allows for low-overhead broadcasting to ensure that all nodes in the rack are aware of all network flows. We thus achieve rate-based congestion control without any probing; each node independently determines the sending rate for its flows while respecting the provider's allocation policies. For routing, nodes dynamically choose the routing protocol for each flow in order to maximize overall utility. Through a prototype deployed across a rack emulation platform and a packet-level simulator, we show that R2C2 achieves very low queuing and high throughput for diverse and bursty workloads, and that routing flexibility can provide significant throughput gains.
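
The rate-computation idea in the abstract (every node sees all flows via broadcast and computes sending rates locally, without probing) can be illustrated with a minimal sketch. The abstract does not say which allocation policy or algorithm R2C2 uses, so this example assumes plain max-min fairness computed by progressive filling, with one fixed, non-empty path per flow. The function name `max_min_rates` and the data layout are hypothetical and not taken from the paper.

```python
from typing import Dict, Hashable, List, Tuple

Link = Tuple[str, str]  # directed link between two nodes (hypothetical layout)


def max_min_rates(
    flows: Dict[Hashable, List[Link]],
    capacity: Dict[Link, float],
) -> Dict[Hashable, float]:
    """Max-min fair rates via progressive filling.

    Assumption (not from the paper): each flow uses one fixed, non-empty
    path, and the policy is unweighted max-min fairness. Because the
    computation is deterministic, every node that holds the same global
    flow view derives the same rates without exchanging further messages.
    """
    remaining = dict(capacity)                       # spare capacity per link
    active = {f: set(path) for f, path in flows.items()}
    rates: Dict[Hashable, float] = {}

    while active:
        # Count how many unfrozen flows cross each link.
        load: Dict[Link, int] = {}
        for links in active.values():
            for link in links:
                load[link] = load.get(link, 0) + 1

        # The bottleneck is the link offering the smallest equal share.
        # The link itself breaks ties so that all nodes pick the same one.
        bottleneck = min(load, key=lambda l: (remaining[l] / load[l], l))
        share = remaining[bottleneck] / load[bottleneck]

        # Freeze every flow crossing the bottleneck at that share and
        # subtract its rate from all links on its path.
        for f in [f for f, links in active.items() if bottleneck in links]:
            rates[f] = share
            for link in active[f]:
                remaining[link] -= share
            del active[f]

    return rates


# Hypothetical three-flow example on a two-link chain n0 -> n1 -> n2.
example_flows = {
    "A": [("n0", "n1"), ("n1", "n2")],
    "B": [("n1", "n2")],
    "C": [("n0", "n1")],
}
example_capacity = {("n0", "n1"): 10.0, ("n1", "n2"): 4.0}
example_rates = max_min_rates(example_flows, example_capacity)
```

In this sketch a node would apply only the entries of `example_rates` that belong to flows it originates. A provider-specific policy (for example weighted shares) would replace the equal-share step, and routing choice is left out entirely.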
