Elsevier

Computer Networks

Volume 55, Issue 9, 23 June 2011, Pages 2196-2208

Understanding data center network architectures in virtualized environments: A view from multi-tier applications

https://doi.org/10.1016/j.comnet.2011.03.001

Abstract

In recent years, data center network (DCN) architectures (e.g., DCell [11], FiConn [14], BCube [10], FatTree [1], and VL2 [8]) have received a surge of interest from both industry and academia. However, evaluation of these newly proposed DCN architectures has been limited to MapReduce or scientific-computing traffic patterns, and none provides an in-depth understanding of their performance in conventional transaction systems under realistic workloads. Moreover, it is also unclear how these architectures behave in virtualized environments. In this paper, we fill this void by conducting an experimental evaluation of FiConn and FatTree, representatives of the hierarchical and flat architectures, respectively, in a clustered three-tier transaction system using a virtualized deployment. We evaluate the two architectures from the perspective of application performance and explicitly consider the impact of server virtualization. Our experiments cover two major testing scenarios, service fragmentation and failure resilience, from which we observe several fundamental characteristics embedded in both classes of network topologies and cast new light on the implications of virtualization for DCN architectures. The issues observed in this paper are generic and should be properly considered in any DCN design before actual deployment, especially for mission-critical real-time transaction systems.

Introduction

Driven by the recent proliferation of Cloud services and the trend of consolidating enterprise IT systems, data centers are experiencing rapid growth in both scale and complexity. At the same time, it has become widely recognized that traditional tree-like data center network (DCN) structures face a variety of challenges, such as limited server-to-server connectivity, vulnerability to single points of failure, lack of agility, insufficient scalability, and resource fragmentation [8]. To address these problems, in the past two years several network architectures [1], [8], [10], [11], [14], [16] have been proposed for large-scale data centers and have gained a significant amount of attention from both industry practitioners and the research community.

In general, existing DCN architectures can be classified into two categories, hierarchical and flat. The former is represented by the conventional tree topology (followed by recent proposals such as DCell [11], FiConn [14], and BCube [10]) and employs a layered structure that is constructed recursively from lower-level components. The latter (e.g., FatTree [1] and VL2 [8]), by contrast, organizes all servers at the same level and interconnects L2 switches using certain topologies (e.g., Clos networks [5]). All of these newly proposed DCN architectures are shown to significantly improve scalability, failure resilience, and aggregate network bandwidth over the conventional tree structure. However, existing work assumes general-purpose underlying systems and does not explicitly consider the performance of complex applications or the interactions between different application components. In practice, data centers may exhibit very different traffic patterns depending on the applications running on top of them, e.g., searching/indexing, media streaming, cloud computing, or transaction systems. Moreover, there exists no experimental comparison of these DCN architectures that would provide a better understanding of their characteristics in a common physical setting. Finally, an in-depth study of the impact of server virtualization on DCN architectures is also missing from the current picture.
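To make the scale of the flat class concrete, the standard k-ary FatTree construction (k pods of k/2 edge and k/2 aggregation switches each, plus (k/2)² core switches) can be sized with a few lines of arithmetic. The sketch below follows the published FatTree construction; the function name is ours, not code from this paper:

```python
def fat_tree_size(k):
    """Sizing of a k-ary FatTree built from identical k-port switches.

    Each of the k pods contains k/2 edge and k/2 aggregation switches;
    (k/2)**2 core switches interconnect the pods; every edge switch
    serves k/2 hosts, giving k**3/4 hosts in total.
    """
    assert k % 2 == 0 and k > 0, "k must be a positive even integer"
    half = k // 2
    return {
        "pods": k,
        "edge_switches": k * half,
        "aggregation_switches": k * half,
        "core_switches": half * half,
        "hosts": k * half * half,  # = k**3 / 4
    }

# A k = 48 fabric of commodity 48-port switches supports 27,648 hosts.
print(fat_tree_size(48)["hosts"])  # prints 27648
```

The same arithmetic explains why FatTree scales with commodity hardware: host capacity grows cubically in the switch port count, with no high-end core routers required.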

In this paper, we seek to fill this void by conducting an experimental evaluation of two DCN architectures, FiConn and FatTree (representatives of the hierarchical and flat architectures, respectively; see Section 2), in a conventional mission-critical multi-tier transaction system. We first implement FiConn and FatTree in a fully virtualized testbed and then validate the correctness of our implementation by comparing the experimental results to those obtained from a non-virtualized testbed. We then deploy RUBiS [21], a clustered three-tier eBay-like on-line auction system, on both FiConn and FatTree and examine application performance in different scenarios.

We focus on two test cases, service fragmentation and failure resilience. In the study of service fragmentation, we mix all servers together instead of grouping servers (i.e., VMs in our case) belonging to the same service tier onto the same physical machine (PM). The reason is twofold. On one hand, this allows sufficient network traffic to be injected into the network, thereby enabling us to study the networking properties of the underlying DCN architectures. On the other hand, it also lets us examine how application performance is affected by DCN architectures when service tiers are fragmented across network locations. In particular, our results suggest that FiConn is subject to performance degradation under service fragmentation (especially when CPU- and network-intensive components are collocated on the same PM) due to its server-centric design, while FatTree demonstrates higher resilience to this situation as a result of its flat server layout. The second testing scenario is failure resilience, where we examine the performance of FiConn and FatTree after certain servers go down. Since both FiConn and FatTree provide multiple routing paths for each pair of nodes, traffic is rerouted through other paths upon server failures. Our results show that application performance is more prone to degradation in FiConn than in FatTree due to FiConn's interference between routing and computing. However, FatTree's network-centric design does not come without cost. For instance, FatTree exhibits inefficient resource utilization in certain cases as a result of its routing behavior.
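Because FiConn is server-centric, a failed server removes both a compute node and a forwarding hop, and surviving traffic must be rerouted over one of the remaining paths. The generic rerouting idea, independent of either architecture, can be sketched as a graph search; the toy four-node topology below is purely illustrative, not our testbed:

```python
from collections import deque

def shortest_path(adj, src, dst, down=frozenset()):
    """BFS shortest path over adjacency dict `adj`, skipping failed
    nodes in `down`; returns the node list, or None if disconnected."""
    if src in down or dst in down:
        return None
    parent = {src: None}
    q = deque([src])
    while q:
        u = q.popleft()
        if u == dst:  # reconstruct the path by walking parents back
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in adj[u]:
            if v not in down and v not in parent:
                parent[v] = u
                q.append(v)
    return None

# A ring of four servers: two node-disjoint routes from A to C.
adj = {"A": ["B", "D"], "B": ["A", "C"],
       "C": ["B", "D"], "D": ["A", "C"]}
print(shortest_path(adj, "A", "C"))              # prints ['A', 'B', 'C']
print(shortest_path(adj, "A", "C", down={"B"}))  # prints ['A', 'D', 'C']
```

The second call illustrates the failure-resilience scenario: when relay server B goes down, traffic falls back to the equally long detour through D, but in a server-centric design that detour now consumes CPU cycles on D, which is the routing/computing interference observed for FiConn.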

The main contributions of this paper can be summarized in the following three aspects.

  • First, we perform an experimental comparison of newly proposed DCN architectures in a common system setting.

  • Second, this paper employs a fully virtualized implementation and explicitly examines the impact of server virtualization on DCN architectures.

  • Third, all experiments are performed in a cluster-based three-tier system where application performance (e.g., request throughput and response latency) is the major measurement metric.

At the time of this writing, we believe that none of the above three points has been properly studied in the existing literature, and this paper is the first to address all of them simultaneously. Even though our experiments focus on FiConn and FatTree, most of our results can be generalized and applied as guidelines for designing any DCN architecture intended to be deployed in modern virtualized data centers hosting mission-critical real-time transaction systems. However, we understand that this work is by no means a comprehensive experimental study of all existing DCN architectures. Instead, we encourage readers to consider it as an initial step towards a deeper understanding of these architectures in a more realistic environment.

The rest of the paper is structured as follows. In Section 2, we provide a brief review on existing data center architectures. In Section 3, we discuss the motivation for conducting this work and goals we seek to achieve in the paper. In Section 4, we provide details of system setup for our experiments. In Section 5, we present the experimental results and our main findings. Finally, in Section 6, we conclude this paper and point out directions for future work.

Section snippets

Background

In this section, we briefly review the conventional and newly proposed data center network architectures. Based on how the system is constructed, we can classify existing DCN architectures into two categories, hierarchical and flat.

Motivation

The data center market is currently experiencing an exponential growth [6] due to consolidation of enterprise IT systems and proliferation of Cloud services (e.g., Amazon Web Services [3], Google Apps [7], and Microsoft Azure [15]). The core technology that is driving this trend is virtualization due to its appealing cost reduction capability, high agility, excellent flexibility, etc. On the other hand, the cost benefit and operational efficiency brought by data center consolidation in turn

System architecture

We start with an introduction of our system architecture and implementation methodologies.

System validation

Before conducting further evaluation, we first need to ensure that our virtualized implementation does not introduce undesirable artifacts and is able to faithfully reveal characteristics of the original DCN architectures. Towards this end, we implement FiConn and FatTree in a separate non-virtualized testbed. Specifically, our non-virtualized implementation of FiConn exactly follows the settings illustrated in Fig. 2(a) where each server corresponds to an actual physical machine. Similarly,
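A minimal sketch of the kind of cross-check this validation implies: compare the virtualized measurements of each metric against the non-virtualized baseline and flag deviations beyond a tolerance. The 5% threshold and the sample values below are hypothetical, not numbers from our experiments:

```python
def validate_metric(virt_samples, phys_samples, tolerance=0.05):
    """Compare the mean of a metric measured on the virtualized testbed
    against the physical baseline; return (within_tolerance, rel_error).

    `tolerance` is the maximum acceptable relative error (assumed 5%).
    """
    v = sum(virt_samples) / len(virt_samples)
    p = sum(phys_samples) / len(phys_samples)
    rel_error = abs(v - p) / p
    return rel_error <= tolerance, rel_error

# Hypothetical request-throughput samples (requests/s) from both testbeds.
ok, err = validate_metric([980, 1005, 990], [1000, 1010, 995])
print(ok)  # prints True: ~1% deviation is within the assumed tolerance
```

Agreement on such aggregate metrics is necessary but not sufficient; distribution-level comparisons (e.g., response-latency percentiles) tighten the check further.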

Conclusion

In this paper we conducted an experimental evaluation of FiConn and FatTree in a virtualized environment in the context of a multi-tier transaction system. Even though it is still unclear which class of DCN architectures, hierarchical or flat, will prevail, there are certain fundamental issues that both classes should properly address. On one hand, the concept of locality is naturally embedded in hierarchical architectures in that servers belonging to the same-level component (e.g., FiConn0[0]

Yueping Zhang received a B.S. degree in Computer Science from Beijing University of Aeronautics and Astronautics, Beijing, China, in 2001, and a Ph.D. degree in Computer Engineering from Texas A&M University, College Station, USA, in 2008. Since then, he has been a Research Staff Member at NEC Laboratories America in Princeton, New Jersey. His research interests include Internet congestion control, delayed stability analysis, network management, and data center networks.

References (25)

  • M. Al-Fares, A. Loukissas, A. Vahdat, A scalable commodity data center network architecture, in: Proceedings of the ACM...
  • Apache....
  • Amazon Web Services (AWS)....
  • J. Brodkin, Half of New Servers are Virtualized, Survey Finds....
  • W.J. Dally et al.

    Principles and Practices of Interconnection Networks

    (2004)
  • Forrester Consulting, How Server and Network Virtualization Make Data Centers More Dynamic....
  • Google Apps....
  • A. Greenberg, J.R. Hamilton, N. Jain, S. Kandula, C. Kim, P. Lahiri, D.A. Maltz, P. Patel, S. Sengupta, VL2: a scalable...
  • A. Greenberg, P. Lahiri, D.A. Maltz, P. Patel, S. Sengupta, Towards a next generation data center architecture:...
  • C. Guo, G. Lu, D. Li, H. Wu, X. Zhang, Y. Shi, C. Tian, Y. Zhang, S. Lu, BCube: a high performance, server-centric...
  • C. Guo, H. Wu, K. Tan, L. Shi, Y. Zhang, S. Lu, DCell: a scalable and fault-tolerant network structure for data...
  • HAProxy: The Reliable, High Performance TCP/HTTP Load Balancer....


Ao-Jan Su is a Ph.D. candidate in the Department of Electrical Engineering and Computer Science at Northwestern University. Before joining Northwestern University in 2005, he was a software architect at Yahoo! and a core developer of the NCTUns Network Simulator at Academia Sinica in Taiwan. His research interests include Internet measurements, content distribution networks, data center networks, and denial-of-service resiliency.

Guofei Jiang is a Department Head with the Robust and Secure Systems Department at NEC Laboratories America (NECLA) in Princeton, New Jersey. He leads a dozen researchers focusing on fundamental and applied research in the areas of distributed systems and networks, autonomic system management, system and software reliability, machine learning and data mining, and system and information theory. Dr. Jiang was an associate editor of IEEE Security and Privacy magazine and has served on the program committees of many prestigious conferences. He has published nearly 90 technical papers and holds over 20 patents granted or pending. His inventions have been successfully commercialized as NEC products and have significantly contributed to NEC business.

    A shorter version [25] of this paper appeared in IEEE IWQoS 2010.
