Elsevier

Computer Networks

Volume 53, Issue 3, 27 February 2009, Pages 310-321

Robust network monitoring in the presence of non-cooperative traffic queries

https://doi.org/10.1016/j.comnet.2008.10.007

Abstract

We present the design of a predictive load shedding scheme for a network monitoring platform that supports multiple and competing traffic queries. The proposed scheme can anticipate overload situations and minimize their impact on the accuracy of the traffic queries. The main novelty of our approach is that it considers queries as black boxes, with arbitrary (and highly variable) input traffic and processing cost. Our system only requires a high-level specification of the accuracy requirements of each query to guide the load shedding procedure and assures a fair allocation of computing resources to queries in a non-cooperative environment. We present an implementation of our load shedding scheme in an existing network monitoring system and evaluate it with a diverse set of traffic queries. Our results show that, with the load shedding mechanism in place, the monitoring system can preserve the accuracy of the queries within predefined error bounds even during extreme overload conditions.

Introduction

The ability to extract detailed information from live traffic streams is critical to network management applications such as traffic engineering, performance analysis and network security. The challenges in this context include the unpredictable nature of network traffic as well as the types of computations, usually unknown in advance, to be performed on the packet streams. At any given point in time, a variable number of applications may need access to the packet stream traversing the same monitoring system.

Recently, several research efforts have proposed network monitoring frameworks that abstract away the low-level details of the network traffic and allow developers to quickly design and implement new methods to process and extract information from packet streams [7], [14]. These systems differ from previous designs in that they are not tailor made for a single specific application, but instead can handle multiple, concurrent monitoring applications.

In an environment where multiple monitoring applications compete for the same shared resources, ensuring fairness of service in the presence of overload is a basic requirement. Load shedding has recently been proposed as an effective alternative to over-provisioning for handling overload situations in several real-time systems [23], [20], [25], [3]. Load shedding is the process of dropping excess load in such a way that the system remains stable and no overflow occurs in the system buffers. Traditionally, load shedding has been implemented by dynamically discarding part of the incoming data in the presence of overload. In this work, we address the problem of how to efficiently and fairly shed excess load from an arbitrary set of monitoring applications while keeping the measurement error within bounds defined by the end users.

Three main requirements make this problem particularly challenging. First, the system operates in real time with live packet streams. Therefore, the load shedding scheme must be lightweight and adapt quickly to sudden overload situations to prevent undesired packet losses. Second, the monitoring applications are unaware of other applications running on the same system and cannot be assumed to behave in a cooperative fashion. Instead, they will always try to obtain the maximum share of the system resources. The system, however, must ensure fairness of service and avoid starvation of any application, while trying to satisfy their accuracy requirements. Third, to provide developers with maximum flexibility, the system has to support arbitrary monitoring applications for which the resource demands are unknown a priori. In addition, the input data (i.e., the network traffic) is continuous, highly variable and unpredictable in nature. As a consequence, the system can neither make assumptions about the input traffic nor use any explicit knowledge of the cost of the monitoring applications to decide, for example, when it is the right time to shed load.

To address this third challenge, in a previous work [3] we designed a load shedding scheme that can efficiently handle extreme overload situations, without requiring explicit knowledge of the monitoring applications. The core of our load shedding scheme is based on an on-line prediction model that allows the monitoring system to anticipate future overload situations. It infers the cost of each application from the relationship between its actual resource usage and a large set of simple (and lightweight) traffic features that summarize the incoming traffic (e.g., the number of packets, flows, unique IP destination addresses, etc.). This prediction is then used for gracefully degrading the accuracy of the monitoring applications by deciding, for example, when or how much load to shed using well-known traffic sampling techniques.
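As a rough illustration of the prediction idea, the sketch below fits a least-squares line between a single traffic feature (packets per batch) and the CPU cycles a query consumed, then uses the predicted cost to choose a sampling rate. The single-feature model, the function names and all numbers are assumptions made for this example; the scheme described above draws on a large set of traffic features, and its actual prediction model is not reproduced here.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("feature has no variance; cannot fit a line")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_cycles(model, packets):
    """Predicted CPU cycles for a batch with the given number of packets."""
    intercept, slope = model
    return intercept + slope * packets


def sampling_rate(predicted_cycles, available_cycles):
    """Fraction of packets to keep so the predicted cost fits the budget."""
    if predicted_cycles <= available_cycles:
        return 1.0  # no overload expected, so no load is shed
    return available_cycles / predicted_cycles


# Hypothetical history: packets per batch and cycles the query used.
history_packets = [1000, 2000, 3000, 4000]
history_cycles = [2.1e6, 4.0e6, 6.1e6, 8.0e6]

model = fit_line(history_packets, history_cycles)
predicted = predict_cycles(model, 10000)
rate = sampling_rate(predicted, available_cycles=1.0e7)
```

In this toy setting the predicted cost of the next batch exceeds the cycle budget, so the rate falls below 1 and roughly half of the packets would be sampled out before reaching the query.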

In this paper, we extend our previous load shedding scheme to address the problem of where to shed excess load (i.e., how much load to shed from each application), so as to ensure robustness and fairness of service when dealing with non-cooperative monitoring applications.

In our previous prototype [3], an equal sampling rate is applied to each monitoring application in the presence of overload. Although this solution is fair in the number of packets processed by each application, this paper shows that the system can shed load more effectively by applying different sampling rates to different applications according to external information about their accuracy requirements (e.g., the maximum loss an application can tolerate while keeping the error of its results below a given bound).

Other strategies used by similar systems to decide where to shed load fall into two broad categories. The first includes solutions that maximize an aggregate performance metric, such as the overall system utility [23] or throughput [1], [22]. We argue that these approaches, when applied to non-cooperative environments, suffer from serious fairness issues and therefore are only suitable for scenarios where the system administrator has complete control over the utility functions or priorities of each application. In this paper, we propose instead a variant of the classical max–min fair share policy that ensures fairness of service even with non-cooperative users. We model our system using game theory and show that it has a single Nash equilibrium when all players provide correct information about their resource requirements. That is, our system has the appealing feature that a user obtains maximum benefit only when providing correct information to the system.
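For reference, the classical max–min fair share policy mentioned above can be sketched as follows. This is the textbook algorithm only, not the variant proposed in the paper; the function name, the query names and the numbers are hypothetical.

```python
def max_min_fair_share(capacity, demands):
    """Classical max-min fair allocation of `capacity` among `demands`.

    `demands` maps a user name to the amount of resource it requests.
    Users with small demands are served in full; the rest split what
    is left equally.
    """
    allocation = {name: 0.0 for name in demands}
    unsatisfied = dict(demands)
    remaining = float(capacity)
    while unsatisfied and remaining > 1e-12:
        share = remaining / len(unsatisfied)
        # Users whose demand fits within the equal share are fully served.
        satisfied = {n: d for n, d in unsatisfied.items() if d <= share}
        if not satisfied:
            # Nobody fits: every remaining user gets the same equal share.
            for n in unsatisfied:
                allocation[n] += share
            remaining = 0.0
            break
        for n, d in satisfied.items():
            allocation[n] += d
            remaining -= d
            del unsatisfied[n]
    return allocation


honest = max_min_fair_share(10, {"a": 2, "b": 4, "c": 8})
greedy = max_min_fair_share(10, {"a": 2, "b": 4, "c": 100})
```

Both calls return the same allocation: inflating the demand of `c` from 8 to 100 does not take anything away from `a` or `b`, which is why this family of policies is a natural starting point when users cannot be trusted to cooperate.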

The second category includes solutions that achieve fairness of service by ensuring that each application receives an equal share of the system's computing resources [16]. In contrast, in this paper we show that, in the context of network monitoring, ensuring fair access to the packet stream can significantly improve the accuracy of monitoring applications. This result indicates that, when multiple monitoring applications have to run on the same system, a packet-based scheduler can obtain better performance than the operating system's task scheduler, which is designed primarily to guarantee fair access to the CPU.

The remainder of this paper is organized as follows. The next two sections review the related work and our load shedding scheme, respectively. We present our method to handle non-cooperative monitoring applications in Section 4 and model it using game theory in Section 5. Section 6 describes our testbed scenario, while Section 7 presents a performance evaluation of an actual implementation. Finally, Section 8 concludes the paper and discusses future work.

Related work

In network monitoring, the simplest form of load shedding consists of discarding packets without control in the presence of overload. This naive approach is still adopted by most monitoring applications, although it is known to have a severe (and unpredictable) impact on the accuracy and effectiveness of these applications.

In order to minimize this impact, critical monitoring systems often integrate specialized hardware (e.g., DAG cards [10]) or make use of ad-hoc configurations to avoid the

Background

We implemented our proposal as an extension to the CoMo monitoring platform [14]. CoMo is an open-source passive network monitoring system that allows for fast implementation and deployment of network monitoring applications. Applications in CoMo (or “modules”) are written in the C language, making use of a feature-rich API provided by the core platform.

In order to provide the user with maximum

Fairness in a non-cooperative environment

The load shedding strategy described in Section 3 has a major limitation: it does not differentiate among queries, since the load shedder always applies the same sampling rate to each of them. However, the system would make load shedding decisions in a more graceful and intelligent manner if it could consider some additional knowledge about the queries to guide the load shedding procedure, such as their level of tolerance to loss. For example, when using traffic sampling, some queries (e.g.,

System’s Nash equilibrium

To verify that no user has an incentive to provide incorrect $m_q$ values, we evaluate our strategy in terms of game theory. In particular, our system can be modeled as a strategic game with $Q$ players, where each player $q$ corresponds to a query. Each player has a set of possible actions that consist of its minimum CPU demands, denoted by $a_q$ (i.e., $m_q \times \hat{d}_q$).
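The action of a player, its minimum CPU demand, is the product of two quantities. A minimal sketch of that arithmetic follows, reading the first factor as the minimum sampling rate declared for the query and the second as its predicted CPU demand. That reading, the function name, the query names and the values are assumptions made for illustration.

```python
def minimum_cpu_demand(min_sampling_rate, predicted_cycles):
    """Minimum CPU demand of a query: minimum sampling rate times predicted cost."""
    if not 0.0 <= min_sampling_rate <= 1.0:
        raise ValueError("a sampling rate must lie in [0, 1]")
    return min_sampling_rate * predicted_cycles


# Hypothetical queries: (declared minimum sampling rate, predicted cycles).
queries = {
    "flows": (0.5, 4.0e6),
    "top-k": (0.25, 8.0e6),
}

# One action per player, in CPU cycles.
actions = {name: minimum_cpu_demand(m, d) for name, (m, d) in queries.items()}
total_minimum_demand = sum(actions.values())
```

A query that declares a higher minimum sampling rate than it needs raises its own minimum demand. The game-theoretic analysis asks whether doing so can ever pay off.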

Evaluation scenario

In this section, we present the testbed scenario and the set of traffic queries we use to evaluate our load shedding scheme. We also study the tolerance to sampling of each query to select appropriate values for their minimum sampling rate constraints.

Experimental evaluation

In this section, we evaluate our load shedding scheme in the CoMo platform. We study the performance of the two variants of our load shedding scheme, namely max–min fairness in terms of access to the CPU (mmfs_cpu) and in terms of access to the incoming packet stream (mmfs_pkt), with the traffic queries presented in Section 6.

Conclusions and future work

Current network monitoring systems must inevitably deal with the effects of extreme overload situations, due to the large volumes, high data rates and bursty nature of network traffic. This often results in a severe and unpredictable impact on the accuracy and effectiveness of network monitoring applications.

To address this problem, we designed a load shedding scheme that can efficiently handle extreme overload situations by gracefully degrading the accuracy of the traffic queries based on an

Availability

The source code of the load shedding system presented in this paper is publicly available at <http://loadshedding.ccaba.upc.edu>. The CoMo monitoring system is also available at <http://como.sourceforge.net>.

Acknowledgement

This work was funded by a University Research Grant awarded by the Intel Research Council, and by the Spanish Ministry of Education (MEC) under contract TSI2005-07520-C03-02 (CEPOS) and TEC2005-08051-C03-01 (CATARO). The authors thank the Supercomputing Center of Catalonia for allowing them to collect the packet traces used in this work. We are also grateful to Ricard Gavaldà and Albert Bifet from the LSI department of UPC for useful discussion on game theory. The authors also wish to thank the

References (28)

  • L. Amini, et al., Adaptive control of extreme-scale stream processing systems, in: Proc. of IEEE Intl. Conf. on...
  • C. Barakat, G. Iannaccone, C. Diot, Ranking flows from sampled traffic, in: Proc. of ACM CoNEXT,...
  • P. Barlet-Ros, G. Iannaccone, J. Sanjuàs-Cuxart, D. Amores-López, J. Solé-Pareta, Load shedding in network monitoring...
  • D. Bertsekas et al., Data Networks (1992)
  • R.S. Boyer, J.S. Moore, A fast string searching algorithm, Commun. ACM 20...
  • Cisco Systems, NetFlow services and applications, White Paper...
  • C. Cranor, et al., Gigascope: a stream database for network applications, in: Proc. of ACM Sigmod,...
  • H. Dreger, A. Feldmann, V. Paxson, R. Sommer, Operational experiences with high-volume network intrusion detection, in:...
  • N. Duffield, Sampling for passive internet measurement: a review, Statistical Science 19...
  • Endace,...
  • C. Estan, K. Keys, D. Moore, G. Varghese, Building a better NetFlow, in: Proc. of ACM Sigcomm,...
  • C. Estan, S. Savage, G. Varghese, Automatically inferring patterns of resource consumption in network traffic, in:...
  • C. Estan, G. Varghese, M. Fisk, Bitmap algorithms for counting active flows on high speed links, in: Proc. of ACM...
  • G. Iannaccone, Fast prototyping of network data mining applications, in: Proc. of Passive and Active Measurement Conf.,...

    Pere Barlet-Ros received an M.Sc. degree in Computer Science from the Universitat Politècnica de Catalunya (UPC) in 2003. He is currently an Assistant Professor and Ph.D. candidate at the Computer Architecture Department of UPC. He was also a visiting Ph.D. student at the National Laboratory for Applied Network Research (2004), Intel Research Cambridge (2004) and Berkeley (2007). His research interests are in the fields of network measurements, traffic analysis and evaluation of network performance.

    Gianluca Iannaccone received his B.S. and M.S. degrees in computer engineering from the University of Pisa, Italy, in 1998. He received a Ph.D. degree in computer engineering from the University of Pisa in 2002. He joined Sprint as a research scientist in October 2001 working on network performance measurements, loss inference methods, and survivability of IP networks. In September 2003 he joined Intel Research, Cambridge, UK, and recently moved to Intel Research, Berkeley, California. His current interests include network measurements, router architecture design, and network security and troubleshooting.

    Josep Sanjuàs-Cuxart is a Ph.D. student at the Computer Architecture Department of the Universitat Politècnica de Catalunya (UPC), where he received an M.Sc. degree in Computer Science in 2006. He was a visiting student at Intel Research Cambridge for six months in 2006 and currently holds a Projects Scholarship at UPC. His research interests are centered around traffic analysis and the design of network monitoring systems.

    Josep Solé-Pareta obtained his M.Sc. degree in Telecom Engineering in 1984, and his Ph.D. in Computer Science in 1991, both from the UPC. In 1984 he joined the Computer Architecture Department of UPC, where he is currently a Full Professor. He was a postdoctoral researcher at the Georgia Institute of Technology (1993 and 1994). He is co-founder of the UPC-CCABA. His publications include several book chapters and more than 100 papers in relevant research journals (>20) and refereed international conferences. His current research interests are in Autonomic Communications, Traffic Monitoring and Analysis and High Speed and Optical Networking, with emphasis on traffic engineering, traffic characterization, MAC protocols and QoS provisioning. He has participated in many European projects dealing with Computer Networking topics.
