Elsevier

Computer Networks

Volume 54, Issue 8, 1 June 2010, Pages 1282-1299
HiFIND: A high-speed flow-level intrusion detection approach with DoS resiliency

https://doi.org/10.1016/j.comnet.2009.10.016

Abstract

Global-scale attacks like worms and botnets are increasing in frequency, severity and sophistication, making it critical to detect outbursts at routers/gateways instead of end hosts. In this paper, leveraging data streaming techniques such as the reversible sketch, we design HiFIND, a High-speed Flow-level Intrusion Detection system. In contrast to existing intrusion detection systems, HiFIND: (i) is scalable to flow-level detection on high-speed networks; (ii) is DoS resilient; (iii) can distinguish SYN flooding and various port scans (mostly for worm propagation) for effective mitigation; (iv) enables aggregate detection over multiple routers/gateways; and (v) separates anomalies to limit false positives in detection. Both theoretical analysis and evaluation with several router traces show that HiFIND achieves these properties. To the best of our knowledge, HiFIND is the first online DoS resilient flow-level intrusion detection system for high-speed networks (e.g. OC192), even for the worst-case traffic of 40-byte-packet streams with each packet forming a flow.

Introduction

Traffic anomalies and attacks are commonplace in today’s networks, and identifying them rapidly and accurately is critical for operators of large networks. With the rapid growth of network bandwidth and fast emergence of new attacks, viruses and worms, existing network intrusion detection systems (IDS) [1], [2], [3], [4] are insufficient due to lack of the following features.

Most existing IDSes reside on a single host, only examining application-level [1] and system-level [2] logs, as such detailed information can identify attacks on individual machines. However, today’s fast propagating viruses/worms (e.g., SQL Slammer worm) can infect most of the vulnerable machines in the Internet within 10 min [5] or even less than 30 s with some highly virulent techniques [6], [7]. Thus, it is crucial to identify such outbreaks in their early phases, which can only be achieved by detection at high-speed routers instead of at end hosts [8]. Further, the edge network and even the backbone are suggested as good vantage points for worm containment [9]. Worm detection on those high-speed networks is a crucial prerequisite for containment. However, the existing schemes are not scalable to the link speeds and number of flows for high-speed networks.

Given a high-speed router, e.g., one with an OC-192 link (10 Gbps), each 40-byte TCP packet has only 32 ns to be processed [10]. For data recording in high-speed network IDSes, it is difficult for software-based approaches to keep up with the link speed. Thus, the recording part of high-speed IDSes has to be hardware implementable, and the following three performance features are strongly desirable: (1) a small amount of memory usage (to be implemented in SRAM); (2) a small number of memory accesses per packet [11], [12]; and (3) scalability to a large key space size. The last constraint is especially important for the coming decade: IPv6 with its 128-bit IP address is being adopted, especially in Asia. Thus, the system should scale to a key space of 2^128 or 2^256. Meanwhile, the other two features are strong constraints which exclude many possible designs.
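The 32 ns figure follows directly from the link rate and the packet size. The short calculation below is only an illustration of that arithmetic; the function name is ours, and the link rate is taken as the nominal 10 Gbps used in the text.

```python
def per_packet_budget_ns(link_bps: float, packet_bytes: int) -> float:
    """Time available to process one packet at line rate, in nanoseconds."""
    bits = packet_bytes * 8
    return bits / link_bps * 1e9


# Nominal 10 Gbps link, back-to-back 40-byte packets (the worst case in the text).
budget = per_packet_budget_ns(10e9, 40)
print(f"{budget:.0f} ns per 40-byte packet")
```

Any per-packet recording step, including every memory access it makes, has to fit inside this budget, which is why the number of memory accesses per packet is listed as a design constraint.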

To bypass detection by an IDS, attackers can execute denial-of-service (DoS) attacks against it, or fool the IDS into raising many false positives so that real attack alerts are ignored. Thus, the attack resiliency of the IDS itself is very important. However, existing IDSes often keep per-flow state for detection, which makes them vulnerable to DoS attacks.

Accurate attack mitigation usually requires IDSes to pinpoint the attack type and attack flows. To this end, we need to detect intrusions at the flow-level instead of based on the overall traffic [13], [14]. Furthermore, we want to differentiate different types of attacks because mitigation schemes vary with the types of attacks. For example, for SYN flooding, attackers often spoof their IPs, but we can start the SYN defender [15], and/or change the IP address of the given domain name for the victim machines to alleviate the DoS effects. On the other hand, for port scans, we will use an ingress filter to block the traffic from the attackers’ IPs.

Most existing network IDSes assume detection is on a single router or gateway. However, as multi-homing, load-balancing-based routing, and policy routing become prevalent, asymmetric routing appears, and ingress and egress traffic go through different routers. Even for a connection between a certain source and destination, the packets may traverse different paths due to per-packet load balancing by routers, which use a round-robin method to determine which path each packet takes to the destination [16], [17]. Thus, observation from a single vantage point is often incomplete, which affects detection accuracy. Meanwhile, it is very hard to copy all traffic from one router to other routers/IDSes due to the huge volume of data.

To detect unknown attacks and polymorphic worms, statistics-based instead of signature-based intrusion detection has been adopted widely. However, many network element faults, e.g., congestion/failures, router misconfigurations, and polluted DNS entries, can lead to traffic anomalies which will be detected as attacks.

To meet the requirements above, we propose a new paradigm called DoS resilient High-speed Flow-level INtrusion Detection, HiFIND [18], leveraging recent work on data streaming computation and, in particular, sketches [19], [20]. Sketches are compact data streaming data structures which record traffic for given keys and are capable of reporting heavy traffic keys. Sketches are also linear, meaning we can take linear combinations of multiple sketches (for details, please refer to Section 3). Essentially, we want to detect as many attacks as possible. As the first step towards this ambitious goal, we aim to detect various port scans (port scans are an important way to detect large-scale worm propagation and botnet probing) and TCP SYN flooding. This will serve as an essential building block for high-speed IDSes. Our goal is to identify and distinguish port scans and SYN flooding in real-time on high-speed networks, and to obtain the attacks’ key characteristics for mitigation. Note that while each of these attacks seems relatively easy to detect separately, or in an offline setting, it is in fact very hard to detect a mixture of attacks online at flow-level for high-speed networks. To the best of our knowledge, HiFIND is the first DoS resilient high-speed flow-level intrusion detection approach for port scans and TCP SYN flooding on high-speed networks (tens of Gigabit links, e.g., OC192), even for the worst-case traffic of 40-byte-packet streams with each packet forming a flow.
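To make the linearity property concrete, here is a minimal sketch of a k-ary-style sketch in Python. It is not the HiFIND implementation: the class name, the table sizes, the string keys, and the use of salted BLAKE2b as the per-row hash are all assumptions made for illustration, and the reversible variant used in the paper is not shown.

```python
import hashlib
from statistics import median


class KarySketch:
    """H rows of K counters; each row hashes a key to exactly one bucket."""

    def __init__(self, rows: int = 5, buckets: int = 1024):
        self.rows, self.buckets = rows, buckets
        self.table = [[0.0] * buckets for _ in range(rows)]

    def _bucket(self, row: int, key: str) -> int:
        # Salted BLAKE2b stands in for the per-row hash functions (assumption).
        digest = hashlib.blake2b(key.encode(), digest_size=8,
                                 salt=row.to_bytes(2, "big")).digest()
        return int.from_bytes(digest, "big") % self.buckets

    def update(self, key: str, value: float) -> None:
        # One counter update per row: a fixed number of memory accesses per item.
        for r in range(self.rows):
            self.table[r][self._bucket(r, key)] += value

    def estimate(self, key: str) -> float:
        # Per-row estimate corrected for the row total, then median across rows.
        ests = []
        for r in range(self.rows):
            total = sum(self.table[r])
            cell = self.table[r][self._bucket(r, key)]
            ests.append((cell - total / self.buckets) / (1 - 1 / self.buckets))
        return median(ests)

    @staticmethod
    def combine(weighted):
        """Linear combination: sum of coeff * sketch over same-shaped sketches."""
        weighted = list(weighted)
        first = weighted[0][1]
        out = KarySketch(first.rows, first.buckets)
        for coeff, sk in weighted:
            for r in range(out.rows):
                for b in range(out.buckets):
                    out.table[r][b] += coeff * sk.table[r][b]
        return out


# Two vantage points record traffic independently (hypothetical keys and values).
router_a, router_b = KarySketch(), KarySketch()
router_a.update("10.0.0.1:80", 30)
router_b.update("10.0.0.1:80", 12)
router_b.update("10.0.0.2:22", 7)

# Adding the tables gives the sketch a single observer of all traffic would hold.
total = KarySketch.combine([(1, router_a), (1, router_b)])
print(total.estimate("10.0.0.1:80"))
```

Because combination is a counter-wise sum, the same operation with a coefficient of -1 yields a difference sketch, which is the form needed when comparing observed traffic against a forecast.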

To this end, we leverage and improve sketches, an efficient tool for data streaming computation, to record flow-level traffic as the basis for statistical intrusion detection. Although proposed in [19], [20], sketches have not been applied to building IDSes for the following reasons:

  • Sketches can only record certain aggregated metrics for some given keys. For each flow, there are numerous possible keys: source/destination IP addresses, source/destination ports, source/destination prefixes, protocols, etc., and any combination of these. Since it is not feasible to try all possible combinations of the metrics, what would be the minimal set of metrics to monitor, given the threat model?

  • Existing sketches are all one-dimensional, i.e., they can only record the values for a specific metric. However, various forms of attacks are often hard to identify with such single-dimensional information. For example, both horizontal scans and un-spoofed SYN flooding exhibit a large number of unsuccessful connections aggregated by the (source IP, destination port) pair. It is difficult to differentiate these kinds of attacks unless the distribution of the attacks over the destination IPs is also considered.
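The second point can be illustrated with exact per-key state, which is precisely what a sketch-based design avoids keeping; the code below only shows why the destination-IP dimension separates the two attack types. The function name, the record layout, and the 0.8 dominance threshold are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def classify(failed_conns, dominance=0.8):
    """Label each (src_ip, dst_port) key by how its failures spread over dst IPs.

    failed_conns: iterable of (src_ip, dst_ip, dst_port) for unsuccessful
    connections. `dominance` is a hypothetical threshold for illustration.
    """
    per_key = defaultdict(Counter)
    for sip, dip, dport in failed_conns:
        per_key[(sip, dport)][dip] += 1

    labels = {}
    for key, dips in per_key.items():
        # Share of this key's failures that hit its single most common target.
        top_share = max(dips.values()) / sum(dips.values())
        labels[key] = "syn_flood" if top_share >= dominance else "horizontal_scan"
    return labels


# One source probing port 445 on 100 hosts, another hammering one web server.
scan = [("1.1.1.1", f"10.0.0.{i}", 445) for i in range(100)]
flood = [("2.2.2.2", "10.0.1.5", 80)] * 100
labels = classify(scan + flood)
print(labels)
```

Both keys carry 100 unsuccessful connections, so a one-dimensional count over (source IP, destination port) cannot tell them apart; only the spread over destination IPs does.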

In this paper, we address these two challenges and build the HiFIND prototype system to meet the five requirements mentioned before. We make the following contributions:

  • We analyze the attributes in TCP/IP headers and select an optimal small set of metrics for flow-level sketch-based traffic monitoring and intrusion detection. Based on that, we build the HiFIND prototype which is DoS resilient and can provide high-speed flow-level intrusion detection online as demonstrated by both analytical and experimental results.

  • To analyze the attack root cause for mitigation, we design efficient two-dimensional (2D) sketches to distinguish different types of attacks. Both analytical and empirical results show the effectiveness of the 2D sketches.

  • We aggregate the compact sketches from multiple vantage points (e.g., routers) to detect intrusion in the face of asymmetric routing and multi-path routing caused by per-packet load balancing of routers. To the best of our knowledge, HiFIND is the first system that can work in such environments.

  • For false positive reduction, we propose several heuristics to separate SYN flooding from network/server congestion and misconfigurations (e.g., polluted DNS entries).

As shown in Fig. 1, HiFIND detection systems can be implemented as black boxes attached to high-speed routers (edge network routers or backbone routers) of ISPs without affecting the normal operation of the routers.

Detection on edge networks is particularly critical, powerful and efficient (without deploying IDSes on all the edge hosts), according to a recent research agenda for large-scale malicious code by DARPA [8].

For evaluation, we first test the router traffic traces collected at Lawrence Berkeley National Labs. We then apply HiFIND for on-site detection at the Northwestern University (NU) edge routers: we record each minute of traffic with reversible sketches on the fly. At the end of each minute, we use the recorded sketches for online detection. In particular, the one day experiment data consist of 239M network flows of 1.8TB total traffic. We validate the SYN flooding and port scans detected, and find the HiFIND system is highly accurate. The 2D sketches successfully separate the SYN flooding from port scans, and the heuristics effectively reduce false positives of SYN flooding. The evaluation demonstrates that HiFIND significantly outperforms existing approaches like Threshold Random Walk (TRW) [21], TRW with approximate caches (TRW-AC) [22], and Change-Point Monitoring (CPM) [13], [14]. Compared with statistical detection on complete flow-level data logs, we have almost the same detection accuracy, but use much less memory.

The HiFIND system runs in real-time, and requires a small number of memory accesses per-packet. With a Pentium Xeon 3.2 GHz machine and normal DRAM memory, we record 239M flows with one reversible sketch in 20.6 s, i.e., 11.6M insertions/second. For the worst-case scenario with all 40-byte packets, this translates to around 3.7 Gbps. Our prototype single FPGA board for reversible sketches can achieve a throughput of over 16 Gbps for all 40-byte-packet streams. For the NU on-site experiments over a total of 1430 min, HiFIND on average uses only 0.34 s to detect intrusions for each minute of traffic, and the standard deviation is 0.64 s.
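The software throughput figures above can be reproduced from the reported flow count and running time. The variable names below are ours; the inputs are the numbers given in the text, and the worst case treats every insertion as one 40-byte packet.

```python
flows = 239e6          # flows recorded with one reversible sketch
seconds = 20.6         # reported recording time
packet_bits = 40 * 8   # worst case: every insertion is a 40-byte packet

insertions_per_sec = flows / seconds
worst_case_gbps = insertions_per_sec * packet_bits / 1e9

print(f"{insertions_per_sec / 1e6:.1f}M insertions/s")  # 11.6M insertions/s
print(f"{worst_case_gbps:.1f} Gbps")                    # 3.7 Gbps
```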

The organization of this paper is as follows. First, we survey related work in Section 2. In Section 3, we introduce the sketches and reversible sketches as the basis for high-speed network monitoring. We then introduce the HiFIND architecture, discuss the threat model and flow-level detection design in Section 4. The two-dimensional sketches are presented in Section 5, and evaluation methodology and results are in Section 6. Finally, we show the potential limitations of HiFIND in Section 7 and conclude in Section 8.

Section snippets

Related work on intrusion detection systems

Although some vendors claim to have multi-gigabit statistical IDSes (e.g., Arbor Networks’ Peakflow Traffic [23] and Symantec’s Manhunt [24]), they usually refer to average traffic conditions and use packet sampling [25], [26] which has two shortcomings. First, sampling is not scalable, especially after aggregation; there are up to 2^64 flows defined by source and destination IP addresses. Second, long-lived traffic flows, increasingly prevalent for peer-to-peer applications, will be split up if

k-ary Sketch

There are two key primitives in the analysis of a live network traffic stream: heavy hitter detection and heavy change detection. The former finds flows that constitute more than a given threshold fraction of the total traffic stream. The latter detects flows whose size changes significantly from one stream to another. There is a significant amount of prior work on efficient and online heavy hitter detection [12], [33], [34], [35]. Efficient online heavy change detection, however, remains a

System architecture

Fig. 3 shows the architecture of the HiFIND system. First, we record the network traffic with sketches using the UPDATE function in each router. Based on linearity of the sketches, we summarize the sketches over multiple routers into an aggregate sketch, and apply different time series analysis methods for aggregate sketches to obtain the forecast sketches for change detection by the COMBINE function. The forecast time series analysis method, e.g., EWMA (exponentially weighted moving average)

Intrusion classification with two-dimensional sketch

It is crucial to distinguish different types of attack in order to choose the most effective mitigation scheme. However, one major challenge for intrusion detection is that traffic anomalies are often multidimensional, i.e., they can only be identified when we examine traffic with specific combinations of IP addresses, port numbers, and protocols. For example, if the port distribution of a particular attack is unknown, it becomes very hard to distinguish non-IP-spoofing SYN flooding attacks from

Evaluation methodology

In this section, we evaluate HiFIND with both simulation and on-site experiment.

  • We use the router traffic traces collected at the Lawrence Berkeley National Laboratory (LBL) for simulation. The one-day trace consists of about 900M netflow records. Unfortunately, the sampling rate is unknown.

  • We apply HiFIND for on-site detection at the Northwestern University (NU, which has several Class B networks) edge routers. The router exports netflow data continuously which is recorded with sketches of

Potential limitations of the HiFIND system

There are roughly two types of stealthy attacks: small rate attacks and slow ramping attacks. Depending on the detection threshold, HiFIND may not be very sensitive to stealthy scans for small rate attacks. On the other hand, small rate attacks in general are not very interesting for high-speed network gateways/routers because a low detection threshold tends to produce large volumes of scan alerts and will overwhelm network administrators. Such attacks usually also have limited effects. To deal

Conclusion

It is crucial to detect the outburst of global-scale attacks at high-speed routers/gateways. In this paper, we propose, implement and evaluate a DoS resilient High-speed Flow-level Intrusion Detection system, HiFIND, leveraging recent data streaming techniques such as reversible sketches. We analyze the TCP/IP headers and select an optimal small set of metrics for monitoring and detection. In addition, we design efficient 2D sketches to distinguish different types of attacks for effective

References (53)

  • V. Paxson

    Bro: a system for detecting network intruders in real-time

    Computer Networks

    (1999)
  • T. Ryutov et al.

    Integrated access control and intrusion detection for web servers

    IEEE Transactions on Parallel and Distributed System

    (2003)
  • S. Hofmeyr et al.

    Intrusion detection using sequences of system calls

    Journal of Computer Security

    (1998)
  • M. Roesch, Snort: the lightweight network intrusion detection system, 2001....
  • D. Moore, V. Paxson, S. Savage, C. Shannon, S. Staniford, N. Weaver, The spread of the Sapphire/Slammer worm, 2003....
  • S. Staniford, V. Paxson, N. Weaver, How to own the Internet in your spare time, in: Proceedings of the 11th USENIX...
  • S. Staniford, D. Moore, V. Paxson, N. Weaver, The top speed of flash worms, in: Proceedings of the ACM CCS WORM...
  • N. Weaver, V. Paxson, S. Staniford, R. Cunningham, Large scale malicious code: a research agenda, Tech. Rep....
  • D. Moore, C. Shannon, G.M. Voelker, S. Savage, Internet quarantine: requirements for containing self-propagating code,...
  • S. Sikka, G. Varghese, Memory-efficient state lookups with fast updates, in: Proceedings of the ACM SIGCOMM,...
  • G. Cormode, S. Muthukrishnan, What’s new: finding significant differences in network data streams, in: Proceedings of...
  • C. Estan, G. Varghese, New directions in traffic measurement and accounting, in: Proceedings of the ACM SIGCOMM,...
  • H. Wang, D. Zhang, K.G. Shin, Detecting SYN flooding attacks, in: Proceedings of the IEEE INFOCOM,...
  • H. Wang et al.

    Change-point monitoring for detection of DoS attacks

    IEEE Transactions on Dependable and Secure Computing

    (2004)
  • Checkpoint Software Technologies, TCP Flooding Attack and Firewall-1 SYNDefender....
  • Cisco Inc., Per-Packet Load Balancing, 2003....
  • Cisco Inc., Load balancing with Cisco Express Forwarding, 2003....
  • Y. Gao, Z. Li, Y. Chen, A dos resilient flow-level intrusion detection approach for high-speed networks, in: The...
  • R. Schweller, A. Gupta, E. Parsons, Y. Chen, Reversible sketches for efficient and accurate change detection over...
  • R. Schweller, Z. Li, Y. Chen, Y. Gao, A. Gupta, Y. Zhang, P. Dinda, M. Kao, G. Memik, Reverse hashing for high-speed...
  • J. Jung, V. Paxson, A. Berger, H. Balakrishnan, Fast portscan detection using sequential hypothesis testing, in:...
  • N. Weaver, S. Staniford, V. Paxson, Very fast containment of scanning worms, in: USENIX Security Symposium,...
  • Arbor Networks, Intelligent Network Management with Peakflow Traffic....
  • Symantec Inc., Symantec ManHunt,...
  • N. Duffield, C. Lund, M. Thorup, Properties and prediction of flow statistics from sampled packet streams, in:...
  • N. Duffield, C. Lund, M. Thorup, Flow sampling under hard resource constraints, in: Proceedings of the ACM SIGMETRICS,...
Cited by (24)

    • Network traffic fusion and analysis against DDoS flooding attacks with a novel reversible sketch

      2019, Information Fusion
      Citation Excerpt :

      The basic idea is to hash intelligently by modifying the input keys and hashing functions. Li et al. [31] extended the work by designing efficient two-dimensional reversible sketches to distinguish different types of attacks. Salem et al. [32] proposed a detection method for DDoS flooding attacks by integrating multi-stage reversible sketch that proposed in [30].

    • Risk based Security Enforcement in Software Defined Network

      2018, Computers and Security
      Citation Excerpt :

      Gabriel et al. (2014) proposed a technique for handling Distributed Denial of Service (DDoS) attack in an SDN environment, by assessing risk through the means of a cyber-defense system. HiFIND (Li et al., 2010) is a highly secured technique that prevents SDN platform from DDoS attacks for high-density data packets providing protection to customers and service provider. SN-SECurity Architecture (SN-SECA) (Bernardo and Chua, 2015) presents a formal security framework that integrates and validates different security parameters in the SDN/NFV design and implementation.

    • Flow-based intrusion detection: Techniques and challenges

      2017, Computers and Security
      Citation Excerpt :

      The technique is limited to only three anomalies and can be bypassed if the attacker keeps the flow metrics values within range. A high-speed flow level intrusion detection system (HiFIND) is presented in Li et al. (2010). The use of flow information for high speed and DoS resilient intrusion detection was initially proposed in Li et al. (2005) and Gao et al. (2006).

    • Practical real-time intrusion detection using machine learning approaches

      2011, Computer Communications
      Citation Excerpt :

      The similarity ratio represents how close or similar the data is to normal data, i.e. 1.0 means that they are perfectly matched. Li et al. [18] developed a high-speed intrusion detection model using TCP/IP header information. However, their approach is limited to only one type of attack which is DoS.

    Zhichun Li is a Ph.D. candidate in the Department of Electrical Engineering and Computer Science at Northwestern University. He received his master's degree in Computer Science from Tsinghua University in 2000. His research interests are network security, network measurement, and data streaming.

    Yan Gao is a Ph.D. candidate in the Department of Electrical Engineering and Computer Science at Northwestern University. Her research interests include network security, network measurement, and monitoring. Gao has a BE in electrical engineering and an MS in system engineering, both from Xian Jiaotong University, China.

    Yan Chen is an Assistant Professor in the Department of Electrical Engineering and Computer Science at Northwestern University, Evanston, IL. He received his Ph.D. in Computer Science from the University of California at Berkeley in 2003. His research interests include network measurement, monitoring and security, and P2P systems. He won the DOE Early CAREER award in 2005 and the Microsoft Trustworthy Computing Awards in 2004 and 2005.