Performance Evaluation
Volume 64, Issues 9-12, October 2007, Pages 1194-1213
Performance 2007, 26th International Symposium on Computer Performance, Modeling, Measurements, and Evaluation
 
doi:10.1016/j.peva.2007.06.014

Copyright © 2007 Published by Elsevier B.V.

Offline/realtime traffic classification using semi-supervised learning


Jeffrey Erman (a), Anirban Mahanti (b, corresponding author), Martin Arlitt (a, c), Ira Cohen (c) and Carey Williamson (a)

(a) Department of Computer Science, University of Calgary, Canada

(b) Department of Computer Science and Engineering, Indian Institute of Technology, Delhi, India

(c) Enterprise Systems and Software Lab, HP Labs, Palo Alto, USA


Available online 27 June 2007.

Abstract

Identifying and categorizing network traffic by application type is challenging because applications continue to evolve, especially those designed to evade detection. The diminished effectiveness of port-based identification and the overheads of deep packet inspection approaches motivate us to classify traffic by exploiting distinctive flow characteristics of applications when they communicate on a network. In this paper, we explore this latter approach and propose a semi-supervised classification method that can accommodate both known and unknown applications. To the best of our knowledge, this is the first work to use semi-supervised learning techniques for the traffic classification problem. Our approach allows classifiers to be designed from training data that consists of only a few labeled and many unlabeled flows. We consider pragmatic classification issues such as the longevity of classifiers and the need to retrain them. Our performance evaluation using empirical Internet traffic traces that span a 6-month period shows that: (1) high flow and byte classification accuracy (i.e., greater than 90%) can be achieved using training data that consists of a small number of labeled and a large number of unlabeled flows; (2) the presence of “mice” and “elephant” flows in the Internet complicates the design of classifiers, especially of those with high byte accuracy, and necessitates the use of weighted sampling techniques to obtain training flows; and (3) retraining of classifiers is necessary only when there are non-transient changes in the network usage characteristics. As a proof of concept, we implement prototype offline and realtime classification systems to demonstrate the feasibility of our approach.
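
The two steps named in the article outline (clustering, then mapping clusters to applications) can be illustrated with a toy sketch. This is not the authors' implementation: the choice of k-means, the majority-vote mapping, the two-dimensional feature vectors, the label names, and all function names below are assumptions made only for illustration. It shows how a few labeled flows can name the clusters formed from many unlabeled flows, and how a cluster with no labeled flows is reported as "unknown".

```python
import math
from collections import Counter


def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def kmeans(points, k, iters=20):
    """Plain k-means; the first k points seed the centers (assumed, for determinism)."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        for i, group in enumerate(groups):
            if group:  # keep the old center if a cluster is empty
                centers[i] = [sum(col) / len(group) for col in zip(*group)]
    return centers


def map_clusters(centers, labeled):
    """Name each cluster after the majority label of the labeled flows in it."""
    votes = [Counter() for _ in centers]
    for features, label in labeled:
        votes[nearest(features, centers)][label] += 1
    return [v.most_common(1)[0][0] if v else "unknown" for v in votes]


def classify(features, centers, cluster_labels):
    """Assign a new flow the label of its nearest cluster."""
    return cluster_labels[nearest(features, centers)]


# Hypothetical flow features (two made-up, pre-scaled statistics per flow).
unlabeled = [(1.0, 1.0), (8.0, 8.0), (4.0, 12.0), (1.2, 0.9),
             (7.8, 8.1), (4.2, 11.8), (0.9, 1.1), (8.2, 7.9)]
labeled = [((1.1, 1.0), "web"), ((8.1, 8.0), "p2p")]

# Step 1: cluster labeled and unlabeled flows together.
centers = kmeans(unlabeled + [f for f, _ in labeled], k=3)
# Step 2: map clusters to applications using only the labeled flows.
cluster_labels = map_clusters(centers, labeled)

print(cluster_labels)
print(classify((4.1, 11.9), centers, cluster_labels))
```

In this toy data the third cluster contains no labeled flow, so flows that fall into it are reported as "unknown" rather than forced into a known application class.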

Keywords: Internet traffic classification; Realtime classification; Machine learning; Semi-supervised learning

Article Outline

1. Introduction
2. Related work
3. Classification method
3.1. Step 1: Clustering
3.2. Step 2: Mapping clusters to applications
4. Data sets
4.1. Traces and collection methodology
4.2. High-level statistics of the traces
4.3. Methodology for establishing base truth
4.4. Overview of the data sets
4.5. Empirical motivation for our work
5. Offline classification
5.1. Design considerations
5.2. Semi-supervised learning
5.3. The dichotomy of elephant and mice flows
5.4. Feature selection
5.5. Tuning the classifier
6. Realtime classification
6.1. Design considerations
6.2. Classification results
7. Discussion
7.1. The classification arms race
7.2. Longevity
7.3. Retraining
8. Conclusions and future work
Acknowledgements
References
Vitae

Corresponding author. Tel.: +91 11 2659 7256.
