Copyright © 2007 Elsevier Inc. All rights reserved.
Received 2 November 2006;
Abstract
As the total amount of traffic data in networks grows at an alarming rate, a substantial body of research attempts to mine traffic data in order to obtain useful information. For instance, some studies detect Internet worms and intrusions by discovering abnormal traffic patterns. However, since network traffic data contain information about the Internet usage patterns of users, network users’ privacy may be compromised during the mining process. In this paper, we propose an efficient and practical method that preserves privacy during sequential pattern mining on network traffic data. To discover frequent sequential patterns without violating privacy, our method uses two components: the N-repository server model, which operates as a single mining server, and the retention replacement technique, which changes the answer to a query probabilistically. In addition, our method accelerates the overall mining process by maintaining meta tables at each site, so that a site can quickly determine whether a candidate pattern has ever occurred there. Extensive experiments with real-world network traffic data confirmed the correctness and the efficiency of the proposed method.
Keywords: Data mining; Sequential pattern; Network traffic; Privacy
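The abstract describes retention replacement only as a technique that "changes the answer to a query probabilistically". The following minimal sketch illustrates the general idea for yes/no answers, under assumptions that are not taken from the paper: each site keeps its true answer with probability `p` and otherwise replaces it with a fair coin flip, and the miner corrects the observed fraction of "yes" answers for that known noise. The function names, the parameter `p`, and the uniform replacement distribution are all hypothetical; the paper's actual scheme may differ.

```python
import random
from typing import Sequence


def perturb(answer: bool, p: float, rng: random.Random) -> bool:
    """Keep the true answer with probability p (assumed parameter);
    otherwise replace it with a fair coin flip."""
    if rng.random() < p:
        return answer
    return rng.random() < 0.5


def estimate_true_fraction(perturbed: Sequence[bool], p: float) -> float:
    """Estimate the fraction of true 'yes' answers from perturbed ones.

    With retention probability p and uniform replacement,
    E[observed] = p * t + (1 - p) / 2, so t = (observed - (1 - p) / 2) / p.
    """
    observed = sum(perturbed) / len(perturbed)
    return (observed - (1 - p) / 2) / p


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical data: 30% of sites truly contain the candidate pattern.
    truth = [i < 30_000 for i in range(100_000)]
    noisy = [perturb(a, 0.7, rng) for a in truth]
    print(estimate_true_fraction(noisy, 0.7))
```

No individual perturbed answer reveals the true one with certainty, yet the aggregate support can still be estimated, which is what makes a frequency threshold such as minimum support usable on perturbed data.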
Article Outline
- 1. Introduction
- 2. Related work
- 3. Problem definition
- 4. Proposed method
  - 4.1. Overall mining process
  - 4.2. Finding frequent items using N-repository server model
  - 4.3. Finding frequent patterns longer than one
  - 4.4. Meta tables to quickly determine the occurrence or non-occurrence of candidate patterns
    - 4.4.1. Meta tables for storing pairs of items satisfying MaxGap
    - 4.4.2. Determining the occurrences or non-occurrences of candidate patterns
    - 4.4.3. Meta tables to quickly judge the non-occurrence of a candidate pattern
  - 4.5. Discussions
    - 4.5.1. Practicality
    - 4.5.2. Accuracy
    - 4.5.3. Performance
- 5. Performance evaluation
  - 5.1. Environment for experiments
  - 5.2. Parameter settings
    - 5.2.1. Minimum support
    - 5.2.2. Maximum time interval
    - 5.2.3. Numbers of sites and servers
  - 5.3. Analysis of accuracy
  - 5.4. Analysis of performance
  - 5.5. Size of meta tables
- 6. Conclusions and further study
- References














