Abstract
We propose a multi-partition, multi-chunk ensemble classification technique for concept-drifting data streams. Existing ensemble techniques for classifying concept-drifting data streams follow a single-partition, single-chunk approach, in which a single data chunk is used to train one classifier. In our approach, we train a collection of v classifiers from r consecutive data chunks using v-fold partitioning of the data, and build an ensemble of such classifiers. By introducing this multi-partition, multi-chunk ensemble technique, we significantly reduce classification error compared to single-partition, single-chunk ensemble approaches. We justify the usefulness of our algorithm theoretically and demonstrate empirically its effectiveness over other state-of-the-art stream classification techniques on synthetic data and real botnet traffic.
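The training step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name a base learner, a fold-assignment rule, or a vote-combination rule, so the nearest-centroid learner, the leave-one-fold-out training, the unweighted majority vote, and all function names below are assumptions made purely for illustration.

```python
import random
from collections import Counter


class NearestCentroid:
    """Toy base learner; a stand-in, since the abstract names none."""

    def fit(self, rows):
        # rows: list of (features, label) pairs
        grouped = {}
        for x, y in rows:
            grouped.setdefault(y, []).append(x)
        # One centroid per class: the per-dimension mean of its points.
        self.centroids = {
            y: [sum(col) / len(xs) for col in zip(*xs)]
            for y, xs in grouped.items()
        }
        return self

    def predict(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda y: sq_dist(self.centroids[y]))


def train_partition_classifiers(chunks, v, seed=0):
    """Train v classifiers from r consecutive chunks via v-fold partitioning."""
    # Pool the r chunks, shuffle, and split into v roughly equal folds.
    data = [row for chunk in chunks for row in chunk]
    random.Random(seed).shuffle(data)
    folds = [data[i::v] for i in range(v)]
    classifiers = []
    for held_out in range(v):
        # Assumption: classifier i is trained on every fold except fold i.
        train = [row for j, fold in enumerate(folds)
                 if j != held_out for row in fold]
        classifiers.append(NearestCentroid().fit(train))
    return classifiers


def ensemble_predict(classifiers, x):
    """Combine predictions by unweighted majority vote (assumed rule)."""
    votes = Counter(clf.predict(x) for clf in classifiers)
    return votes.most_common(1)[0][0]
```

With r = 2 chunks and v = 3, `train_partition_classifiers(chunks, 3)` returns three classifiers, each trained on two thirds of the pooled data, and `ensemble_predict` labels a new instance by their vote. How the ensemble is refreshed as new chunks arrive is not covered by this sketch.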
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Masud, M.M., Gao, J., Khan, L., Han, J., Thuraisingham, B. (2009). A Multi-partition Multi-chunk Ensemble Technique to Classify Concept-Drifting Data Streams. In: Theeramunkong, T., Kijsirikul, B., Cercone, N., Ho, TB. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2009. Lecture Notes in Computer Science, vol 5476. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01307-2_34
DOI: https://doi.org/10.1007/978-3-642-01307-2_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01306-5
Online ISBN: 978-3-642-01307-2
eBook Packages: Computer Science, Computer Science (R0)