A Multi-partition Multi-chunk Ensemble Technique to Classify Concept-Drifting Data Streams

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5476)

Abstract

We propose a multi-partition, multi-chunk ensemble technique for classifying concept-drifting data streams. Existing ensemble techniques for this task follow a single-partition, single-chunk approach, in which a single data chunk is used to train one classifier. In our approach, we train a collection of v classifiers from r consecutive data chunks using v-fold partitioning of the data, and build an ensemble of such classifiers. This multi-partition, multi-chunk ensemble technique significantly reduces classification error compared to single-partition, single-chunk ensemble approaches. We justify the usefulness of our algorithm theoretically and demonstrate empirically its effectiveness over other state-of-the-art stream classification techniques on synthetic data and real botnet traffic.
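
The training step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract only states that v classifiers are trained from r consecutive chunks using v-fold partitioning and then combined in an ensemble. Everything else here is an assumption made for illustration, including the nearest-centroid base learner, the function names `train_partition_classifiers`, `update_ensemble` and `ensemble_predict`, the majority vote, and the policy of keeping the classifiers with the lowest held-out error.

```python
import random
from collections import Counter


class CentroidClassifier:
    """Tiny nearest-centroid base learner (a stand-in; any classifier could be used)."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for xi, yi in zip(X, y):
            acc = sums.setdefault(yi, [0.0] * len(xi))
            for j, val in enumerate(xi):
                acc[j] += val
        self.centroids = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}
        return self

    def predict_one(self, x):
        def dist2(c):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[c]))
        return min(self.centroids, key=dist2)


def train_partition_classifiers(chunks, v, rng):
    """Merge r consecutive chunks, split them into v folds, train v classifiers.

    Classifier i is trained on every fold except fold i and scored on fold i.
    Each example is an (x, y) pair. Returns a list of (classifier, error) pairs.
    """
    data = [ex for chunk in chunks for ex in chunk]
    rng.shuffle(data)
    folds = [data[i::v] for i in range(v)]
    results = []
    for i in range(v):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        clf = CentroidClassifier().fit([x for x, _ in train], [y for _, y in train])
        wrong = sum(clf.predict_one(x) != y for x, y in folds[i])
        results.append((clf, wrong / max(len(folds[i]), 1)))
    return results


def update_ensemble(ensemble, new_members, max_size):
    """Keep the max_size members with the lowest held-out error (assumed policy)."""
    return sorted(ensemble + new_members, key=lambda m: m[1])[:max_size]


def ensemble_predict(ensemble, x):
    """Majority vote over all classifiers in the ensemble (assumed combination rule)."""
    votes = Counter(clf.predict_one(x) for clf, _ in ensemble)
    return votes.most_common(1)[0][0]
```

In a streaming loop, one would call `train_partition_classifiers` on the r most recent labeled chunks each time a new chunk arrives, pass the result to `update_ensemble`, and classify incoming unlabeled examples with `ensemble_predict`.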




Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Masud, M.M., Gao, J., Khan, L., Han, J., Thuraisingham, B. (2009). A Multi-partition Multi-chunk Ensemble Technique to Classify Concept-Drifting Data Streams. In: Theeramunkong, T., Kijsirikul, B., Cercone, N., Ho, TB. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2009. Lecture Notes in Computer Science, vol 5476. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01307-2_34

  • DOI: https://doi.org/10.1007/978-3-642-01307-2_34

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-01306-5

  • Online ISBN: 978-3-642-01307-2

  • eBook Packages: Computer Science, Computer Science (R0)
