ABSTRACT
Learning from data streams is an active research topic in machine learning that targets the learning and updating of predictive models as data becomes available for both training and querying. Due to their simplicity and convincing results in a multitude of applications, Hoeffding Trees are, by far, the most widely used family of methods for learning decision trees from streaming data. Despite these strengths, Hoeffding Trees tend to grow continuously as new data becomes available, i.e., they eventually split on every available feature, and multiple times on the same feature, which leads to unnecessary complexity. As a result, Hoeffding Trees lose both their human interpretability and their computational efficiency. To tackle these issues, we propose a regularization scheme for Hoeffding Trees that (i) uses a penalty factor to control the gain obtained by creating a new split node on a feature that has not been used thus far; and (ii) uses information from previous splits in the current branch to determine whether the observed gain indeed justifies a new split. The proposed scheme is combined with both standard and adaptive variants of Hoeffding Trees. Experiments on real-world data and on stationary and drifting synthetic data show that the proposed method prevents both original and adaptive Hoeffding Trees from growing unnecessarily while maintaining high accuracy rates. As a byproduct of the regularization process, significant improvements in processing time, model complexity, and memory consumption have also been observed, showing the effectiveness of the proposed regularization scheme.
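The abstract does not give the exact form of the penalty, so the Python sketch below is an assumption and not the authors' method. It scales the gain of any feature not yet used on the current branch by a hypothetical factor `lam` in (0, 1]. It then applies the usual Hoeffding-bound split test: split when the best regularized gain exceeds the runner-up by more than ε = sqrt(R² ln(1/δ) / (2n)), or when ε drops below a tie threshold τ. The names `Node`, `branch_features`, `regularized_gains`, and `choose_split` are illustrative, as are the defaults δ = 1e-7 and τ = 0.05.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class Node:
    """Minimal tree node: the feature it splits on (None for a leaf) and its parent."""
    split_feature: Optional[str] = None
    parent: Optional["Node"] = None


def branch_features(leaf: Node) -> Set[str]:
    """Collect the features already used by split nodes on the path from `leaf` to the root."""
    used: Set[str] = set()
    node: Optional[Node] = leaf
    while node is not None:
        if node.split_feature is not None:
            used.add(node.split_feature)
        node = node.parent
    return used


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Epsilon such that, with probability 1 - delta, the mean observed over n samples of a
    variable with range `value_range` lies within epsilon of its true mean."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def regularized_gains(gains: Dict[str, float], used: Set[str], lam: float) -> Dict[str, float]:
    """Hypothetical penalty: multiply the gain of every feature not in `used` by lam (0 < lam <= 1)."""
    return {f: (g if f in used else lam * g) for f, g in gains.items()}


def choose_split(gains: Dict[str, float], used: Set[str], lam: float, n: int,
                 n_classes: int, delta: float = 1e-7, tau: float = 0.05) -> Optional[str]:
    """Return the feature to split on, or None to keep waiting for more examples."""
    if not gains:
        return None
    reg = regularized_gains(gains, used, lam)
    ranked = sorted(reg.items(), key=lambda kv: kv[1], reverse=True)
    best_feature, best_gain = ranked[0]
    # With a single candidate, compare against not splitting at all (gain 0).
    second_gain = ranked[1][1] if len(ranked) > 1 else 0.0
    # Information gain ranges over [0, log2(number of classes)].
    value_range = math.log2(max(n_classes, 2))
    eps = hoeffding_bound(value_range, delta, n)
    if best_gain - second_gain > eps or eps < tau:
        return best_feature
    return None


# Usage: the leaf below sits under a root that already splits on "a".
root = Node(split_feature="a")
leaf = Node(parent=root)
print(choose_split({"a": 0.2, "b": 0.08}, branch_features(leaf), lam=0.5, n=2000, n_classes=2))  # prints a
```

Consider gains {"a": 0.2, "b": 0.08} from 2000 examples of a two-class problem, where ε ≈ 0.063. If neither feature is on the branch, `choose_split` returns "a" with `lam=1.0` but None with `lam=0.5`, because the penalty shrinks the gap between the candidates to 0.06. If "a" already appears on the branch, it is not penalized and the split proceeds.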
Index Terms
- Learning Regularized Hoeffding Trees from Data Streams