Processing data stream with chunk-similarity model selection

Ksieniewicz, Pawel

doi:10.1007/s10489-022-03826-4

Published: 28 July 2022

Volume 53, pages 7931–7956 (2023)

Applied Intelligence

Pawel Ksieniewicz ORCID: orcid.org/0000-0001-9578-8395¹

Abstract

The classification of data streams susceptible to the concept drift phenomenon has been a field of intensive research for many years. One of the dominant strategies among the proposed solutions is the application of classifier ensembles whose member classifiers are validated on their actual prediction quality. This paper proposes a new ensemble method, the Covariance-signature Concept Selector, which, like state-of-the-art solutions, uses both the model accumulation paradigm and the detection of changes in the data posterior probability, but combines them in a single integrated procedure. Instead of ensemble fusion, it performs a static classifier selection, in which the assessed similarity of each model to the currently processed data chunk serves as a concept selector. The proposed method was subjected to a series of computer experiments assessing its temporal complexity and its efficiency in classifying streams with synthetic and real concepts. The experimental analysis indicates an advantage of this proposal over state-of-the-art methods in the identified pool of problems, as well as high potential in practical applications.
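The selection idea described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the abstract does not define the signature, the similarity measure, or the base learner, so everything below is an assumption made for illustration. The signature is taken to be the flattened feature covariance matrix of a chunk, similarity is the Euclidean distance between signatures, and the names `covariance_signature`, `NearestMeanClassifier`, `ChunkSimilaritySelector`, and `threshold` are hypothetical.

```python
import numpy as np


def covariance_signature(X):
    """Assumed signature: upper triangle of the chunk's feature covariance."""
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    return cov[np.triu_indices(cov.shape[0])]


class NearestMeanClassifier:
    """Tiny stand-in base learner (hypothetical; any classifier would do)."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # Distance from every sample to every class mean, then pick the closest.
        d = np.linalg.norm(X[:, None, :] - self.means_[None, :, :], axis=2)
        return self.classes_[d.argmin(axis=1)]


class ChunkSimilaritySelector:
    """Accumulates (signature, model) pairs and selects one model per chunk."""

    def __init__(self, threshold=1.0):
        self.threshold = threshold  # hypothetical distance cut-off
        self.signatures = []
        self.models = []

    def _nearest(self, X):
        """Return (index, distance) of the stored signature closest to chunk X."""
        if not self.models:
            raise RuntimeError("no model has been trained yet")
        sig = covariance_signature(X)
        dists = [np.linalg.norm(sig - s) for s in self.signatures]
        idx = int(np.argmin(dists))
        return idx, dists[idx]

    def predict(self, X):
        # Static selection: a single stored model answers for the whole chunk.
        idx, _ = self._nearest(X)
        return self.models[idx].predict(X)

    def partial_fit(self, X, y):
        # A chunk close to a stored signature is treated as a known concept.
        if self.models and self._nearest(X)[1] <= self.threshold:
            return self
        # Otherwise the chunk is treated as a new concept and gets its own model.
        self.signatures.append(covariance_signature(X))
        self.models.append(NearestMeanClassifier().fit(X, y))
        return self
```

In a chunk-based test-then-train loop, `predict` would be called on each incoming chunk before `partial_fit` receives its labels. In this sketch a recurring concept reuses the model stored for it, while a chunk whose signature is far from every stored one adds a new model to the pool.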



Acknowledgements

This work was supported by the Polish National Science Centre under grant No. 2017/27/B/ST6/01325, as well as by the statutory funds of the Department of Systems and Computer Networks, Faculty of Electronics, Wroclaw University of Science and Technology.

Author information

Authors and Affiliations

Department of Systems and Computer Networks, Wroclaw University of Science and Technology, Wybrzeże Stanisława Wyspiańskiego 27, Wroclaw, 50-370, Poland
Pawel Ksieniewicz

Corresponding author

Correspondence to Pawel Ksieniewicz.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ksieniewicz, P. Processing data stream with chunk-similarity model selection. Appl Intell 53, 7931–7956 (2023). https://doi.org/10.1007/s10489-022-03826-4

Accepted: 29 May 2022
Published: 28 July 2022
Issue Date: April 2023
DOI: https://doi.org/10.1007/s10489-022-03826-4
