SDDM: an interpretable statistical concept drift detection method for data streams

Micevska, Simona; Awad, Ahmed; Sakr, Sherif

doi:10.1007/s10844-020-00634-5


Published: 05 February 2021

Journal of Intelligent Information Systems, Volume 56, pages 459–484 (2021)


Abstract

Machine learning models assume that data are drawn from a stationary distribution. In practice, however, many models must make sense of fast-evolving data streams whose content changes over time. A change between the distribution of the training data seen so far and the distribution of newly arriving data is called concept drift. Detecting concept drift is essential for maintaining the accuracy and reliability of online classifiers. Reactive drift detectors monitor the performance of the underlying machine learning model: to detect a drift, the detector must receive feedback on the classifier's output, a setting known as prequential evaluation. In many real-life scenarios, immediate feedback on classifier output is not possible, so drift detection is delayed and loses its context. Moreover, a drift detector's output is typically a binary answer, drift or no drift, although it is equally important to explain the source of the drift. In this paper, we present the Statistical Drift Detection Method (SDDM), which detects drifts by monitoring changes in the data distribution without requiring feedback on classifier output. In addition, SDDM quantifies each detection and identifies the source of the drift. We empirically evaluate our method against state-of-the-art approaches on both synthetic and real-life data sets. SDDM outperforms related approaches, producing fewer false positives and false negatives.
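The abstract contrasts feedback-based (prequential) detection with monitoring the input data distribution directly. The following is a minimal, hypothetical sketch of that general idea; it is not the SDDM algorithm from the paper, whose details are not reproduced on this page. It compares a reference window with the current window feature by feature using the Kullback-Leibler divergence over histogram bins, so each detection comes with a number rather than a yes/no answer, and the features whose score exceeds a threshold point to the likely source of the drift. The function names, the equal-width binning, the smoothing constant `eps` and the `threshold` of 0.1 are all illustrative assumptions.

```python
import bisect
import math
from typing import Dict, List, Sequence, Tuple


def bin_probabilities(values: Sequence[float], edges: List[float],
                      eps: float = 1e-6) -> List[float]:
    """Count values into len(edges) + 1 bins and return smoothed probabilities."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        # bisect_right gives the index of the bin that v falls into.
        counts[bisect.bisect_right(edges, v)] += 1
    # Additive smoothing keeps every probability above zero, so the
    # logarithm in kl_divergence stays finite.
    total = sum(counts) + eps * len(counts)
    return [(c + eps) / total for c in counts]


def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q) in nats for two discrete distributions over the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def feature_drift_scores(reference: Dict[str, Sequence[float]],
                         current: Dict[str, Sequence[float]],
                         n_bins: int = 10) -> Dict[str, float]:
    """Score each feature by KL(current window || reference window)."""
    scores: Dict[str, float] = {}
    for name, ref_values in reference.items():
        lo, hi = min(ref_values), max(ref_values)
        width = (hi - lo) / n_bins or 1.0  # guard against a constant feature
        # Equal-width interior edges taken from the reference window only;
        # values outside [lo, hi] fall into the first or last bin.
        edges = [lo + width * i for i in range(1, n_bins)]
        p = bin_probabilities(current[name], edges)
        q = bin_probabilities(ref_values, edges)
        scores[name] = kl_divergence(p, q)
    return scores


def detect_drift(reference: Dict[str, Sequence[float]],
                 current: Dict[str, Sequence[float]],
                 threshold: float = 0.1) -> Tuple[bool, Dict[str, float], List[str]]:
    """Return (drift flag, per-feature scores, features whose score exceeds threshold)."""
    scores = feature_drift_scores(reference, current)
    sources = [name for name, score in scores.items() if score > threshold]
    return bool(sources), scores, sources


# Usage: feature "x1" is unchanged, feature "x2" shifts far to the right.
reference = {"x1": [float(i % 50) for i in range(200)],
             "x2": [float(i) for i in range(200)]}
current = {"x1": [float(i % 50) for i in range(200)],
           "x2": [float(i) + 500 for i in range(200)]}
drift, scores, sources = detect_drift(reference, current)
print(drift, sources)  # → True ['x2']
```

No labels are used anywhere in this sketch, which is what lets a distribution-based detector run without classifier feedback. Fixing the bin edges from the reference window keeps the two histograms directly comparable; a real detector would also need a principled way to choose the window sizes and the threshold.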


Similar content being viewed by others

Tracking Drift Severity in Data Streams

Fast Hoeffding Drift Detection Method for Evolving Data Streams

SABeDM: a sliding adaptive beta distribution model for concept drift detection in a dynamic environment (Article, 20 November 2023)

Notes

Noise refers to random changes in features’ values that are not caused by a concept drift.

References

Baena-García, M., et al. (2006). Early drift detection method. Fourth International Workshop on Knowledge Discovery from Data Streams, 6.
Barros, R.S., Cabral, D.R., Santos, S.G., et al. (2017). RDDM: reactive drift detection method. Expert Systems with Applications.
Bifet, A., & Gavaldà, R. (2007). Learning from time-changing data with adaptive windowing. SIAM (Society for Industrial and Applied Mathematics).
Bifet, A., et al. (2009). New ensemble methods for evolving data streams. ACM SIGKDD.
Bifet, A., et al. (2010). MOA: massive online analysis. Journal of Machine Learning Research, 11, 1601–1604.
Doshi-Velez, F., & Kim, B. (2017). Towards a rigorous science of interpretable machine learning. arXiv:1702.08608.
Duda, R.O., Hart, P.E., & Stork, D.G. (2001). Pattern classification (p. 680). New York: Wiley.
Frías-Blanco, I., et al. (2015). Online and non-parametric drift detection methods based on Hoeffding’s bounds. IEEE TKDE, 27(3), 810–823.
Gama, J., et al. (2004). Learning with drift detection. Brazilian Symposium on Artificial Intelligence. Springer.
Gama, J., et al. (2014). A survey on concept drift adaptation. ACM Computing Surveys (CSUR), 46(4), 44.
Hoens, T.R., Chawla, N.V., & Polikar, R. (2011). Heuristic updatable weighted random subspaces for non-stationary environments. ICDM. IEEE.
Huang, D.T.J., et al. (2015). Drift detection using stream volatility. ECML PKDD. Springer.
Kubat, M., & Widmer, G. (1995). Adapting to drift in continuous domains. ECML. Springer.
Kullback, S., & Leibler, R.A. (1951). On information and sufficiency. The Annals of Mathematical Statistics, 22(1), 79–86.
Kuncheva, L.I. (2004). Classifier ensembles for changing environments. International Workshop on Multiple Classifier Systems. Springer.
Levin, D.A., & Peres, Y. (2017). Markov chains and mixing times (Vol. 107). American Mathematical Society.
Manning, C., Raghavan, P., & Schütze, H. (2010). Introduction to information retrieval. Natural Language Engineering, 16(1), 100–103.
Nishihara, R., Moritz, P., Wang, S., Tumanov, A., Paul, W., Schleier-Smith, J., Liaw, R., Niknami, M., Jordan, M.I., & Stoica, I. (2017). Real-time machine learning: the missing pieces. HotOS, 106–110.
Olorunnimbe, M.K., Viktor, H.L., & Paquet, E. (2018). Dynamic adaptation of online ensembles for drifting data streams. Journal of Intelligent Information Systems, 50(2), 291–313.
Page, E.S. (1954). Continuous inspection schemes. Biometrika, 41(1/2), 100–115.
Pesaranghader, A., & Viktor, H.L. (2016). Fast Hoeffding drift detection method for evolving data streams. ECML PKDD. Springer.
Pesaranghader, A., Viktor, H.L., & Paquet, E. (2018). McDiarmid drift detection methods for evolving data streams. IJCNN. IEEE.
Roarty, M. (1998). Electricity industry restructuring: the state of play. Research Paper 14, Science, Technology, Environment and Resources Group.
Ross, G.J., et al. (2012). Exponentially weighted moving average charts for detecting concept drift. Pattern Recognition Letters, 33(2), 191–198.
Storkey, A. (2009). When training and test sets are different: characterizing learning transfer. Dataset Shift in Machine Learning, 3–28.
Wald, A. (1947). Sequential analysis. Wiley.
Webb, G.I., et al. (2016). Characterizing concept drift. Data Mining and Knowledge Discovery, 30(4), 964–994.
Webb, G.I., et al. (2017). Understanding concept drift. arXiv:1704.00362.
Žliobaitė, I., Budka, M., & Stahl, F. (2015). Towards cost-sensitive adaptation: when is it worth updating your predictive model? Neurocomputing, 150, 240–249.


Funding

The work by Ahmed Awad and Sherif Sakr is funded by the European Regional Development Funds via the Mobilitas Plus programme (grant MOBTT75).

Author information

Authors and Affiliations

University of Tartu, Tartu, Estonia
Simona Micevska, Ahmed Awad & Sherif Sakr
Nile University, Giza, Egypt
Ahmed Awad


Corresponding author

Correspondence to Ahmed Awad.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Micevska, S., Awad, A. & Sakr, S. SDDM: an interpretable statistical concept drift detection method for data streams. J Intell Inf Syst 56, 459–484 (2021). https://doi.org/10.1007/s10844-020-00634-5


Received: 06 October 2020
Revised: 21 December 2020
Accepted: 22 December 2020
Published: 05 February 2021
Issue Date: June 2021
DOI: https://doi.org/10.1007/s10844-020-00634-5
