SDDM: an interpretable statistical concept drift detection method for data streams

Journal of Intelligent Information Systems

Abstract

Machine learning models assume that data is drawn from a stationary distribution. In practice, however, models must make sense of fast-evolving data streams whose content changes over time. A change between the distribution of the training data seen so far and the distribution of newly arriving data is called concept drift. Detecting concept drift is essential to maintaining the accuracy and reliability of online classifiers. Reactive drift detectors monitor the performance of the underlying machine learning model: to detect a drift, feedback on the classifier output must be given to the drift detector, a scheme known as prequential evaluation. In many real-life scenarios, immediate feedback on classifier output is not possible, so drift detection is delayed and gets out of context. Moreover, the drift detector's output is a binary answer: either there is a drift or there is not. However, it is equally important to explain the source of the drift. In this paper, we present the Statistical Drift Detection Method (SDDM), which detects drifts by monitoring changes in the data distribution, without the need for feedback on classifier output. Moreover, the detection is quantified and the source of the drift is identified. We empirically evaluate our method against the state of the art on both synthetic and real-life data sets. SDDM outperforms other related approaches by producing fewer false positives and false negatives.
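The general idea of detecting drift from the data distribution alone, quantifying it, and attributing it to individual features can be illustrated with a minimal sketch. This is not the SDDM algorithm itself, whose details the abstract does not give. It assumes a fixed reference window and a current window, equal-width histogram binning with add-one smoothing, the Kullback-Leibler divergence as the per-feature score, and an arbitrary threshold of 0.1. All function names, feature names, and parameters are hypothetical.

```python
import math
import random


def histogram(values, edges):
    """Return add-one-smoothed bin probabilities over fixed bin edges."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        # Values outside the edges are clamped into the first or last bin.
        idx = 0
        while idx < len(counts) - 1 and v >= edges[idx + 1]:
            idx += 1
        counts[idx] += 1
    total = len(values) + len(counts)  # one pseudo-count per bin
    return [(c + 1) / total for c in counts]


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def feature_drift_scores(reference, current, n_bins=10):
    """Score each feature by KL(reference || current) over shared bins."""
    scores = {}
    for name, ref_values in reference.items():
        lo, hi = min(ref_values), max(ref_values)
        width = (hi - lo) / n_bins or 1.0  # guard against a constant feature
        edges = [lo + i * width for i in range(n_bins + 1)]
        p = histogram(ref_values, edges)
        q = histogram(current[name], edges)
        scores[name] = kl_divergence(p, q)
    return scores


# Synthetic example: feature "f2" shifts its mean, "f1" does not.
rng = random.Random(0)
reference = {
    "f1": [rng.gauss(0.0, 1.0) for _ in range(2000)],
    "f2": [rng.gauss(0.0, 1.0) for _ in range(2000)],
}
current = {
    "f1": [rng.gauss(0.0, 1.0) for _ in range(2000)],
    "f2": [rng.gauss(3.0, 1.0) for _ in range(2000)],
}

THRESHOLD = 0.1  # hypothetical cut-off, chosen only for this illustration
scores = feature_drift_scores(reference, current)
drifted = [name for name, score in scores.items() if score > THRESHOLD]
print(scores)
print("features flagged as drifting:", drifted)
```

No class labels or classifier feedback are used here: the score is a number rather than a yes/no answer, and the per-feature breakdown points to where the change occurred. A practical detector would also need a principled way to set the threshold and to slide or reset the windows.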

Notes

  1. Noise refers to random changes in features’ values that are not caused by a concept drift.

Funding

The work by Ahmed Awad and Sherif Sakr is funded by the European Regional Development Funds via the Mobilitas Plus programme (grant MOBTT75).

Author information

Corresponding author

Correspondence to Ahmed Awad.

About this article

Cite this article

Micevska, S., Awad, A. & Sakr, S. SDDM: an interpretable statistical concept drift detection method for data streams. J Intell Inf Syst 56, 459–484 (2021). https://doi.org/10.1007/s10844-020-00634-5

  • DOI: https://doi.org/10.1007/s10844-020-00634-5
