
Revisiting the effect of history on learning performance: the problem of the demanding lord

  • Regular Paper
  • Published in: Knowledge and Information Systems

Abstract

In a variety of settings ranging from recommendation systems to information filtering, approaches that take feedback into account have been introduced to improve services and user experience. However, as the machine learning literature also indicates, there exist several settings where the requirements and target concept of a learning system change over time, which constitutes a case of “concept drift”. In several systems, a sliding window over the training instances has been used to follow drifting concepts. However, no general analytic study has been performed on the relation between the size of the sliding window and the average performance of a learning system, since previous works have focused on instantaneous performance and on specific underlying learners and data characteristics. This work proposes an analytic model that describes the effect of memory window size on the prediction performance of a learning system based on iterative feedback. The analysis considers target concepts that change over time, either periodically or randomly, using a formulation we term “the problem of the demanding lord”. Using a signal-to-noise approach to sketch the learning ability of the underlying machine learning algorithms, we estimate the average performance of a learning system regardless of its underlying algorithm and, as a corollary, propose a stepping stone toward finding the memory window that maximizes the average performance for a given drift setting and a specific modeling system. We experimentally support the proposed methodology, with very promising results, on three synthetic and four real datasets, using a variety of learning algorithms, including Support Vector Machines, Naive Bayes, Nearest Neighbor and Decision Trees, on classification and regression tasks. The results validate the analysis and indicate very good estimation performance in different settings.
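The memory-window trade-off the abstract analyzes can be illustrated with a toy simulation (a hypothetical sketch under assumed settings, not the paper's model or datasets): a stream whose threshold concept flips periodically, a learner that remembers only the last W instances, and accuracy averaged over the stream.

```python
import random

def drifting_label(x, t, period=200):
    """Toy target concept: a threshold on x that inverts every `period`
    steps, i.e. periodic concept drift (hypothetical example)."""
    phase = (t // period) % 2
    return int(x > 0.5) if phase == 0 else int(x <= 0.5)

def knn_predict(window, x, k=5):
    """Majority vote among the k stored instances nearest to x."""
    neighbours = sorted(window, key=lambda p: abs(p[0] - x))[:k]
    return int(2 * sum(y for _, y in neighbours) >= len(neighbours))

def average_accuracy(window_size, n=2000, seed=0):
    """Predict-then-update loop; returns average accuracy over the stream."""
    rng, window, correct = random.Random(seed), [], 0
    for t in range(n):
        x = rng.random()
        y = drifting_label(x, t)
        if window:
            correct += int(knn_predict(window, x) == y)
        window.append((x, y))
        window = window[-window_size:]   # keep only the last W instances
    return correct / n

acc_small = average_accuracy(window_size=50)     # forgets stale concept quickly
acc_large = average_accuracy(window_size=2000)   # effectively unbounded memory
print(acc_small, acc_large)
```

Under this periodic drift the bounded window scores well above the unbounded one, because after each concept flip the large window keeps feeding the learner contradictory labels; choosing W to balance forgetting against sample size is the window-optimization question the analysis addresses.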



Notes

  1. A preliminary version of this work can be found in Giannakopoulos and Palpanas [19].

  2. We also applied nonlinear regression alternatives for the search, but these methods failed to converge in some cases; we therefore resorted to genetic algorithms.

  3. The results of the SEA dataset are similar to those of the STAGGER dataset.
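The genetic-algorithm fallback mentioned in note 2 can be sketched as follows (a hypothetical minimal example; the paper's actual sigmoid parameterization and search settings are not reproduced here). A simple GA with truncation selection, averaging crossover, and Gaussian mutation fits a two-parameter sigmoid without the gradient-based steps that can fail to converge:

```python
import math
import random

def sigmoid(x, a, b):
    # Two-parameter logistic curve: a = steepness, b = inflection point.
    return 1.0 / (1.0 + math.exp(-a * (x - b)))

def sse(params, data):
    # Sum of squared errors of the sigmoid against observed (x, y) pairs.
    a, b = params
    return sum((sigmoid(x, a, b) - y) ** 2 for x, y in data)

def ga_fit(data, pop_size=40, generations=60, seed=1):
    """Fit (a, b) with a basic genetic algorithm: keep the best half,
    then breed children by parameter averaging plus Gaussian mutation."""
    rng = random.Random(seed)
    pop = [(rng.uniform(0.1, 10.0), rng.uniform(-2.0, 2.0))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda p: sse(p, data))        # lower error is fitter
        survivors = pop[:pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            pa, pb = rng.sample(survivors, 2)
            children.append(((pa[0] + pb[0]) / 2 + rng.gauss(0, 0.3),
                             (pa[1] + pb[1]) / 2 + rng.gauss(0, 0.1)))
        pop = survivors + children
    return min(pop, key=lambda p: sse(p, data))

# Synthetic data drawn from a known sigmoid (a=4, b=0.5), then recovered.
data = [(x / 10.0, sigmoid(x / 10.0, 4.0, 0.5)) for x in range(-10, 21)]
a_hat, b_hat = ga_fit(data)
print(a_hat, b_hat, sse((a_hat, b_hat), data))
```

Population-based search of this kind needs no derivatives and no starting point near the optimum, which is why it is a common fallback when nonlinear least-squares routines diverge.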

References

  1. Ahmed A, Low Y, Aly M, Josifovski V, Smola A (2011) Scalable distributed inference of dynamic user interests for behavioral targeting. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 114–122

  2. Angluin D (1988) Queries and concept learning. Mach Learn 2(4):319–342

  3. Baena-García M, del Campo-Ávila J, Fidalgo R, Bifet A, Gavaldà R, Morales-Bueno R (2006) Early drift detection method. In: ECML PKDD 2006 workshop on knowledge discovery from data streams

  4. Bouquet P, Stoermer H, Bazzanella B (2008) An entity name system (ENS) for the semantic web. In: European semantic web conference, pp 258–272

  5. Cleveland WS (1981) LOWESS: a program for smoothing scatterplots by robust locally weighted regression. Am Stat 35:54

  6. Crabtree BI, Soltysiak S (1998) Identifying and tracking changing interests. Int J Digit Libr 2(1):38–53

  7. Dashnyam B, Liu Y-C, Hsu P-Y, Tsai Y-T (2011) Real time prediction of closing price and duration of B2B reverse auctions. Knowl Inf Syst. http://www.springerlink.com/content/p302748t360w4047/

  8. Dietterich T (1998) Approximate statistical tests for comparing supervised classification learning algorithms. Neural Comput 10(7):1895–1923

  9. Domingos P, Hulten G (2000) Mining high-speed data streams. In: Proceedings of the sixth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 71–80

  10. Fan R, Chang K, Hsieh C, Wang X, Lin C (2008) LIBLINEAR: a library for large linear classification. J Mach Learn Res 9:1871–1874

  11. Fdez-Riverola F, Iglesias E, Díaz F, Méndez J, Corchado J (2007) Applying lazy learning algorithms to tackle concept drift in spam filtering. Expert Syst Appl 33(1):36–48

  12. Fidalgo-Merino R, Nunez M (2011) Self-adaptive induction of regression trees. IEEE Trans Pattern Anal Mach Intell 33(8):1659–1672

  13. Freund Y, Schapire R (1999) Large margin classification using the perceptron algorithm. Mach Learn 37(3):277–296

  14. Fukunaga K, Hayes R (1989a) Effects of sample size in classifier design. IEEE Trans Pattern Anal Mach Intell 11(8):873–885

  15. Fukunaga K, Hayes R (1989b) Estimation of classifier performance. IEEE Trans Pattern Anal Mach Intell 11(10):1087–1101

  16. Gama J, Medas P, Castillo G, Rodrigues P (2004) Learning with drift detection. In: Lecture notes in computer science, pp 286–295

  17. Giannakopoulos G, Palpanas T (2009) Adaptivity in entity subscription services. In: Proceedings of ADAPTIVE2009, Athens, Greece

  18. Giannakopoulos G, Palpanas T (2010a) Content and type as orthogonal modeling features: a study on user interest awareness in entity subscription services. Int J Adv Netw Serv 3(2). http://www.iit.demokritos.gr/ggianna/Publications/adaptiveSubscription.pdf

  19. Giannakopoulos G, Palpanas T (2010b) The effect of history on modeling systems’ performance: the problem of the demanding lord. In: ICDM 2010, IEEE. http://www.iit.demokritos.gr/ggianna/Publications/ICDM2010.pdf

  20. Goldberg D (1989) Genetic algorithms in search and optimization. Addison-Wesley, Boston

  21. Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten I (2009) The WEKA data mining software: an update. ACM SIGKDD Explor Newsl 11(1):10–18

  22. Harries M (1999) Splice-2 comparative evaluation: electricity pricing. Technical report, University of New South Wales

  23. Harter H, Khamis H, Lamb R (1984) Modified Kolmogorov-Smirnov tests of goodness of fit. Commun Stat Simul Comput 13(3):293–323

  24. Helmbold DP, Long PM (1994) Tracking drifting concepts by minimizing disagreements. Mach Learn 14(1):27–45. http://dx.doi.org/10.1007/BF00993161

  25. Hulten G, Spencer L, Domingos P (2001) Mining time-changing data streams. In: Proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 97–106

  26. Ifo Institute (2010) Ifo business survey—the ifo business climate for Germany since January 1991. http://www.cesifo-group.de/portal/page/portal/ifoHome/a-winfo/d6zeitreihen/15reihen/_reihenkt

  27. Ikonomovska E, Gama J, Deroski S (2011) Learning model trees from evolving data streams. Data Min Knowl Discov 23(1):128–168

  28. Ikonomovska E, Gama J, Sebastião R, Gjorgjevik D (2009) Regression trees from data streams with drift detection. Discov Sci 5808:121–135

  29. Joachims T (2000) Estimating the generalization performance of an SVM efficiently. In: ICML ’00 Proceedings of the seventeenth international conference on machine learning, pp 431–438

  30. Kiefer J (1953) Sequential minimax search for a maximum. Proc Am Math Soc 4(3):502–506

  31. Klinkenberg R, Renz I (1998) Adaptive information filtering: learning in the presence of concept drifts. In: Learning for text categorization, pp 33–40

  32. Kohavi R (1995) The power of decision tables. Mach Learn ECML-95 912:174–189

  33. Koychev I, Lothian R (2005) Tracking drifting concepts by time window optimisation. In: Proceedings of AI-2005, the twenty-fifth SGAI international conference on innovative techniques and applications of artificial intelligence. Citeseer, pp 46–59

  34. Koychev I, Schwab I (2000) Adaptation to drifting user’s interests. In: Proceedings of ECML2000 workshop: machine learning in the new information age. Citeseer, pp 39–45

  35. Kuh A, Petsche T, Rivest R (1990) Learning time-varying concepts. In: Proceedings of the 1990 conference on advances in neural information processing systems 3, Morgan Kaufmann, Burlington, p 189

  36. Kuncheva L, Žliobaitė I (2009) On the window size for classification in changing environments. Intell Data Anal 13(6):861–872

  37. Lam W, Mukhopadhyay S, Mostafa J, Palakal M (1996) Detection of shifts in user interests for personalized information filtering. In: Proceedings of the 19th annual international ACM SIGIR conference on research and development in information retrieval, p 325

  38. Lazarescu M, Venkatesh S, Bui H (2004) Using multiple windows to track concept drift. Intell Data Anal 8(1):29–59

  39. Liu W, Wang T (2011) Online active multi-field learning for efficient email spam filtering. Knowl Inf Syst. http://www.springerlink.com/content/173321g2qg2u0q61/

  40. Maloof M, Michalski R (1999) Selecting examples for partial memory learning. Mach Learn 41(1):27–52

  41. Maloof M, Michalski R (2004) Incremental learning with partial instance memory. Artif Intell 154(1–2): 95–126

  42. Massey F Jr (1951) The Kolmogorov-Smirnov test for goodness of fit. J Am Stat Assoc 46(253):68–78

  43. Mitchell T, Caruana R, Freitag D, McDermott J, Zabowski D (1994) Experience with a learning personal assistant. Commun ACM 37(7):80–91

  44. Musavi M, Chan K, Hummels D, Kalantri K (1994) On the generalization ability of neural network classifiers. IEEE Trans Pattern Anal Mach Intell 16(6):659–663

  45. Núñez M, Fidalgo R, Morales R (2007) Learning in environments with unknown dynamics: towards more robust concept learners. J Mach Learn Res 8:2595–2628

  46. Palpanas T, Chaudhry JA, Andritsos P, Velegrakis Y (2008) Entity data management in OKKAM. In: DEXA workshops, pp 729–733

  47. Patist J (2007) Optimal window change detection. In: Data mining workshops, 2007. ICDM workshops 2007, IEEE, pp 557–562

  48. Ramon J, Driessens K, Croonenborghs T (2007) Transfer learning in reinforcement learning problems through partial policy recycling. Mach Learn ECML 2007 4701:699–707

  49. Reinke R, Michalski R (1988) Incremental learning of concept descriptions: a method and experimental results. Mach Intell 11:263–288

  50. Schlimmer J, Granger R (1986) Incremental learning from noisy data. Mach Learn 1(3):317–354

  51. Scholz M, Klinkenberg R (2007) Boosting classifiers for drifting concepts. Intell Data Anal 11(1):3–28

  52. Stanley K (2001) Learning concept drift with a committee of decision trees. Technical report, Computer Science Department, University of Texas, Austin

  53. Street WN, Kim YS (2001) A streaming ensemble algorithm (SEA) for large-scale classification. In: 7th ACM SIGKDD international conference on knowledge discovery and data mining, pp 377–382

  54. Tsymbal A (2004) The problem of concept drift: definitions and related work. Technical report

  55. Vapnik V (1999) The nature of statistical learning theory. Springer, New York

  56. Wang H, Fan W, Yu PS, Han J (2003) Mining concept-drifting data streams using ensemble classifiers. In: 9th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’03. ACM, pp 226–235

  57. Wang Y, Witten I (1996) Induction of model trees for predicting continuous classes. In: Poster articles of the 9th European conference on machine learning

  58. Webb GI, Pazzani MJ, Billsus D (2001) Machine learning for user modeling. User Model User-Adapt Interact 11(1):19–29

  59. Widmer G, Kubat M (1996) Learning in the presence of concept drift and hidden contexts. Mach Learn 23(1):69–101

  60. Widyantoro D, Ioerger T, Yen J (1999) An adaptive algorithm for learning changes in user interests. In: Proceedings of the eighth international conference on information and knowledge management. ACM New York, pp 405–412

  61. Zhang P, Gao BJ, Zhu X, Guo L (2011) Enabling fast lazy learning for data streams. In: Proceedings of ICDM 2011

  62. Žliobaitė I (2010) Change with delayed labeling: when is it detectable? In: Data mining workshops (ICDMW), 2010 IEEE international conference. IEEE, pp 843–850

Acknowledgments

The authors would like to thank Prof. Alberto Valli of the University of Trento for useful advice and insight on the estimation of the sigmoid function. This work was partially supported by the FP7 EU Large-scale Integrating Project OKKAM “Enabling a Web of Entities” (contract no. ICT-215032). See http://www.okkam.org.

Author information

Corresponding author

Correspondence to George Giannakopoulos.

About this article

Cite this article

Giannakopoulos, G., Palpanas, T. Revisiting the effect of history on learning performance: the problem of the demanding lord. Knowl Inf Syst 36, 653–691 (2013). https://doi.org/10.1007/s10115-012-0568-8

