Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 533)

Abstract

Web page classification plays a crucial role in web mining. The massive amount of data available on the web makes it important to build web page prediction models. We aim to build classification models that classify new instances based on existing labeled web documents. This paper investigates the effect of two powerful ensemble methods, stacked generalization (also known as stacking) and random forest, in the context of web page classification. We propose enhancing the predictive power of web page classification models with the stacking ensemble method. Random forest, stacking with multi-response model trees, and four different base learners (Naïve Bayes, J4.8, IBK and FURIA) are used. Datasets are obtained from DMOZ (Open Directory Project). The paper provides an empirical study of existing supervised classifiers and ensemble learning methods for web page classification. It shows that ensembles of heterogeneous classifiers constructed with stacking have higher predictive power than the individual classifiers, boosting, and random forest for web page classification.
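To make the idea of stacking heterogeneous classifiers concrete, the following is a minimal, self-contained sketch and not the configuration evaluated in the paper. Everything in it is an assumption made for illustration: the three toy base learners (nearest centroid, 1-nearest-neighbour, and a one-feature decision stump) stand in for learners such as Naïve Bayes, J4.8, IBK and FURIA; the lookup-table meta-learner stands in for multi-response model trees; and the two-feature "pages" are invented word counts rather than DMOZ data. The function names (`fit_stacking`, `fit_meta`, and so on) are hypothetical.

```python
from collections import Counter, defaultdict

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def fit_centroid(train):
    """Base learner 1: predict the class whose mean vector is closest."""
    groups = defaultdict(list)
    for x, y in train:
        groups[y].append(x)
    centroids = {y: [sum(col) / len(xs) for col in zip(*xs)]
                 for y, xs in groups.items()}
    return lambda x: min(centroids, key=lambda y: sq_dist(x, centroids[y]))

def fit_1nn(train):
    """Base learner 2: copy the label of the single nearest training example."""
    return lambda x: min(train, key=lambda ex: sq_dist(x, ex[0]))[1]

def fit_stump(train):
    """Base learner 3: split on feature 0 at its mean, majority label per side."""
    cut = sum(x[0] for x, _ in train) / len(train)
    overall = Counter(y for _, y in train).most_common(1)[0][0]
    sides = {True: Counter(), False: Counter()}
    for x, y in train:
        sides[x[0] > cut][y] += 1
    label = {s: (c.most_common(1)[0][0] if c else overall)
             for s, c in sides.items()}
    return lambda x: label[x[0] > cut]

def fit_meta(meta_rows):
    """Level-1 learner (illustrative): map each pattern of base predictions
    to the true label most often seen with it; unseen patterns fall back
    to a plain majority vote over the base predictions."""
    table = defaultdict(Counter)
    for preds, y in meta_rows:
        table[preds][y] += 1
    def predict(preds):
        if preds in table:
            return table[preds].most_common(1)[0][0]
        return Counter(preds).most_common(1)[0][0]
    return predict

def fit_stacking(train, base_fitters, k=3):
    """Build level-1 training data from held-out predictions of the base
    learners (k folds), fit the meta-learner on it, then refit the base
    learners on all training data for use at prediction time."""
    meta_rows = []
    for fold in range(k):
        held_out = [ex for i, ex in enumerate(train) if i % k == fold]
        rest = [ex for i, ex in enumerate(train) if i % k != fold]
        models = [fit(rest) for fit in base_fitters]
        for x, y in held_out:
            meta_rows.append((tuple(m(x) for m in models), y))
    meta = fit_meta(meta_rows)
    final_models = [fit(train) for fit in base_fitters]
    return lambda x: meta(tuple(m(x) for m in final_models))

# Invented toy data: (count of "price", count of "score") -> page category.
train = [((5, 0), "shopping"), ((0, 4), "sports"), ((4, 1), "shopping"),
         ((1, 5), "sports"), ((6, 1), "shopping"), ((0, 6), "sports")]

predict = fit_stacking(train, [fit_centroid, fit_1nn, fit_stump])
print(predict((5, 1)), predict((0, 5)))
```

The key design point is that the meta-learner is trained only on predictions the base learners made for examples they did not see during fitting, which is what distinguishes stacking from simply voting over models trained on the full data.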

Author information

Corresponding author

Correspondence to Fayrouz Elsalmy.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Elsalmy, F., Ismail, R., AbdelMoez, W. (2017). Enhancing Web Page Classification Models. In: Hassanien, A., Shaalan, K., Gaber, T., Azar, A., Tolba, M. (eds) Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2016. AISI 2016. Advances in Intelligent Systems and Computing, vol 533. Springer, Cham. https://doi.org/10.1007/978-3-319-48308-5_71

  • DOI: https://doi.org/10.1007/978-3-319-48308-5_71

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-48307-8

  • Online ISBN: 978-3-319-48308-5

  • eBook Packages: Engineering, Engineering (R0)
