Abstract
Web page classification plays a crucial role in web mining. The massive amount of data available on the web makes it important to build web page prediction models. We aim to build classification models that classify new instances based on existing labeled web documents. This paper investigates the effect of two powerful ensemble methods, stacked generalization (also known as stacking) and random forest, in the context of web page classification. We propose enhancing the predictive power of web page classification models with the stacking ensemble method. Random forest, stacking with multi-response model trees, and four different base learners (Naïve Bayes, J4.8, IBk and FURIA) are used. The datasets are obtained from DMOZ (Open Directory Project). The paper provides an empirical study of existing supervised classifiers and ensemble learning methods for web page classification. It shows that ensembles of heterogeneous classifiers constructed with stacking have higher predictive power than the individual classifiers, boosting, and random forest for web page classification.
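The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' experimental setup: it assumes scikit-learn rather than the learners named above, uses Gaussian Naïve Bayes, a CART decision tree and k-nearest neighbours as rough stand-ins for Naïve Bayes, J4.8 and IBk, omits FURIA (scikit-learn has no counterpart), substitutes logistic regression for multi-response model trees as the meta-learner, and replaces the DMOZ data with a synthetic dataset.

```python
# Sketch only: scikit-learn stand-ins, not the learners named in the abstract.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a labeled web page collection (assumption: the
# study itself uses DMOZ documents, which are not reproduced here).
X, y = make_classification(n_samples=300, n_features=20, n_informative=8,
                           n_classes=3, random_state=0)

# Level 0: heterogeneous base learners.
base_learners = [
    ("nb", GaussianNB()),
    ("tree", DecisionTreeClassifier(random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
]

# Level 1: a meta-learner trained on out-of-fold class probabilities
# produced by the base learners.
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression(max_iter=1000),
                           stack_method="predict_proba", cv=5)
forest = RandomForestClassifier(n_estimators=100, random_state=0)

# Evaluate every individual learner and both ensembles the same way.
models = dict(base_learners, stacking=stack, random_forest=forest)
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in models.items()}

for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:>13}: {acc:.3f}")
```

The printed accuracies depend on the synthetic data and on the stand-in learners, so they say nothing about the paper's reported results; the sketch only shows how individual classifiers, a stacked ensemble and a random forest can be evaluated under the same cross-validation protocol.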
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Elsalmy, F., Ismail, R., AbdelMoez, W. (2017). Enhancing Web Page Classification Models. In: Hassanien, A., Shaalan, K., Gaber, T., Azar, A., Tolba, M. (eds) Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2016. AISI 2016. Advances in Intelligent Systems and Computing, vol 533. Springer, Cham. https://doi.org/10.1007/978-3-319-48308-5_71
DOI: https://doi.org/10.1007/978-3-319-48308-5_71
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-48307-8
Online ISBN: 978-3-319-48308-5
eBook Packages: Engineering, Engineering (R0)