Enhancing Web Page Classification Models

Elsalmy, Fayrouz; Ismail, Rasha; AbdelMoez, Walid

doi:10.1007/978-3-319-48308-5_71

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 533)

Included in the following conference series:

International Conference on Advanced Intelligent Systems and Informatics

Abstract

Web page classification plays a crucial role in web mining. The massive amount of data available on the web makes it important to build web page prediction models. We aim to build classification models that classify new instances based on existing labeled web documents. This paper investigates the effect of two powerful ensemble methods, stacked generalization (also known as stacking) and random forest, in the context of web page classification. We propose enhancing the predictive power of web page classification models with the stacking ensemble method. Random forest, stacking with multi-response model trees, and four different base learners (Naïve Bayes, J4.8, IBK and FURIA) are used. Datasets are obtained from DMOZ (Open Directory Project). The paper provides an empirical study of existing supervised classifiers and ensemble learning methods for web page classification. It shows that ensembles of heterogeneous classifiers constructed with stacking have higher predictive power than the individual classifiers, boosting, and random forest for web page classification.
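
The stacking idea summarized in the abstract can be illustrated with a small Python sketch. It is a minimal illustration under stated assumptions, not the authors' experimental setup: the two base learners (a multinomial Naive Bayes and a Jaccard-similarity k-nearest-neighbour learner standing in for IBK) are simplified, J4.8 and FURIA are omitted, and a nearest-centroid rule replaces multi-response model trees as the meta-learner. The toy documents and the names `stack_fit` and `stack_predict` are hypothetical. The base learners' out-of-fold class probabilities become the training features of the meta-learner.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial Naive Bayes over token lists, with add-one smoothing."""

    def fit(self, docs, labels, classes):
        self.classes = classes
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in classes}
        for doc, label in zip(docs, labels):
            self.counts[label].update(doc)
        self.vocab_size = len({tok for doc in docs for tok in doc})
        return self

    def predict_proba(self, doc):
        n = sum(self.prior.values())
        log_p = []
        for c in self.classes:
            total = sum(self.counts[c].values()) + self.vocab_size
            score = math.log((self.prior[c] + 1) / (n + len(self.classes)))
            score += sum(math.log((self.counts[c][t] + 1) / total) for t in doc)
            log_p.append(score)
        top = max(log_p)  # subtract the maximum for numerical stability
        weights = [math.exp(s - top) for s in log_p]
        return [w / sum(weights) for w in weights]


class NearestNeighbours:
    """k-NN with Jaccard similarity on token sets (assumed stand-in for IBK)."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, docs, labels, classes):
        self.classes = classes
        self.data = [(set(d), y) for d, y in zip(docs, labels)]
        return self

    def predict_proba(self, doc):
        s = set(doc)
        ranked = sorted(self.data,
                        key=lambda p: -len(s & p[0]) / (len(s | p[0]) or 1))
        votes = Counter(y for _, y in ranked[:self.k])
        return [votes[c] / min(self.k, len(ranked)) for c in self.classes]


class NearestCentroid:
    """Meta-learner (assumption): picks the class with the closest centroid."""

    def fit(self, rows, labels, classes):
        self.centroids = {}
        for c in classes:
            own = [r for r, y in zip(rows, labels) if y == c]
            self.centroids[c] = [sum(col) / len(own) for col in zip(*own)]
        return self

    def predict(self, row):
        return min(self.centroids, key=lambda c: sum(
            (a - b) ** 2 for a, b in zip(row, self.centroids[c])))


def stack_fit(docs, labels, learner_types, folds=3):
    """Train the meta-learner on out-of-fold base-learner probabilities."""
    classes = sorted(set(labels))
    meta_rows = [None] * len(docs)
    for f in range(folds):
        train = [i for i in range(len(docs)) if i % folds != f]
        base = [t().fit([docs[i] for i in train],
                        [labels[i] for i in train], classes)
                for t in learner_types]
        for i in range(f, len(docs), folds):  # held-out indices of fold f
            meta_rows[i] = [p for b in base for p in b.predict_proba(docs[i])]
    meta = NearestCentroid().fit(meta_rows, labels, classes)
    # Refit the base learners on all data for use at prediction time.
    base = [t().fit(docs, labels, classes) for t in learner_types]
    return base, meta


def stack_predict(model, doc):
    base, meta = model
    return meta.predict([p for b in base for p in b.predict_proba(doc)])


# Hypothetical toy corpus; the paper's experiments use DMOZ datasets instead.
docs = [d.split() for d in [
    "football match goal team", "team wins league match",
    "goal scored striker team", "stock market shares fall",
    "bank shares market profit", "profit rise stock bank"]]
labels = ["sports"] * 3 + ["business"] * 3

model = stack_fit(docs, labels, [NaiveBayes, NearestNeighbours])
print(stack_predict(model, "team match goal".split()))
```

Training the meta-learner on out-of-fold predictions rather than on predictions for the same documents the base learners were fitted on is what keeps it from simply learning to trust whichever base learner overfits most.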

Author information

Authors and Affiliations

College of Computing and Information Technology, Arab Academy for Science, Technology and Maritime Transport, PO Box 1029, Alexandria, Egypt
Fayrouz Elsalmy & Walid AbdelMoez
Ain Shams University, Cairo, Egypt
Rasha Ismail

Corresponding author

Correspondence to Fayrouz Elsalmy.

Editor information

Editors and Affiliations

Faculty of Computers & Information, Cairo University, Giza, Egypt
Aboul Ella Hassanien
Dubai International Academic City, The British University, Dubai, United Arab Emirates
Khaled Shaalan
CS Dept. Faculty of Computers and Inform, Suez Canal University, Ismailia, Egypt
Tarek Gaber
Ahmed Orabi Square, Menouf, Egypt
Ahmad Taher Azar
Faculty of Computer & Information Scienc, Ain Shams University, Cairo, Egypt
M. F. Tolba

About this paper

Cite this paper

Elsalmy, F., Ismail, R., AbdelMoez, W. (2017). Enhancing Web Page Classification Models. In: Hassanien, A., Shaalan, K., Gaber, T., Azar, A., Tolba, M. (eds) Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2016. AISI 2016. Advances in Intelligent Systems and Computing, vol 533. Springer, Cham. https://doi.org/10.1007/978-3-319-48308-5_71

DOI: https://doi.org/10.1007/978-3-319-48308-5_71
Published: 18 October 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-48307-8
Online ISBN: 978-3-319-48308-5
eBook Packages: Engineering, Engineering (R0)
