Abstract
Customizability, extensive community support and easy availability have made Open-Source Software (OSS) systems popular. However, maintaining these systems is challenging, especially as they grow large and complex over time. One way to ensure quality in large-scale OSS is to adopt software change prediction models. These models help identify change-prone parts early in software development, so that software practitioners can manage them effectively. This study extensively evaluates eight Homogeneous Ensemble Learners (HEL) for developing software change prediction models on five large-scale OSS datasets. HEL, which combine the outputs of several learners of the same type, are known to yield better results than non-ensemble classifiers. The study also statistically compares the models developed with HEL against those developed with ten non-ensemble classifiers. We further assess how the performance of HEL changes when their default base learners are replaced with other classifiers. The results support the use of HEL for developing software change prediction models and indicate that Random Forest is the best HEL for this purpose.
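The abstract describes HEL as ensembles that combine the outputs of several learners of the same type, and it studies what happens when an ensemble's default base learner is swapped for another classifier. The sketch below illustrates that idea under stated assumptions: it uses bagging (bootstrap sampling plus majority vote) as the homogeneous ensemble, and two simple base learners, a decision stump and a 1-nearest-neighbour classifier. The function names (`bagging`, `stump_learner`, `one_nn_learner`) and the toy data are hypothetical illustrations, not the eight HEL, the base learners, or the OSS datasets evaluated in the study.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical type aliases: a learner fits on (X, y) and returns a predictor.
Vector = Sequence[float]
Predictor = Callable[[Vector], int]
Learner = Callable[[List[Vector], List[int]], Predictor]


def stump_learner(X: List[Vector], y: List[int]) -> Predictor:
    """Fit a one-level decision tree (decision stump) by picking the single
    (feature, threshold) split with the fewest training errors."""
    majority = Counter(y).most_common(1)[0][0]
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            l_lab = Counter(left).most_common(1)[0][0] if left else majority
            r_lab = Counter(right).most_common(1)[0][0] if right else majority
            errors = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    _, f, t, l_lab, r_lab = best
    return lambda x: l_lab if x[f] <= t else r_lab


def one_nn_learner(X: List[Vector], y: List[int]) -> Predictor:
    """Fit a 1-nearest-neighbour classifier using squared Euclidean distance."""
    data = list(zip(X, y))

    def predict(x: Vector) -> int:
        nearest = min(data, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))
        return nearest[1]

    return predict


def bagging(X: List[Vector], y: List[int], base_learner: Learner,
            n_estimators: int = 11, seed: int = 0) -> Predictor:
    """Homogeneous ensemble: train n_estimators copies of the same base learner,
    each on a bootstrap sample (drawn with replacement), and combine their
    predictions by majority vote."""
    rng = random.Random(seed)
    n = len(X)
    members = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        members.append(base_learner([X[i] for i in idx], [y[i] for i in idx]))

    def predict(x: Vector) -> int:
        votes = Counter(member(x) for member in members)
        return votes.most_common(1)[0][0]

    return predict


if __name__ == "__main__":
    # Hypothetical toy data: two made-up class metrics per row; 1 = change-prone.
    X = [[1, 2], [2, 1], [3, 3], [4, 2], [5, 1],
         [10, 8], [11, 9], [12, 7], [13, 8], [14, 9]]
    y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
    # Same ensemble procedure, two different base learners.
    for name, learner in [("stump", stump_learner), ("1-NN", one_nn_learner)]:
        model = bagging(X, y, learner)
        print(name, model([0, 0]), model([20, 20]))
```

Only the `base_learner` argument changes between the two runs, which mirrors in miniature the study's question of how replacing an ensemble's default base learner affects its predictions. Real HEL such as Random Forest add further randomization (for example, sampling a random subset of features at each split), which this sketch deliberately omits.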
Copyright information
© 2021 IFIP International Federation for Information Processing
About this paper
Cite this paper
Khanna, M., Priya, S., Mehra, D. (2021). Software Change Prediction with Homogeneous Ensemble Learners on Large Scale Open-Source Systems. In: Taibi, D., Lenarduzzi, V., Kilamo, T., Zacchiroli, S. (eds) Open Source Systems. OSS 2021. IFIP Advances in Information and Communication Technology, vol 624. Springer, Cham. https://doi.org/10.1007/978-3-030-75251-4_7
DOI: https://doi.org/10.1007/978-3-030-75251-4_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-75250-7
Online ISBN: 978-3-030-75251-4
eBook Packages: Computer Science, Computer Science (R0)