Software Change Prediction with Homogeneous Ensemble Learners on Large Scale Open-Source Systems

Khanna, Megha; Priya, Srishti; Mehra, Diksha

doi:10.1007/978-3-030-75251-4_7

Part of the book series: IFIP Advances in Information and Communication Technology (IFIPAICT, volume 624)

Included in the following conference series:

IFIP International Conference on Open Source Systems

Abstract

Customizability, extensive community support and ease of availability have led to the popularity of Open-Source Software (OSS) systems. However, maintaining these systems is a challenge, especially as they grow large and complex over time. One possible method of ensuring quality in large-scale OSS is the adoption of software change prediction models. These models aid in identifying change-prone parts in the early stages of software development, which can then be effectively managed by software practitioners. This study extensively evaluates eight Homogeneous Ensemble Learners (HEL) for developing software change prediction models on five large-scale OSS datasets. HEL, which integrate the outputs of several learners of the same type, are known to generate better results than non-ensemble classifiers. The study also statistically compares the results of the models developed by HEL with those of ten non-ensemble classifiers. We further assess how the performance of HEL for developing software change prediction models changes when their default base learners are substituted with other classifiers. The results of the study support the use of HEL for developing software change prediction models and indicate Random Forest as the best HEL for the purpose.
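
To make the idea of a homogeneous ensemble concrete, the following is a minimal sketch of bagging, one well-known learner of this kind, written in plain Python. It is not the authors' experimental setup. The toy rows, the two metrics (lines of code, coupling), the decision-stump base learner and the names `train_stump`, `bagging`, `predict` and `fit_base` are all assumptions made for illustration. The `fit_base` parameter hints at how a default base learner could be swapped for another classifier.

```python
import random
from collections import Counter

# Hypothetical toy data: ((lines_of_code, coupling), change_prone_label).
# These values are invented for illustration, not taken from the study.
TOY_ROWS = [
    ((40, 2), 0), ((55, 1), 0), ((60, 3), 0), ((75, 2), 0),
    ((400, 9), 1), ((420, 7), 1), ((510, 8), 1), ((650, 12), 1),
]


def train_stump(rows):
    """Fit a decision stump: one feature, one threshold, chosen by
    exhaustive search to minimise training errors on `rows`."""
    best = None  # (errors, feature_index, threshold, label_if_above)
    for f in range(len(rows[0][0])):
        for x, _ in rows:
            t = x[f]
            for above in (0, 1):
                errors = sum(
                    1 for xi, yi in rows
                    if (above if xi[f] > t else 1 - above) != yi
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, above)
    _, f, t, above = best
    return lambda x: above if x[f] > t else 1 - above


def bagging(rows, fit_base=train_stump, n_learners=11, seed=0):
    """Train `n_learners` models of the same type, each on a bootstrap
    sample (drawn with replacement) of the training rows."""
    rng = random.Random(seed)
    learners = []
    for _ in range(n_learners):
        sample = [rng.choice(rows) for _ in rows]
        learners.append(fit_base(sample))
    return learners


def predict(learners, x):
    """Combine the learners' outputs by majority vote."""
    votes = Counter(learner(x) for learner in learners)
    return votes.most_common(1)[0][0]
```

A call such as `predict(bagging(TOY_ROWS), (900, 10))` returns the label voted for by most of the stumps. Passing a different training function as `fit_base` keeps the ensemble homogeneous while changing the type of learner being combined.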

Author information

Authors and Affiliations

Sri Guru Gobind Singh College of Commerce, University of Delhi, Delhi, India
Megha Khanna, Srishti Priya & Diksha Mehra

Editor information

Editors and Affiliations

Tampere University, Tampere, Finland
Davide Taibi
LUT University, Lahti, Finland
Valentina Lenarduzzi
Tampere University, Tampere, Finland
Terhi Kilamo
Université de Paris and Inria, Paris, France
Stefano Zacchiroli

About this paper

Cite this paper

Khanna, M., Priya, S., Mehra, D. (2021). Software Change Prediction with Homogeneous Ensemble Learners on Large Scale Open-Source Systems. In: Taibi, D., Lenarduzzi, V., Kilamo, T., Zacchiroli, S. (eds) Open Source Systems. OSS 2021. IFIP Advances in Information and Communication Technology, vol 624. Springer, Cham. https://doi.org/10.1007/978-3-030-75251-4_7

DOI: https://doi.org/10.1007/978-3-030-75251-4_7
Published: 05 May 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-75250-7
Online ISBN: 978-3-030-75251-4
eBook Packages: Computer Science, Computer Science (R0)
