Software Change Prediction with Homogeneous Ensemble Learners on Large Scale Open-Source Systems

  • Conference paper
Open Source Systems (OSS 2021)

Part of the book series: IFIP Advances in Information and Communication Technology ((IFIPAICT,volume 624))

Abstract

Customizability, extensive community support and ease of availability have led to the popularity of Open-Source Software (OSS) systems. However, maintaining these systems is a challenge, especially as they grow considerably larger and more complex over time. One possible method of ensuring quality in large-scale OSS is the adoption of software change prediction models. These models aid in identifying change-prone parts in the early stages of software development, which can then be effectively managed by software practitioners. This study extensively evaluates eight Homogeneous Ensemble Learners (HEL) for developing software change prediction models on five large-scale OSS datasets. HEL, which integrate the outputs of several learners of the same type, are known to generate better results than non-ensemble classifiers. The study also statistically compares the results of the models developed by HEL with those of ten non-ensemble classifiers. We further assess the change in performance of HEL for developing software change prediction models when their default base learners are substituted with other classifiers. The results of the study support the use of HEL for developing software change prediction models and indicate Random Forest as the best HEL for the purpose.
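
To make the idea of a homogeneous ensemble concrete, the following is a minimal sketch of bagging with majority voting, one common way of combining several learners of the same type. It is not the authors' implementation: the abstract does not name the eight HEL or their settings, so the class names (`Stump`, `MajorityClass`, `Bagging`), the two toy features and all data values are assumptions made purely for illustration. The base learner is passed in as a factory, which loosely mirrors the idea of substituting a default base learner with another classifier.

```python
import random
from collections import Counter


def majority_vote(votes):
    """Return the most frequent label among the members' votes."""
    return Counter(votes).most_common(1)[0][0]


class Stump:
    """Hypothetical base learner: a threshold test on a single feature."""

    def fit(self, X, y):
        best = None
        for f in range(len(X[0])):
            for t in sorted({row[f] for row in X}):
                for high in (0, 1):  # label assigned when row[f] > t
                    errors = sum(
                        (high if row[f] > t else 1 - high) != label
                        for row, label in zip(X, y)
                    )
                    if best is None or errors < best[0]:
                        best = (errors, f, t, high)
        _, self.feature, self.threshold, self.high = best
        return self

    def predict(self, row):
        return self.high if row[self.feature] > self.threshold else 1 - self.high


class MajorityClass:
    """Hypothetical alternative base learner: always predicts the most common label."""

    def fit(self, X, y):
        self.label = majority_vote(y)
        return self

    def predict(self, row):
        return self.label


class Bagging:
    """Homogeneous ensemble: every member is built by the same base_factory."""

    def __init__(self, base_factory, n_estimators=15, seed=0):
        self.base_factory = base_factory
        self.n_estimators = n_estimators  # odd, so binary votes cannot tie
        self.seed = seed
        self.members = []

    def fit(self, X, y):
        rng = random.Random(self.seed)
        n = len(X)
        self.members = []
        for _ in range(self.n_estimators):
            # Bootstrap sample: draw n rows with replacement.
            idx = [rng.randrange(n) for _ in range(n)]
            member = self.base_factory()
            member.fit([X[i] for i in idx], [y[i] for i in idx])
            self.members.append(member)
        return self

    def predict(self, row):
        return majority_vote(m.predict(row) for m in self.members)


# Invented toy data: [size metric, coupling metric] per class; 1 = change-prone.
X = [[120, 2], [95, 1], [400, 9], [380, 7], [150, 3], [510, 11], [80, 1], [450, 8]]
y = [0, 0, 1, 1, 0, 1, 0, 1]

stump_ensemble = Bagging(Stump).fit(X, y)
baseline_ensemble = Bagging(MajorityClass).fit(X, y)  # same ensemble, different base learner
print(stump_ensemble.predict([420, 9]), baseline_ensemble.predict([420, 9]))
```

Swapping `Stump` for `MajorityClass` changes only the factory argument, so the ensemble logic stays identical while the base learner varies. A real study would also need proper validation and a performance measure, which this sketch leaves out.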

© 2021 IFIP International Federation for Information Processing

About this paper

Cite this paper

Khanna, M., Priya, S., Mehra, D. (2021). Software Change Prediction with Homogeneous Ensemble Learners on Large Scale Open-Source Systems. In: Taibi, D., Lenarduzzi, V., Kilamo, T., Zacchiroli, S. (eds) Open Source Systems. OSS 2021. IFIP Advances in Information and Communication Technology, vol 624. Springer, Cham. https://doi.org/10.1007/978-3-030-75251-4_7

  • DOI: https://doi.org/10.1007/978-3-030-75251-4_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-75250-7

  • Online ISBN: 978-3-030-75251-4

  • eBook Packages: Computer Science, Computer Science (R0)