Abstract
Detecting defects in software systems is an evergreen topic, since there is no real world software without bugs. Many different bug locating algorithms have been presented recently that can help to detect hidden and newly occurred bugs in software. Papers trying to predict the faulty source code elements or code segments in the system always use experience from the past. In most of the cases these studies construct a database for their own purposes and do not make the gathered data publicly available. Public datasets are rare; however, a well constructed dataset could serve as a benchmark test input. Furthermore, open-source software development is rapidly increasing that also gives an opportunity to work with public data.
In this study we selected 15 Java projects from GitHub to construct a public bug database from. We matched the already known and fixed bugs with the corresponding source code elements (classes and files) and calculated a wide set of product metrics on these elements. After creating the desired bug database, we investigated whether the built database is usable for bug prediction. We used 13 machine learning algorithms to address this research question and finally we achieved F-measure values between 0.7 and 0.8. Beside the F-measure values we calculated the bug coverage ratio on every project for every machine learning algorithm. We obtained very high and promising bug coverage values (up to 100 %).
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Arisholm, E., Briand, L.C.: Predicting fault-prone components in a java legacy system. In: Proceedings of the ACM/IEEE International Symposium on Empirical Software Engineering, pp. 8–17. ACM (2006)
Bangcharoensap, P., Ihara, A., Kamei, Y., Matsumoto, K.: Locating source code to be fixed based on initial bug reports - a case study on the eclipse project. In: 2012 Fourth International Workshop on Empirical Software Engineering in Practice (IWESEP), pp. 10–15, October 2012
Catal, C., Diri, B.: A systematic review of software fault prediction studies. Expert Syst. Appl. 36(4), 7346–7354 (2009)
Chidamber, S.R., Kemerer, C.F.: A metrics suite for object oriented design. IEEE Trans. Softw. Eng. 20(6), 476–493 (1994)
Dallmeier, V., Zimmermann, T.: Extraction of bug localization benchmarks from history. In: Proceedings of the Twenty-Second IEEE/ACM International Conference on Automated Software Engineering, pp. 433–436. ACM (2007)
D’Ambros, M., Lanza, M., Robbes, R.: An extensive comparison of bug prediction approaches. In: 2010 7th IEEE Working Conference on Mining Software Repositories (MSR), pp. 31–41. IEEE (2010)
Freund, Y., Schapire, R.E.: Large margin classification using the perceptron algorithm. In: 11th Annual Conference on Computational Learning Theory, pp. 209–217. ACM Press, New York (1998)
Gyimesi, P., Gyimesi, G., Tóth, Z., Ferenc, R.: Characterization of source code defects by data mining conducted on GitHub. In: Gervasi, O., Murgante, B., Misra, S., Gavrilova, M.L., Rocha, A.M.A.C., Torre, C., Taniar, D., Apduhan, B.O. (eds.) ICCSA 2015. LNCS, vol. 9159, pp. 47–62. Springer, Heidelberg (2015)
Gyimothy, T., Ferenc, R., Siket, I.: Empirical validation of object-oriented metrics on open source software for fault prediction. IEEE Trans. Softw. Eng. 31(10), 897–910 (2005)
Hall, T., Zhang, M., Bowes, D., Sun, Y.: Some code smells have a significant but small effect on faults. ACM Trans. Softw. Eng. Methodol. (TOSEM) 23(4), 33 (2014)
He, H., Garcia, E., et al.: Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 21(9), 1263–1284 (2009)
Kamei, Y., Shihab, E.: Defect prediction: Accomplishments and future challenges
Menzies, T., Caglayan, B., He, Z., Kocaguneli, E., Krall, J., Peters, F., Turhan, B.: The promise repository of empirical software engineering data, June 2012
Nagappan, N., Ball, T., Zeller, A.: Mining metrics to predict component failures. In: Proceedings of the 28th International Conference on Software Engineering, pp. 452–461. ACM (2006)
Ostrand, T.J., Weyuker, E.J., Bell, R.M.: Automating algorithms for the identification of fault-prone files. In: Proceedings of the 2007 International Symposium on Software Testing and Analysis, pp. 219–227. ACM (2007)
Quinlan, R.: C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, San Mateo (1993)
Śliwerski, J., Zimmermann, T., Zeller, A.: When do changes induce fixes? ACM SIGSOFT Softw. Eng. Notes 30(4), 1–5 (2005)
Tufano, M., Palomba, F., Bavota, G., Oliveto, R., Di Penta, M., De Lucia, A., Poshyvanyk, D.: When and why your code starts to smell bad. In: 37th IEEE/ACM International Conference on Software Engineering, ICSE 2015, Florence, Italy, 16–24 May 2015, vol. 1, pp. 403–414 (2015)
Von Krogh, G., Von Hippel, E.: The promise of research on open source software. Manage. Sci. 52(7), 975–983 (2006)
Wang, S., Lo, D.: Version history, similar report, structure: putting them together for improved bug localization. In: Proceedings of the 22nd International Conference on Program Comprehension, pp. 53–63. ACM (2014)
Wang, S., Yao, X.: Using class imbalance learning for software defect prediction. IEEE Trans. Reliab. 62(2), 434–443 (2013)
Williams, C., Spacco, J.: Szz revisited: verifying when changes induce fixes. In: Proceedings of the Workshop on Defects in Large Software Systems, pp. 32–36. ACM (2008)
Zhou, Y., Leung, H.: Empirical analysis of object-oriented design metrics for predicting high and low severity faults. IEEE Trans. Softw. Eng. 32(10), 771–789 (2006)
Zimmermann, T., Premraj, R., Zeller, A.: Predicting defects for eclipse. In: International Workshop on Predictor Models in Software Engineering, PROMISE 2007: ICSE Workshopps 2007, p. 9. IEEE (2007)
Acknowledgment
This work was partially supported by the European Union project “REPARA – Reengineering and Enabling Performance And poweR of Applications”, project number: 609666.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Tóth, Z., Gyimesi, P., Ferenc, R. (2016). A Public Bug Database of GitHub Projects and Its Application in Bug Prediction. In: Gervasi, O., et al. Computational Science and Its Applications -- ICCSA 2016. ICCSA 2016. Lecture Notes in Computer Science(), vol 9789. Springer, Cham. https://doi.org/10.1007/978-3-319-42089-9_44
Download citation
DOI: https://doi.org/10.1007/978-3-319-42089-9_44
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-42088-2
Online ISBN: 978-3-319-42089-9
eBook Packages: Computer ScienceComputer Science (R0)