A Public Bug Database of GitHub Projects and Its Application in Bug Prediction

Tóth, Zoltán; Gyimesi, Péter; Ferenc, Rudolf

doi:10.1007/978-3-319-42089-9_44


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9789)

Included in the following conference series:

International Conference on Computational Science and Its Applications


Abstract

Detecting defects in software systems is an evergreen topic, since there is no real-world software without bugs. Many different bug-locating algorithms have been presented recently that can help to detect hidden and newly introduced bugs in software. Papers that try to predict faulty source code elements or code segments always rely on past experience. In most cases, these studies construct a database for their own purposes and do not make the gathered data publicly available. Public datasets are rare; however, a well-constructed dataset could serve as a benchmark test input. Furthermore, open-source software development is growing rapidly, which also provides an opportunity to work with public data.

In this study, we selected 15 Java projects from GitHub to construct a public bug database. We matched the already known and fixed bugs with the corresponding source code elements (classes and files) and calculated a wide set of product metrics on these elements. After creating the bug database, we investigated whether it is usable for bug prediction. We used 13 machine learning algorithms to address this research question and achieved F-measure values between 0.7 and 0.8. Besides the F-measure values, we calculated the bug coverage ratio on every project for every machine learning algorithm. We obtained very high and promising bug coverage values (up to 100%).
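The two evaluation figures named in the abstract can be illustrated with a small Python sketch. The F-measure is computed here as the standard F1 score (harmonic mean of precision and recall) for the "buggy" class. The exact bug coverage definition used in the paper is not given in this summary, so `bug_coverage` assumes one plausible reading: the share of all known bugs that fall into source code elements the model flags as buggy, where one element (class or file) may contain several bugs. The function names and example data are hypothetical illustrations, not the authors' implementation.

```python
from typing import Dict, Set


def f_measure(tp: int, fp: int, fn: int) -> float:
    """F1 score for the 'buggy' class: harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def bug_coverage(bugs_per_element: Dict[str, int], predicted_buggy: Set[str]) -> float:
    """ASSUMED definition: fraction of all known bugs located in elements
    (classes or files) that the model predicts as buggy.

    bugs_per_element maps a hypothetical element name to its number of known bugs.
    """
    total = sum(bugs_per_element.values())
    if total == 0:
        return 0.0
    # Elements with zero bugs contribute nothing, so only true positives count.
    covered = sum(n for elem, n in bugs_per_element.items() if elem in predicted_buggy)
    return covered / total


if __name__ == "__main__":
    print(f_measure(tp=8, fp=2, fn=2))
    bugs = {"A": 3, "B": 1, "C": 0, "D": 2}  # hypothetical bug counts
    print(bug_coverage(bugs, {"A", "D"}))
```

For instance, 8 true positives, 2 false positives and 2 false negatives give precision = recall = 0.8 and therefore F = 0.8. Under the assumed definition, bug coverage can exceed the plain recall over elements when the correctly flagged elements are the ones holding many bugs.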



Acknowledgment

This work was partially supported by the European Union project “REPARA – Reengineering and Enabling Performance And poweR of Applications”, project number: 609666.

Author information

Authors and Affiliations

Department of Software Engineering, University of Szeged, Szeged, Hungary
Zoltán Tóth, Péter Gyimesi & Rudolf Ferenc


Corresponding author

Correspondence to Zoltán Tóth.

Editor information

Editors and Affiliations

University of Perugia, Perugia, Italy
Osvaldo Gervasi
University of Basilicata, Potenza, Italy
Beniamino Murgante
Covenant University, Ota, Nigeria
Sanjay Misra
University of Minho, Braga, Portugal
Ana Maria A.C. Rocha
Polytechnic University, Bari, Italy
Carmelo M. Torre
Monash University, Clayton, Victoria, Australia
David Taniar
Kyushu Sangyo University, Fukuoka, Japan
Bernady O. Apduhan
Saint Petersburg State University, Saint Petersburg, Russia
Elena Stankova
Beijing Univ. of Posts & Telecomm., Beijing, China
Shangguang Wang


About this paper

Cite this paper

Tóth, Z., Gyimesi, P., Ferenc, R. (2016). A Public Bug Database of GitHub Projects and Its Application in Bug Prediction. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2016. ICCSA 2016. Lecture Notes in Computer Science, vol 9789. Springer, Cham. https://doi.org/10.1007/978-3-319-42089-9_44


DOI: https://doi.org/10.1007/978-3-319-42089-9_44
Published: 01 July 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-42088-2
Online ISBN: 978-3-319-42089-9
eBook Packages: Computer Science, Computer Science (R0)
