
A Public Bug Database of GitHub Projects and Its Application in Bug Prediction

  • Conference paper

Computational Science and Its Applications -- ICCSA 2016 (ICCSA 2016)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9789)

Abstract

Detecting defects in software systems is an evergreen topic, since no real-world software is free of bugs. Many bug localization algorithms have been proposed recently to help detect hidden and newly introduced bugs in software. Studies that try to predict the faulty source code elements or code segments of a system invariably rely on past experience. In most cases, these studies construct a database for their own purposes and do not make the gathered data publicly available. Public datasets are rare; however, a well-constructed dataset could serve as a benchmark test input. Furthermore, open-source software development is growing rapidly, which also creates an opportunity to work with public data.

In this study, we selected 15 Java projects from GitHub and constructed a public bug database from them. We matched already known and fixed bugs with the corresponding source code elements (classes and files) and calculated a wide set of product metrics for these elements. After creating the bug database, we investigated whether it is usable for bug prediction. We used 13 machine learning algorithms to address this research question and achieved F-measure values between 0.7 and 0.8. Besides the F-measure values, we calculated the bug coverage ratio on every project for every machine learning algorithm. We obtained very high and promising bug coverage values (up to 100%).
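The two evaluation measures named in the abstract can be illustrated with a short Python sketch. It assumes binary labels (1 = buggy, 0 = not buggy) and computes the standard F-measure (F1) of the buggy class as the harmonic mean of precision and recall; the abstract does not say which variant the authors report (for example, a class-weighted average, as some tools print). The bug coverage function assumes a definition in which a bug counts as covered if at least one source code element it affects is predicted faulty. This definition, the function names `f_measure` and `bug_coverage`, and the bug IDs and file names in the usage note are illustrative assumptions, not the authors' implementation.

```python
from typing import Hashable, Mapping, Sequence, Set


def f_measure(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 score of the positive (buggy) class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # buggy, flagged
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # clean, flagged
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # buggy, missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0  # no buggy element was predicted correctly
    return 2 * precision * recall / (precision + recall)


def bug_coverage(bug_to_elements: Mapping[str, Set[Hashable]],
                 predicted_faulty: Set[Hashable]) -> float:
    """Share of bugs with at least one affected element predicted as faulty.

    ASSUMPTION: this coverage definition is illustrative; the paper may define it differently.
    """
    if not bug_to_elements:
        return 0.0
    covered = sum(1 for elements in bug_to_elements.values()
                  if elements & predicted_faulty)  # non-empty intersection => covered
    return covered / len(bug_to_elements)
```

For example, with true labels `[1, 1, 0, 0, 1]` and predictions `[1, 0, 0, 1, 1]`, precision and recall are both 2/3, so `f_measure` returns about 0.667. With the hypothetical mapping `{"BUG-1": {"A.java", "B.java"}, "BUG-2": {"C.java"}, "BUG-3": {"D.java"}}` and the predicted faulty set `{"B.java", "D.java"}`, `bug_coverage` returns 2/3, because BUG-2 has no flagged element. Under this assumed definition, coverage can stay high even when precision is moderate, since a single flagged file is enough to cover a bug.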


Notes

  1. https://github.com
  2. https://www.sourcemeter.com
  3. http://www.cs.waikato.ac.nz/ml/weka/


Acknowledgment

This work was partially supported by the European Union project “REPARA – Reengineering and Enabling Performance And poweR of Applications”, project number: 609666.

Author information

Correspondence to Zoltán Tóth.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Tóth, Z., Gyimesi, P., Ferenc, R. (2016). A Public Bug Database of GitHub Projects and Its Application in Bug Prediction. In: Gervasi, O., et al. Computational Science and Its Applications -- ICCSA 2016. ICCSA 2016. Lecture Notes in Computer Science, vol 9789. Springer, Cham. https://doi.org/10.1007/978-3-319-42089-9_44

  • DOI: https://doi.org/10.1007/978-3-319-42089-9_44

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-42088-2

  • Online ISBN: 978-3-319-42089-9

  • eBook Packages: Computer Science, Computer Science (R0)
