Abstract
Automatically predicting the defect type of a software defect from its description can significantly speed up and improve the software defect management process. A major challenge for the current supervised-learning-based approaches to this task is the need for labeled training data. Creating such data is an expensive, effort-intensive task requiring domain-specific expertise. In this paper, we propose to circumvent this problem by carrying out concept-based classification (CBC) of software defect reports with the help of the Explicit Semantic Analysis (ESA) framework. We first create concept-based representations of a software defect report and of the defect types in the software defect classification scheme by projecting their textual descriptions into a concept space spanned by Wikipedia articles. We then compute the “semantic” similarity between these concept-based representations and assign the defect type that has the highest similarity to the defect report. The proposed approach achieves accuracy comparable to the state-of-the-art semi-supervised and active learning approach for this task without requiring labeled training data. Additional advantages of the CBC approach are: (i) unlike the state of the art, it does not need the source code used to fix a software defect, and (ii) it does not suffer from the class-imbalance problem faced by the supervised learning paradigm.
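The CBC pipeline described above can be illustrated with a toy sketch. This is not the authors' implementation: the term-to-concept weight table below is a hypothetical stand-in for the TF-IDF weights that ESA derives from the full Wikipedia corpus, and the defect-type descriptions are invented examples.

```python
# Toy illustration of concept-based classification (CBC) via ESA.
# TERM_CONCEPT_WEIGHTS is a hypothetical stand-in for Wikipedia-derived
# ESA weights; real ESA builds this from an inverted index over Wikipedia.
from math import sqrt

TERM_CONCEPT_WEIGHTS = {
    "crash":   {"Crash (computing)": 0.9, "Software bug": 0.6},
    "null":    {"Null pointer": 0.8, "Software bug": 0.3},
    "pointer": {"Null pointer": 0.7, "Pointer (computer programming)": 0.9},
    "assign":  {"Assignment (computer science)": 0.9},
}

def esa_vector(text):
    """Project a text into the Wikipedia concept space by summing the
    concept weights of its terms (the centroid used by ESA)."""
    vec = {}
    for term in text.lower().split():
        for concept, w in TERM_CONCEPT_WEIGHTS.get(term, {}).items():
            vec[concept] = vec.get(concept, 0.0) + w
    return vec

def cosine(u, v):
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(w * v[c] for c, w in u.items() if c in v)
    norm = sqrt(sum(w * w for w in u.values())) * sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def classify(report, defect_types):
    """Assign the defect type whose concept vector is most similar to the report's."""
    rv = esa_vector(report)
    return max(defect_types, key=lambda t: cosine(rv, esa_vector(defect_types[t])))

defect_types = {
    "Assignment": "incorrect assign of a value",   # invented descriptions
    "Checking":   "missing null pointer check",
}
print(classify("crash due to null pointer", defect_types))  # -> Checking
```

Because classification reduces to a similarity ranking against the defect-type descriptions, no labeled defect reports are needed, which is the key property the abstract highlights.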
Notes
Following the ESA terminology, we use “a concept” and “a Wikipedia article” interchangeably.
Available from https://dumps.wikimedia.org
Notion of a stub-article in Wikipedia: https://en.wikipedia.org/wiki/Wikipedia:Stub
Mahout, the machine learning library, https://mahout.apache.org
Lucene, the search engine library, https://lucene.apache.org/core
OpenNLP, the natural language processing library, https://opennlp.apache.org
References
Alenezi M, Magel K, Banitaan S (2013) Efficient bug triaging using text mining. Journal of Software 8(9):2185–2190
Boser BE, Guyon IM, Vapnik VN (1992) A training algorithm for optimal margin classifiers. In: Proceedings of the fifth annual workshop on computational learning theory, COLT ’92, pp 144–152. https://doi.org/10.1145/130385.130401
Bridge N, Miller C (1998) Orthogonal defect classification using defect data to improve software development. Software Quality 3(1):1–8
Butcher M, Munro H, Kratschmer T (2002) Improving software testing via ODC: Three case studies. IBM Syst J 41(1):31–44
Carrozza G, Pietrantuono R, Russo S (2015) Defect analysis in mission-critical software systems: a detailed investigation. Journal of Software: Evolution and Process 27(1):22–49
Chawla NV, Japkowicz N, Kotcz A (2004) Editorial: Special issue on learning from imbalanced data sets. SIGKDD Explorations Newsletter 6(1):1–6. https://doi.org/10.1145/1007730.1007733
Chillarege R (1996) Orthogonal defect classification. Handbook of Software Reliability Engineering, pp 359–399
Cohen J (1960) A coefficient of agreement for nominal scales. Educ Psychol Meas 20(1):37–46
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297. https://doi.org/10.1007/BF00994018
Čubranić D (2004) Automatic bug triage using text categorization. In: Proceedings of 16th international conference on software engineering & knowledge engineering (SEKE)
Egozi O, Markovitch S, Gabrilovich E (2011) Concept-based information retrieval using explicit semantic analysis. ACM Trans Inf Syst 29(2):8
Ferschke O, Zesch T, Gurevych I (2011) Wikipedia revision toolkit: Efficiently accessing Wikipedia’s edit history. In: Proceedings of the ACL-HLT 2011 system demonstrations, association for computational linguistics, pp 97–102
Gabrilovich E, Markovitch S (2007) Computing semantic relatedness using Wikipedia-based explicit semantic analysis. In: Proceedings of the 20th international joint conference on artificial intelligence (IJCAI), vol 7, pp 1606–1611
Gabrilovich E, Markovitch S (2009) Wikipedia-based semantic interpretation for natural language processing. J Artif Intell Res 34:443–498
Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: ACM Sigmod record, vol 29. ACM, pp 1–12
Han J, Pei J, Kamber M (2011) Data mining: concepts and techniques. Elsevier, Amsterdam
He H, Garcia EA (2009) Learning from imbalanced data. IEEE Trans Knowl Data Eng 21(9):1263–1284
Herzig K, Just S, Zeller A (2013) It’s not a bug, it’s a feature: how misclassification impacts bug prediction. In: Proceedings of 35th international conference on software engineering, pp 392–401
Huang L, Ng V, Persing I, Geng R, Bai X, Tian J (2011) AutoODC: Automated generation of orthogonal defect classifications. In: Proceedings of 26th IEEE/ACM international conference on automated software engineering (ASE)
Huang L, Ng V, Persing I, Chen M, Li Z, Geng R, Tian J (2015) AutoODC: Automated generation of orthogonal defect classifications. Automated Software Engineering Journal 22(1):3–46
IBM (2013a) Orthogonal defect classification version 5.2 extensions for defects in GUI, user documentation, build and national language support (NLS). https://researcher.watson.ibm.com/researcher/files/us-pasanth/ODC-5-2-Extensions.pdf, (URL accessibility verified on 9th Nov., 2018)
IBM (2013b) Orthogonal defect classification version 5.2 for software design and code. http://researcher.watson.ibm.com/researcher/files/us-pasanth/ODC-5-2.pdf, (URL accessibility verified on 9th Nov., 2018)
IEEE (2009) IEEE standard 1044-2009 classification for software anomalies
Japkowicz N, Stephen S (2002) The class imbalance problem: a systematic study. Intelligent Data Analysis 6(5):429–449
Jurafsky D, Martin JH (2000) Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition, 1st edn. Prentice Hall PTR, Upper Saddle River
Manning CD, Raghavan P, Schütze H (2008) Introduction to information retrieval. Cambridge University Press, New York
Mellegård N, Staron M, Törner F (2012) A light-weight defect classification scheme for embedded automotive software and its initial evaluation. In: Proceedings of IEEE 23rd International Symp. on Software Reliability Engineering (ISSRE), pp 261–270
Menzies T, Marcus A (2008) Automated severity assessment of software defect reports. In: IEEE international conference on software maintenance (ICSM), pp 346–355
Panichella A, Dit B, Oliveto R, Di Penta M, Poshyvanyk D, De Lucia A (2013) How to effectively use topic models for software engineering tasks? An approach based on genetic algorithms. In: Proceedings of the 2013 international conference on software engineering, ICSE ’13, pp 522–531
Patil S (2017) Concept based classification of software defect reports. In: Proceedings of 14th international conference on mining software repositories (MSR), IEEE/ACM
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: Machine learning in Python. J Mach Learn Res 12:2825–2830
Robertson S, Zaragoza H, et al (2009) The probabilistic relevance framework: BM25 and beyond. Foundations and Trends in Information Retrieval 3(4):333–389
Robertson SE, Walker S, Jones S, Hancock-Beaulieu MM, Gatford M, et al (1995) Okapi at TREC-3. NIST Special Publication Sp 109:109
Runeson P, Alexandersson M, Nyholm O (2007) Detection of duplicate defect reports using natural language processing. In: Proceedings of 29th international conference on software engineering. IEEE Computer Society, pp 499–510
Salton G, McGill M J (1986) Introduction to modern information retrieval. McGraw-Hill Inc, New York
Silva N, Vieira M (2014) Experience report: orthogonal classification of safety critical issues. In: 2014 IEEE 25th international symposium on software reliability engineering. IEEE, pp 156–166
Student (1908) The probable error of a mean. Biometrika 6(1):1–25. https://doi.org/10.1093/biomet/6.1.1
Thung F, Lo D, Jiang L (2012) Automatic defect categorization. In: Proceedings of 19th working conference on reverse engineering (WCRE). IEEE, pp 205–214
Thung F, Le X-BD, Lo D (2015) Active semi-supervised defect categorization. In: Proceedings of IEEE 23rd international conference on program comprehension (ICPC), pp 60–70
Vallespir D, Grazioli F, Herbert J (2009) A framework to evaluate defect taxonomies. In: Proceedings of XV Congreso Argentino de Ciencias de La Computación
Wagner S (2008) Defect classification and defect types revisited. In: Proceedings of workshop on defects in large software systems. ACM, pp 39–40
Wang X, Zhang L, Xie T, Anvik J, Sun J (2008) An approach to detecting duplicate bug reports using natural language and execution information. In: Proceedings of the 30th international conference on software engineering, pp 461–470
Xia X, Lo D, Wang X, Zhou B (2014) Automatic defect categorization based on fault triggering conditions. In: Proceedings of 19th international conference on engineering of complex computer systems (ICECCS). IEEE, pp 39–48
Xian Y, Lampert CH, Schiele B, Akata Z (2018) Zero-shot learning: a comprehensive evaluation of the good, the bad and the ugly. IEEE Trans Pattern Anal Mach Intell. https://doi.org/10.1109/TPAMI.2018.2857768
Yang YY, Lee SC, Chung YA, Wu TE, Chen SA, Lin HT (2017) libact: Pool-based active learning in python. Tech. rep., National Taiwan University. https://github.com/ntucllab/libact, available as arXiv:1710.00379
Zaki MJ, Meira W Jr (2014) Data mining and analysis: fundamental concepts and algorithms. Cambridge University Press, Cambridge
Zesch T, Müller C, Gurevych I (2008) Extracting lexical semantic knowledge from Wikipedia and Wiktionary. In: Proceedings of 6th international conference on language resources and evaluation (LREC), vol 8, pp 1646–1652
Zhou Y, Tong Y, Gu R, Gall H (2016) Combining text mining and data mining for bug report classification. Journal of Software: Evolution and Process 28(3)
Additional information
Communicated by: Tim Menzies
A preliminary, work-in-progress version of this work was presented as a short paper – “Concept based Classification of Software Defect Reports”, Sangameshwar Patil, Mining Software Repositories (MSR), 2017. This article is a significantly extended version of the short paper with new results and analysis.
Appendices
Appendix A: IEEE 1044-2009 Standard based Software Defect Type Classification Scheme
Appendix B: Additional Figures for Experimental Results of RQ2
In this appendix, we provide additional figures summarizing the experimental results for RQ2, which analyze the effect of varying the number of concepts (N) on the coverage and accuracy of the concept-based classification (CBC) approach. These results are discussed in Section 4.3.2.
1.1 B.1: RQ2 Results for Roundcube Dataset and IEEE-Based Classification Scheme
1.2 B.2: RQ2 Results for Roundcube Dataset and ODC-Based Classification Scheme
1.3 B.3: RQ2 Results for Apache-Libs Dataset and IEEE-Based Classification Scheme
Appendix C: Dataset Annotation Details
The annotations for the Apache-Libs dataset by Thung et al. (2012) were done before the IBM ODC version 5.2 and its extensions (IBM 2013a, b) were made available (12th Sept. 2013). The ODC v5.2 extensions (IBM 2013a) introduce additional defect types, including a new National Language Support (NLS) defect type (i.e., “Problems encountered in the implementation of the product functions in languages other than English”). These changes in the ODC scheme could not have been considered by Thung et al. (2012). To account for the changes in the defect type families due to the IBM ODC v5.2 extensions (IBM 2013a), as well as to improve the robustness of this dataset as a benchmark, we re-annotated the dataset. The annotations were done by a software professional with multi-year experience in software design, development, testing, and debugging.
Out of the 500 defect type annotations in this dataset, 472 (94.4%) matched the original annotations by Thung et al. (2012), and 28 disagreed. The inter-annotator agreement with the original annotations, measured using Cohen’s kappa statistic (Cohen 1960), is 90.02%, which is a very high level of agreement. The 28 differing annotations were further reviewed and verified by another software professional with more than a decade of hands-on experience across the software development life-cycle. This review led to a change in the annotations of 2 of the 28 defect reports; these two changes were analyzed in discussions between the two annotators, and the corrections were approved.
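The Cohen’s kappa statistic quoted above can be computed from any two annotators’ label lists: it discounts the observed agreement by the agreement expected by chance from the two annotators’ label distributions. The sketch below uses illustrative toy labels, not the actual dataset annotations.

```python
# Cohen's kappa from two annotators' label lists (toy labels, illustrative only).
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the chance agreement implied by the two label distributions."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    ca, cb = Counter(labels_a), Counter(labels_b)
    p_e = sum(ca[k] * cb.get(k, 0) for k in ca) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Toy example with two defect-type labelings of five reports:
a = ["Logic", "Interface", "Logic", "Data", "Logic"]
b = ["Logic", "Interface", "Data",  "Data", "Logic"]
print(round(cohens_kappa(a, b), 4))  # -> 0.6875
```

Note that kappa depends on the per-class label distributions, not only on the raw match rate, which is why the 94.4% matching annotations correspond to a kappa of 90.02% rather than 94.4%.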
We make the annotated dataset available for research purposes as Supplementary Material along with the paper, as well as on email request. The high level of inter-annotator agreement (94.4% matching annotations and Cohen’s κ = 90.02%), together with the explanatory comments for the few differing annotations, makes this dataset a high-quality benchmark for the software defect type classification task. Table 5 shows the dataset statistics and the label distribution in the ground-truth annotations. For the other combinations of datasets and classification schemes used in this paper, the annotation process was similar; the corresponding inter-annotator agreement details are given in Section 4.1.
Cite this article
Patil, S., Ravindran, B. Predicting software defect type using concept-based classification. Empir Software Eng 25, 1341–1378 (2020). https://doi.org/10.1007/s10664-019-09779-6