Abstract
Automatically predicting the defect type of a software defect from its description can significantly speed up and improve the software defect management process. A major challenge for the current supervised-learning-based approaches to this task is the need for labeled training data. Creating such data is an expensive, effort-intensive task requiring domain-specific expertise. In this paper, we propose to circumvent this problem by carrying out concept-based classification (CBC) of software defect reports with the help of the Explicit Semantic Analysis (ESA) framework. We first create concept-based representations of a software defect report and of the defect types in the software defect classification scheme by projecting their textual descriptions into a concept space spanned by Wikipedia articles. We then compute the “semantic” similarity between these concept-based representations and assign the defect type that has the highest similarity to the defect report. The proposed approach achieves accuracy comparable to the state-of-the-art semi-supervised and active learning approach for this task without requiring labeled training data. Additional advantages of the CBC approach are: (i) unlike the state of the art, it does not need the source code used to fix a software defect, and (ii) it does not suffer from the class-imbalance problem faced by the supervised learning paradigm.
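The CBC pipeline described above can be illustrated with a toy sketch. This is not the authors' implementation: the term-to-concept weight table below is a hypothetical stand-in for the TF-IDF weights that ESA derives from the full Wikipedia corpus, and the defect-type descriptions are invented examples.

```python
# Toy illustration of concept-based classification (CBC) via ESA.
# TERM_CONCEPT_WEIGHTS is a hypothetical stand-in for Wikipedia-derived
# ESA weights; real ESA builds this from an inverted index over Wikipedia.
from math import sqrt

TERM_CONCEPT_WEIGHTS = {
    "crash":   {"Crash (computing)": 0.9, "Software bug": 0.6},
    "null":    {"Null pointer": 0.8, "Software bug": 0.3},
    "pointer": {"Null pointer": 0.7, "Pointer (computer programming)": 0.9},
    "assign":  {"Assignment (computer science)": 0.9},
}

def esa_vector(text):
    """Project a text into the Wikipedia concept space by summing the
    concept weights of its terms (the centroid used by ESA)."""
    vec = {}
    for term in text.lower().split():
        for concept, w in TERM_CONCEPT_WEIGHTS.get(term, {}).items():
            vec[concept] = vec.get(concept, 0.0) + w
    return vec

def cosine(u, v):
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(w * v[c] for c, w in u.items() if c in v)
    norm = sqrt(sum(w * w for w in u.values())) * sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def classify(report, defect_types):
    """Assign the defect type whose concept vector is most similar to the report's."""
    rv = esa_vector(report)
    return max(defect_types, key=lambda t: cosine(rv, esa_vector(defect_types[t])))

defect_types = {
    "Assignment": "incorrect assign of a value",   # invented descriptions
    "Checking":   "missing null pointer check",
}
print(classify("crash due to null pointer", defect_types))  # -> Checking
```

Because classification reduces to a similarity ranking against the defect-type descriptions, no labeled defect reports are needed, which is the key property the abstract highlights.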
Notes
Following the ESA terminology, we use “a concept” and “a Wikipedia article” interchangeably.
Available from https://dumps.wikimedia.org
Notion of a stub-article in Wikipedia: https://en.wikipedia.org/wiki/Wikipedia:Stub
Mahout, the machine learning library, https://mahout.apache.org
Lucene, the search engine library, https://lucene.apache.org/core
OpenNLP, the natural language processing library, https://opennlp.apache.org
References
Alenezi M, Magel K, Banitaan S (2013) Efficient bug triaging using text mining. Journal of Software 8(9):2185–2190
Boser BE, Guyon IM, Vapnik VN (1992) A training algorithm for optimal margin classifiers. In: Proceedings of the fifth annual workshop on computational learning theory, COLT ’92, pp 144–152. https://doi.org/10.1145/130385.130401
Bridge N, Miller C (1998) Orthogonal defect classification using defect data to improve software development. Software Quality 3(1):1–8
Butcher M, Munro H, Kratschmer T (2002) Improving software testing via ODC: Three case studies. IBM Syst J 41(1):31–44
Carrozza G, Pietrantuono R, Russo S (2015) Defect analysis in mission-critical software systems: a detailed investigation. Journal of Software: Evolution and Process 27(1):22–49
Chawla NV, Japkowicz N, Kotcz A (2004) Editorial: Special issue on learning from imbalanced data sets. SIGKDD Explorations Newsletter 6(1):1–6. https://doi.org/10.1145/1007730.1007733
Chillarege R (1996) Orthogonal defect classification. Handbook of Software Reliability Engineering, pp 359–399
Cohen J (1960) A coefficient of agreement for nominal scales. Educ Psychol Meas 20(1):37–46
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297. https://doi.org/10.1007/BF00994018
Čubranić D (2004) Automatic bug triage using text categorization. In: Proceedings of 16th international conference on software engineering & knowledge engineering (SEKE)
Egozi O, Markovitch S, Gabrilovich E (2011) Concept-based information retrieval using explicit semantic analysis. ACM Trans Inf Syst 29(2):8
Ferschke O, Zesch T, Gurevych I (2011) Wikipedia revision toolkit: Efficiently accessing Wikipedia’s edit history. In: Proceedings of the ACL-HLT 2011 system demonstrations, association for computational linguistics, pp 97–102
Gabrilovich E, Markovitch S (2007) Computing semantic relatedness using Wikipedia-based explicit semantic analysis. In: Proceedings of the 20th international joint conference on artificial intelligence (IJCAI), vol 7, pp 1606–1611
Gabrilovich E, Markovitch S (2009) Wikipedia-based semantic interpretation for natural language processing. J Artif Intell Res 34:443–498
Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: ACM Sigmod record, vol 29. ACM, pp 1–12
Han J, Pei J, Kamber M (2011) Data mining: concepts and techniques. Elsevier, Amsterdam
He H, Garcia EA (2009) Learning from imbalanced data. IEEE Trans Knowl Data Eng 21(9):1263–1284
Herzig K, Just S, Zeller A (2013) It’s not a bug, it’s a feature: how misclassification impacts bug prediction. In: Proceedings of 35th international conference on software engineering, pp 392–401
Huang L, Ng V, Persing I, Geng R, Bai X, Tian J (2011) AutoODC: Automated generation of orthogonal defect classifications. In: Proceedings of 26th IEEE/ACM international conference on automated software engineering (ASE)
Huang L, Ng V, Persing I, Chen M, Li Z, Geng R, Tian J (2015) AutoODC: Automated generation of orthogonal defect classifications. Automated Software Engineering Journal 22(1):3–46
IBM (2013a) Orthogonal defect classification version 5.2 extensions for defects in GUI, user documentation, build and national language support (NLS). https://researcher.watson.ibm.com/researcher/files/us-pasanth/ODC-5-2-Extensions.pdf, (URL accessibility verified on 9th Nov., 2018)
IBM (2013b) Orthogonal defect classification version 5.2 for software design and code. http://researcher.watson.ibm.com/researcher/files/us-pasanth/ODC-5-2.pdf, (URL accessibility verified on 9th Nov., 2018)
IEEE (2009) IEEE standard 1044-2009 classification for software anomalies
Japkowicz N, Stephen S (2002) The class imbalance problem: a systematic study. Intelligent Data Analysis 6(5):429–449
Jurafsky D, Martin JH (2000) Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition, 1st edn. Prentice Hall PTR, Upper Saddle River
Manning CD, Raghavan P, Schütze H (2008) Introduction to information retrieval. Cambridge University Press, New York
Mellegård N, Staron M, Törner F (2012) A light-weight defect classification scheme for embedded automotive software and its initial evaluation. In: Proceedings of IEEE 23rd International Symp. on Software Reliability Engineering (ISSRE), pp 261–270
Menzies T, Marcus A (2008) Automated severity assessment of software defect reports. In: IEEE international conference on software maintenance (ICSM), pp 346–355
Panichella A, Dit B, Oliveto R, Di Penta M, Poshyvanyk D, De Lucia A (2013) How to effectively use topic models for software engineering tasks? An approach based on genetic algorithms. In: Proceedings of the 2013 international conference on software engineering, ICSE ’13, pp 522–531
Patil S (2017) Concept based classification of software defect reports. In: Proceedings of 14th international conference on mining software repositories (MSR), IEEE/ACM
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: Machine learning in Python. J Mach Learn Res 12:2825–2830
Robertson S, Zaragoza H, et al (2009) The probabilistic relevance framework: BM25 and beyond. Foundations and Trends in Information Retrieval 3(4):333–389
Robertson SE, Walker S, Jones S, Hancock-Beaulieu MM, Gatford M, et al (1995) Okapi at TREC-3. NIST Special Publication Sp 109:109
Runeson P, Alexandersson M, Nyholm O (2007) Detection of duplicate defect reports using natural language processing. In: Proceedings of 29th international conference on software engineering. IEEE Computer Society, pp 499–510
Salton G, McGill M J (1986) Introduction to modern information retrieval. McGraw-Hill Inc, New York
Silva N, Vieira M (2014) Experience report: orthogonal classification of safety critical issues. In: 2014 IEEE 25th international symposium on software reliability engineering. IEEE, pp 156–166
Student (1908) The probable error of a mean. Biometrika 6(1):1–25. https://doi.org/10.1093/biomet/6.1.1
Thung F, Lo D, Jiang L (2012) Automatic defect categorization. In: Proceedings of 19th working conference on reverse engineering (WCRE). IEEE, pp 205–214
Thung F, Le X-BD, Lo D (2015) Active semi-supervised defect categorization. In: Proceedings of IEEE 23rd international conference on program comprehension (ICPC), pp 60–70
Vallespir D, Grazioli F, Herbert J (2009) A framework to evaluate defect taxonomies. In: Proceedings of XV Congreso Argentino de Ciencias de La Computación
Wagner S (2008) Defect classification and defect types revisited. In: Proceedings of workshop on defects in large software systems. ACM, pp 39–40
Wang X, Zhang L, Xie T, Anvik J, Sun J (2008) An approach to detecting duplicate bug reports using natural language and execution information. In: Proceedings of the 30th international conference on software engineering, pp 461–470
Xia X, Lo D, Wang X, Zhou B (2014) Automatic defect categorization based on fault triggering conditions. In: Proceedings of 19th international conference on engineering of complex computer systems (ICECCS). IEEE, pp 39–48
Xian Y, Lampert CH, Schiele B, Akata Z (2018) Zero-shot learning: a comprehensive evaluation of the good, the bad and the ugly. IEEE Trans Pattern Anal Mach Intell. https://doi.org/10.1109/TPAMI.2018.2857768
Yang YY, Lee SC, Chung YA, Wu TE, Chen SA, Lin HT (2017) libact: Pool-based active learning in python. Tech. rep., National Taiwan University. https://github.com/ntucllab/libact, available as arXiv:1710.00379
Zaki MJ, Meira W Jr (2014) Data mining and analysis: fundamental concepts and algorithms. Cambridge University Press, Cambridge
Zesch T, Müller C, Gurevych I (2008) Extracting lexical semantic knowledge from Wikipedia and Wiktionary. In: Proceedings of 6th international conference on language resources and evaluation (LREC), vol 8, pp 1646–1652
Zhou Y, Tong Y, Gu R, Gall H (2016) Combining text mining and data mining for bug report classification. Journal of Software: Evolution and Process 28(3)
Additional information
Communicated by: Tim Menzies
A preliminary, work-in-progress version of this work was presented as a short paper – “Concept based Classification of Software Defect Reports”, Sangameshwar Patil, Mining Software Repositories (MSR), 2017. This article is a significantly extended version of the short paper with new results and analysis.
Appendices
Appendix A: IEEE 1044-2009 Standard based Software Defect Type Classification Scheme
Appendix B: Additional Figures for Experimental Results of RQ2
In this appendix, we provide additional figures summarizing the experimental results for RQ2, which analyze the effect of varying the number of concepts (N) on the coverage and accuracy of the concept-based classification (CBC) approach. These results are discussed in Section 4.3.2.
1.1 B.1: RQ2 Results for Roundcube Dataset and IEEE-Based Classification Scheme
1.2 B.2: RQ2 Results for Roundcube Dataset and ODC-Based Classification Scheme
1.3 B.3: RQ2 Results for Apache-Libs Dataset and IEEE-Based Classification Scheme
Appendix C: Dataset Annotation Details
The annotations for the Apache-Libs dataset by Thung et al. (2012) were done before the IBM ODC version 5.2 and its extensions (IBM 2013a, b) were made available (12th Sept. 2013). The ODC v5.2 extensions (IBM 2013a) introduce additional defect types, including a new National Language Support (NLS) defect type (i.e., “Problems encountered in the implementation of the product functions in languages other than English”). These changes in the ODC scheme could not have been considered by Thung et al. (2012). To account for the changes in the defect type families due to the IBM ODC v5.2 extensions (IBM 2013a), as well as to improve the robustness of this dataset as a benchmark, we re-annotated the dataset. The annotations were done by a software professional with multi-year experience in software design, development, testing, and debugging.
Out of the 500 defect type annotations in this dataset, 472 (94.4%) matched the original annotations by Thung et al. (2012), and 28 disagreed. The inter-annotator agreement with the original annotations, measured using Cohen’s kappa statistic (Cohen 1960), is 90.02%, which is a very high level of agreement. The 28 differing annotations were further reviewed and verified by another software professional with more than a decade of hands-on experience across the software development life-cycle. This review led to a change in the annotations of 2 of the 28 defect reports; these two changes were analyzed in discussions between the two annotators, and the corrections were approved.
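The Cohen’s kappa statistic quoted above can be computed from any two annotators’ label lists: it discounts the observed agreement by the agreement expected by chance from the two annotators’ label distributions. The sketch below uses illustrative toy labels, not the actual dataset annotations.

```python
# Cohen's kappa from two annotators' label lists (toy labels, illustrative only).
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the chance agreement implied by the two label distributions."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    ca, cb = Counter(labels_a), Counter(labels_b)
    p_e = sum(ca[k] * cb.get(k, 0) for k in ca) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Toy example with two defect-type labelings of five reports:
a = ["Logic", "Interface", "Logic", "Data", "Logic"]
b = ["Logic", "Interface", "Data",  "Data", "Logic"]
print(round(cohens_kappa(a, b), 4))  # -> 0.6875
```

Note that kappa depends on the per-class label distributions, not only on the raw match rate, which is why the 94.4% matching annotations correspond to a kappa of 90.02% rather than 94.4%.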
We make the annotated dataset available for research purposes as Supplementary Material along with the paper, as well as on email request. The high level of inter-annotator agreement (94.4% matching annotations and Cohen’s κ = 90.02%), together with the explanatory comments for the few differing annotations, makes this dataset a high-quality benchmark for the software defect type classification task. Table 5 shows the dataset statistics and the label distribution in the ground-truth annotations. For the other combinations of datasets and classification schemes used in this paper, the annotation process was similar; the corresponding inter-annotator agreement details are given in Section 4.1.
Cite this article
Patil, S., Ravindran, B. Predicting software defect type using concept-based classification. Empir Software Eng 25, 1341–1378 (2020). https://doi.org/10.1007/s10664-019-09779-6