
Improving change prediction models with code smell-related information

Published in: Empirical Software Engineering

Abstract

Code smells are sub-optimal implementation choices applied by developers that negatively impact, among other attributes, the change-proneness of the affected classes. Based on this consideration, in this paper we conjecture that code smell-related information can be effectively exploited to improve the performance of change prediction models, i.e., models aimed at indicating which classes are more likely to change in the future. We exploit the so-called intensity index, a previously defined metric that captures the severity of a code smell, and evaluate its contribution when added as an additional feature to three state-of-the-art change prediction models based on product, process, and developer-based features. We also compare the performance of the proposed model with that of a model based on previously defined antipattern metrics, a set of indicators computed from the history of code smells in files. Our results show that (i) the prediction performance of the intensity-including models is statistically better than that of the baselines and (ii) the intensity is a better predictor than the antipattern metrics. We also observed some orthogonality between the sets of change-prone and non-change-prone classes correctly classified by the models relying on intensity and on antipattern metrics: for this reason, we devised and evaluated a smell-aware combined change prediction model that includes product, process, developer-based, and smell-related features. We show that the F-Measure of this combined model is notably higher than that of the other models.
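The idea of the abstract can be illustrated with a minimal, self-contained sketch: train a change prediction classifier with and without a smell-intensity feature and compare the resulting F-Measure. This is an assumption-laden toy, not the authors' pipeline: the data is synthetic, the feature names (churn, intensity) are illustrative, and a plain logistic regression stands in for the product/process/developer-based models evaluated in the paper.

```python
# Illustrative sketch only: synthetic data and a hand-rolled logistic
# regression stand in for the paper's actual models and dataset.
import math
import random

random.seed(42)

def make_dataset(n=400):
    """Synthetic classes: one baseline metric (e.g., churn) plus smell intensity."""
    rows = []
    for _ in range(n):
        churn = random.gauss(0, 1)       # stand-in product/process feature
        intensity = random.random()      # smell severity in [0, 1]
        # Ground truth: change-proneness driven mostly by smell intensity.
        signal = churn + 3.0 * intensity + random.gauss(0, 0.5)
        rows.append(([churn, intensity], 1 if signal > 1.5 else 0))
    return rows

def train_logreg(data, feat_idx, epochs=300, lr=0.1):
    """Plain logistic regression via per-sample gradient descent."""
    w, b = [0.0] * len(feat_idx), 0.0
    for _ in range(epochs):
        for x, y in data:
            z = b + sum(w[j] * x[i] for j, i in enumerate(feat_idx))
            p = 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))
            g = p - y
            for j, i in enumerate(feat_idx):
                w[j] -= lr * g * x[i]
            b -= lr * g
    return w, b

def f_measure(data, feat_idx, w, b):
    """F-Measure (harmonic mean of precision and recall) on held-out data."""
    tp = fp = fn = 0
    for x, y in data:
        pred = 1 if b + sum(w[j] * x[i] for j, i in enumerate(feat_idx)) > 0 else 0
        tp += pred and y
        fp += pred and not y
        fn += y and not pred
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

train, test = make_dataset(), make_dataset()
for name, feats in [("baseline (churn only)", [0]), ("with intensity", [0, 1])]:
    w, b = train_logreg(train, feats)
    print(f"{name}: F-Measure = {f_measure(test, feats, w, b):.2f}")
```

On this synthetic data, where change-proneness is constructed to correlate with smell severity, the intensity-including model scores a visibly higher F-Measure than the baseline, mirroring the paper's claim in miniature.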


Notes

  1. To date, the dataset is no longer available in the repository.

  2. http://cran.r-project.org/web/packages/car/index.html

  3. https://github.com/klainfo/ScottKnottESD

  4. https://bitergia.com

  5. https://www.rdocumentation.org/packages/usdm/versions/1.1-18/topics/vif


Acknowledgments

The authors would like to thank the anonymous reviewers for their detailed and constructive comments on the preliminary version of this paper, which were instrumental in improving the quality of the work. Fabio Palomba was partially supported by the Swiss National Science Foundation (SNSF) through Project No. PP00P2_170529.

Author information

Corresponding author

Correspondence to Gemma Catolino.

Additional information

Communicated by: Kelly Blincoe



About this article


Cite this article

Catolino, G., Palomba, F., Fontana, F.A. et al. Improving change prediction models with code smell-related information. Empir Software Eng 25, 49–95 (2020). https://doi.org/10.1007/s10664-019-09739-0

