ABSTRACT
BACKGROUND: Unlike traditional defect prediction models, Just-In-Time (JIT) models detect fix-inducing (or defect-inducing) changes. These models are built on the assumption that past code change properties resemble future ones. However, as a system evolves, the expertise of its developers and/or the complexity of the system also change.
AIM: In this work, we investigate the effect of code change properties on JIT models over time. We also study the impact of using recent data versus all available data on the performance of JIT models, and we analyze the effect of weighted sampling on that performance. For this purpose, we used datasets from four open-source projects: Eclipse JDT, Mozilla, Eclipse Platform, and PostgreSQL.
METHOD: We used five families of code change properties: size, diffusion, history, experience, and purpose. We trained and tested JIT models with Random Forest, and measured performance with the Brier Score (BS) and the Area Under the ROC Curve (AUC). We applied the Wilcoxon Signed Rank Test to the output to statistically validate whether the performance of JIT models improves when using all available data rather than only recent data.
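The evaluation pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' actual implementation: the feature matrix is synthetic stand-in data for the five property families, and the paired AUC values fed to the Wilcoxon test are illustrative placeholders.

```python
import numpy as np
from scipy.stats import wilcoxon
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import brier_score_loss, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a change-level dataset: five feature columns
# (standing in for size, diffusion, history, experience, purpose)
# and a binary fix-inducing label.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=1000) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
probs = model.predict_proba(X_te)[:, 1]

bs = brier_score_loss(y_te, probs)   # Brier Score: lower is better
auc = roc_auc_score(y_te, probs)     # AUC: higher is better
print(f"BS={bs:.3f}  AUC={auc:.3f}")

# Wilcoxon Signed Rank Test on paired performance scores of two model
# configurations (e.g. all data vs. recent data); values are illustrative.
auc_all = [0.78, 0.75, 0.80, 0.77]
auc_recent = [0.74, 0.73, 0.79, 0.75]
stat, p = wilcoxon(auc_all, auc_recent)
print(f"Wilcoxon p-value={p:.3f}")
```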
RESULTS: Our results suggest that the predictive power of JIT models does not change over time. Furthermore, we observed that the chronology of data in JIT defect prediction models can be disregarded by considering all the available data. On the other hand, the importance scores of the families of code change properties oscillate over time.
CONCLUSION: To mitigate the impact of the evolution of code change properties, we recommend a weighted sampling approach in which more emphasis is placed on changes occurring closer to the current time. Moreover, since properties such as developer expertise and change size evolve over time, models trained on old data may exhibit different characteristics from those trained on newer data. Hence, practitioners should regularly retrain JIT models to include fresh data.
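The recommended recency weighting can be sketched as follows. This is a hypothetical example, not the paper's implementation: the exponential decay and the 90-day half-life are assumptions chosen for illustration, and the data is synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic change-level data with commit timestamps in days.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5))
y = (X[:, 0] + rng.normal(size=500) > 0).astype(int)
timestamps = np.sort(rng.uniform(0, 365, size=500))  # days since project start

# Assumed decay scheme: a change's weight halves for every 90 days of age,
# so changes closer to the current time dominate training.
half_life = 90.0
age = timestamps.max() - timestamps
weights = 0.5 ** (age / half_life)

model = RandomForestClassifier(n_estimators=100, random_state=1)
model.fit(X, y, sample_weight=weights)  # recent changes count more
```

Any classifier accepting per-sample weights could be substituted; the key design choice is that weight decays monotonically with change age.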