Abstract
Issue-tracking systems (e.g. JIRA) are increasingly used in software projects. An issue can represent a software bug, a new requirement or user story, or even a project task. A deadline can be imposed on an issue either explicitly, by assigning it a due date, or implicitly, by assigning it to a release so that it inherits the release’s deadline. This paper presents a novel approach to providing automated support for project managers and other decision makers in predicting whether an issue is at risk of being delayed against its deadline. A set of features (hereafter called risk factors) characterizing delayed issues was extracted from eight open source projects: Apache, Duraspace, Java.net, JBoss, JIRA, Moodle, Mulesoft, and WSO2. Risk factors with good discriminative power were selected to build models that predict whether the resolution of an issue is at risk of being delayed. Our predictive models are able to predict both the extent of the delay and the likelihood of its occurrence. The evaluation results demonstrate the effectiveness of our predictive models, which achieve on average 79% precision, 61% recall, 68% F-measure, and 83% Area Under the ROC Curve. Our predictive models also have low error rates: on average 0.66 for Macro-averaged Mean Cost-Error and 0.72 for Macro-averaged Mean Absolute Error.
Notes
Here we deal with only 4 classes but the formula can be easily generalized to n classes.
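The formula this note refers to is not reproduced in this section. As an illustration of how a per-class error measure extends from 4 to n classes, the sketch below computes a macro-averaged mean absolute error in the way it is commonly defined for ordinal classification: the absolute error is averaged within each true class, and those per-class means are then averaged. It assumes classes are encoded as consecutive integers; the function name `macro_mae` and the sample labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def macro_mae(y_true, y_pred):
    """Macro-averaged MAE: mean over true classes of the per-class mean |true - predicted|."""
    errors_by_class = defaultdict(list)
    for t, p in zip(y_true, y_pred):
        errors_by_class[t].append(abs(t - p))
    # Average within each class first, so rare classes weigh as much as common ones.
    per_class_means = [sum(errs) / len(errs) for errs in errors_by_class.values()]
    return sum(per_class_means) / len(per_class_means)


# Hypothetical example with 4 ordinal classes encoded as 0..3.
y_true = [0, 0, 0, 1, 2, 3]
y_pred = [0, 0, 1, 1, 0, 3]
print(macro_mae(y_true, y_pred))
```

Nothing in the function depends on the number of classes: the outer average runs over whichever classes appear in `y_true`, so the same code covers the n-class case.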
Additional information
Communicated by: Romain Robbes, Martin Pinzger and Yasutaka Kamei
Cite this article
Choetkiertikul, M., Dam, H.K., Tran, T. et al. Predicting the delay of issues with due dates in software projects. Empir Software Eng 22, 1223–1263 (2017). https://doi.org/10.1007/s10664-016-9496-7
DOI: https://doi.org/10.1007/s10664-016-9496-7