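A Comparative Study of Hybrid Fault-Prone Module Prediction Models Using Association Rule and Random Forest

ABSTRACT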
Many fault-prone module prediction methods are implemented using machine learning algorithms, and random forest is well known as a simple yet powerful one. However, because a random forest is an ensemble of decision trees, it is hard to explain why a module is predicted as “fault-prone.” To compensate for this weakness, prior studies have examined hybrid prediction methods that combine association rule mining with random forest. In the hybrid method, a module’s fault-proneness is first assessed by association rules. Then, when the module’s features do not match any rule, its fault-proneness is evaluated by the random forest model. This paper focuses on how to combine the two techniques and conducts a comparative study to identify a better hybrid prediction method. The empirical results show that (1) it is better to use association rules for both “faulty” and “non-faulty” modules rather than only “faulty” rules, and (2) it is better to train the random forest classifiers on all data, regardless of whether each module matched an association rule.
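The rules-first, random-forest-second flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Rule` representation, the helper names `support_confidence`, `hybrid_predict`, and `training_set`, the discretized metric names `LOC` and `CBO`, and the toy data are all hypothetical. The fallback model is any callable; in the setting studied here it would be a trained random forest (for example, scikit-learn's `RandomForestClassifier`), but a one-line stand-in is used so the sketch runs without dependencies. The two flags mirror the two design choices compared in the study.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical representation: a module is a dict of discretized metric
# values, e.g. {"LOC": "high", "CBO": "low"}; labels are "faulty"/"non-faulty".
Module = Dict[str, str]


@dataclass(frozen=True)
class Rule:
    """Association rule of the form 'antecedent => label'."""
    antecedent: Tuple[Tuple[str, str], ...]  # (metric, discretized value) pairs
    label: str                               # "faulty" or "non-faulty"

    def matches(self, module: Module) -> bool:
        # A rule matches when every condition in its antecedent holds.
        return all(module.get(metric) == value for metric, value in self.antecedent)


def support_confidence(rule: Rule, modules: Sequence[Module],
                       labels: Sequence[str]) -> Tuple[float, float]:
    """Support and confidence of a rule on labeled data (standard definitions)."""
    covered = [lab for m, lab in zip(modules, labels) if rule.matches(m)]
    hits = sum(1 for lab in covered if lab == rule.label)
    support = hits / len(modules) if modules else 0.0
    confidence = hits / len(covered) if covered else 0.0
    return support, confidence


def hybrid_predict(module: Module, rules: Sequence[Rule],
                   fallback: Callable[[Module], str],
                   use_non_faulty_rules: bool = True) -> str:
    """Rules first; the fallback model is used only if no rule matches."""
    for rule in rules:  # assumed pre-sorted by priority, e.g. by confidence
        if rule.label == "non-faulty" and not use_non_faulty_rules:
            continue  # variant that applies only "faulty" rules
        if rule.matches(module):
            return rule.label
    return fallback(module)


def training_set(modules: Sequence[Module], labels: Sequence[str],
                 rules: Sequence[Rule],
                 use_all_data: bool = True) -> Tuple[List[Module], List[str]]:
    """Fallback training data: all modules, or only those no applied rule matches."""
    if use_all_data:
        return list(modules), list(labels)
    kept = [(m, lab) for m, lab in zip(modules, labels)
            if not any(r.matches(m) for r in rules)]
    return [m for m, _ in kept], [lab for _, lab in kept]


# Toy usage with made-up data; a trained random forest would replace `fallback`.
modules = [{"LOC": "high", "CBO": "high"}, {"LOC": "low", "CBO": "low"},
           {"LOC": "high", "CBO": "low"}, {"LOC": "low", "CBO": "high"}]
labels = ["faulty", "non-faulty", "faulty", "non-faulty"]
rules = [Rule((("LOC", "high"), ("CBO", "high")), "faulty"),
         Rule((("LOC", "low"),), "non-faulty")]


def fallback(m: Module) -> str:
    # Stand-in for a random forest's prediction, for illustration only.
    return "faulty" if m.get("CBO") == "high" else "non-faulty"


print(support_confidence(rules[0], modules, labels))                          # (0.25, 1.0)
print(hybrid_predict({"LOC": "low", "CBO": "high"}, rules, fallback))         # non-faulty
print(hybrid_predict({"LOC": "low", "CBO": "high"}, rules, fallback, False))  # faulty
```

Setting `use_non_faulty_rules=False` gives the variant that applies only "faulty" rules, and `use_all_data=False` gives the variant that trains the fallback model only on modules no rule covers; the defaults correspond to the two options the abstract reports as better.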