DOI:10.1145/3631991.3631996

A Comparative Study of Hybrid Fault-Prone Module Prediction Models Using Association Rule and Random Forest

Published: 26 December 2023

ABSTRACT

Many fault-prone module prediction methods are built on machine learning algorithms, and the random forest is well known as a simple yet powerful one. However, because a random forest is an ensemble of decision trees, it is hard to explain why a given module is predicted as “fault-prone.” To compensate for this weakness, earlier studies have proposed hybrid prediction methods that combine association rule mining with the random forest. In a hybrid method, a module’s fault-proneness is first assessed by the association rules; only when the module’s features match no rule is its fault-proneness evaluated by the random forest model. This paper focuses on how to combine the two techniques and conducts a comparative study to find a better hybrid prediction method. The empirical results show that (1) it is better to use association rules for both “faulty” and “non-faulty” modules than to use only “faulty” rules, and (2) it is better to train the random forest classifier on all data, regardless of whether the data matched any association rule.
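The rule-first, classifier-second flow described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Rule` class, the metric names `loc` and `cc`, the interval-style conditions, and the choice to resolve several matching rules by highest confidence are all assumptions made for the example. The fallback is a plain function standing in for a trained random forest, so the block runs without any third-party library.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical feature format: metric name -> metric value.
Module = Dict[str, float]


@dataclass(frozen=True)
class Rule:
    """A mined association rule: if every condition holds, predict `label`."""
    # Each condition is (metric, low, high) with inclusive bounds (assumed form).
    conditions: Tuple[Tuple[str, float, float], ...]
    label: str          # "faulty" or "non-faulty"
    confidence: float   # rule confidence reported by the mining step

    def matches(self, module: Module) -> bool:
        return all(
            metric in module and low <= module[metric] <= high
            for metric, low, high in self.conditions
        )


def hybrid_predict(
    module: Module,
    rules: List[Rule],
    fallback: Callable[[Module], str],
) -> Tuple[str, Optional[Rule]]:
    """Return (label, explaining_rule); the rule is None when the fallback decided."""
    matched = [rule for rule in rules if rule.matches(module)]
    if matched:
        # Assumption: conflicts are resolved by taking the most confident rule.
        best = max(matched, key=lambda rule: rule.confidence)
        return best.label, best
    # No rule matched, so defer to the (stand-in) random forest.
    return fallback(module), None


INF = float("inf")

# Rules for both classes, in line with finding (1) of the abstract.
rules = [
    Rule((("loc", 500, INF), ("cc", 20, INF)), "faulty", 0.90),
    Rule((("loc", 0, 50),), "non-faulty", 0.95),
]


def stand_in_forest(module: Module) -> str:
    """Placeholder for a trained random forest classifier (hypothetical logic)."""
    return "faulty" if module.get("cc", 0) > 10 else "non-faulty"


label, rule = hybrid_predict({"loc": 800, "cc": 25}, rules, stand_in_forest)
print(label, "explained by rule" if rule else "decided by fallback")
```

When a rule fires, the returned `Rule` doubles as the explanation for the prediction, which is the interpretability benefit the hybrid approach is after. Under finding (2), the fallback classifier would be trained on all modules rather than only on those that no rule matched.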

        • Published in

          WSSE '23: Proceedings of the 2023 5th World Symposium on Software Engineering
          September 2023
          352 pages
          ISBN:9798400708053
          DOI:10.1145/3631991

          Copyright © 2023 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Qualifiers

          • research-article
          • Research
          • Refereed limited