research-article

A Comparative Study of Hybrid Fault-Prone Module Prediction Models Using Association Rule and Random Forest

Authors:
Shinnosuke Irie, Department of Computer Science, Ehime University, Japan (ORCID 0009-0004-7657-2424)
Hirohisa Aman, Center for Information Technology, Ehime University, Japan (ORCID 0000-0001-7074-5225)
Sousuke Amasaki, Faculty of Computer Science and Systems Engineering, Okayama Prefectural University, Japan (ORCID 0000-0001-8763-3457)
Tomoyuki Yokogawa, Faculty of Computer Science and Systems Engineering, Okayama Prefectural University, Japan (ORCID 0000-0001-6681-2608)
Minoru Kawahara, Center for Information Technology, Ehime University, Japan (ORCID 0000-0002-3542-5039)
WSSE '23: Proceedings of the 2023 5th World Symposium on Software Engineering, September 2023, Pages 33–38. https://doi.org/10.1145/3631991.3631996

Published: 26 December 2023


ABSTRACT

Many fault-prone module prediction methods are built on machine learning algorithms, and random forest is well known as a simple and powerful choice. However, because a random forest is an ensemble of decision trees, it is hard to explain why a given module is predicted as “fault-prone.” To compensate for this weakness, prior studies have proposed hybrid prediction methods that combine association rule mining with random forest. In a hybrid method, a module’s fault-proneness is first assessed by the association rules; when the module’s features do not match any rule, its fault-proneness is evaluated by the random forest model instead. This paper focuses on how to combine the two techniques and conducts a comparative study to explore a better hybrid prediction method. The empirical results show that (1) it is better to use association rules for both “faulty” and “non-faulty” modules than to use only “faulty” rules, and (2) it is better to train the random forest classifier on all data, regardless of whether the data matched any association rule.
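The two-stage decision described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The rule representation (metric intervals), the metric names `loc` and `cc`, the first-match policy for resolving overlapping rules, and the `stub_forest` function standing in for a trained random forest are all assumptions made for illustration. The `training_set` helper only shows the two training configurations the study compares: all data versus only the data not covered by any rule.

```python
from typing import Callable, Dict, List, Optional, Tuple

Module = Dict[str, float]
# Hypothetical rule format: (conditions, label), where each condition maps
# a metric name to an inclusive (low, high) interval.
Rule = Tuple[Dict[str, Tuple[float, float]], str]
Sample = Tuple[Module, str]


def matches(rule: Rule, module: Module) -> bool:
    """Return True when the module satisfies every condition of the rule."""
    conditions, _ = rule
    return all(
        name in module and low <= module[name] <= high
        for name, (low, high) in conditions.items()
    )


def first_matching_label(rules: List[Rule], module: Module) -> Optional[str]:
    """Label of the first matching rule (assumed policy), or None."""
    for rule in rules:
        if matches(rule, module):
            return rule[1]
    return None


def hybrid_predict(
    rules: List[Rule], fallback: Callable[[Module], str], module: Module
) -> Tuple[str, str]:
    """Assess by rules first; use the fallback model only when no rule matches."""
    label = first_matching_label(rules, module)
    if label is not None:
        return label, "rule"
    return fallback(module), "fallback"


def training_set(rules: List[Rule], data: List[Sample], use_all: bool) -> List[Sample]:
    """Select fallback training data: everything, or only unmatched modules."""
    if use_all:
        return list(data)
    return [(m, y) for m, y in data if first_matching_label(rules, m) is None]


# Illustrative rules covering both classes; thresholds are made up.
RULES: List[Rule] = [
    ({"loc": (500, float("inf")), "cc": (20, float("inf"))}, "faulty"),
    ({"loc": (0, 50)}, "non-faulty"),
]


def stub_forest(module: Module) -> str:
    """Placeholder for a trained random forest classifier."""
    return "faulty" if module["cc"] > 10 else "non-faulty"


DATA: List[Sample] = [
    ({"loc": 800, "cc": 35}, "faulty"),
    ({"loc": 30, "cc": 2}, "non-faulty"),
    ({"loc": 200, "cc": 12}, "faulty"),
]

if __name__ == "__main__":
    for module, _ in DATA:
        print(module, hybrid_predict(RULES, stub_forest, module))
```

Returning the source of each prediction ("rule" or "fallback") alongside the label keeps the explainable cases distinguishable from those decided by the opaque model. Passing `RULES[:1]` instead of `RULES` mimics a "faulty"-only rule set, in which case small modules fall through to the fallback model.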


Index Terms

1. Software and its engineering
  1. Software creation and management



Published in

WSSE '23: Proceedings of the 2023 5th World Symposium on Software Engineering
September 2023
352 pages
ISBN: 9798400708053
DOI: 10.1145/3631991

Copyright © 2023 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
association rule
fault-prone module prediction
random forest
Qualifiers
- research-article
- Research
- Refereed limited