To tune or not to tune: rule evaluation for metaheuristic-based sequential covering algorithms

Data Mining and Knowledge Discovery

Abstract

While many papers propose innovative methods for constructing individual rules in separate-and-conquer rule learning algorithms, comparatively few study the heuristic rule evaluation functions that these algorithms use to ensure that the selected rules combine into a good rule set. Underestimating the impact of this component has led to suboptimal design choices in many algorithms. The main goal of this paper is to demonstrate the importance of heuristic rule evaluation functions by improving existing rule induction techniques and to provide guidelines for algorithm designers. We first select optimal heuristic rule evaluation functions for several metaheuristic-based algorithms and empirically compare the resulting heuristics across algorithms. This yields large and significant improvements in predictive accuracy for two techniques. We find that, although no single choice is globally optimal for all algorithms, good default choices can be shared across algorithms with similar search biases. A near-optimal selection can thus be found for new algorithms with minor experimental tuning. Lastly, a major contribution is made towards balancing a model’s predictive accuracy with its comprehensibility. We construct a Pareto front of optimal solutions for this trade-off and show that gains in comprehensibility and/or accuracy are possible for the techniques studied. Because they are parametrized, the heuristics give users a high degree of flexibility in selecting the desired balance between accuracy and comprehensibility in rule miners.
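
The accuracy/comprehensibility trade-off mentioned in the abstract can be illustrated with a minimal sketch of a Pareto front computation. This is not the authors' procedure: the use of rule count as a proxy for comprehensibility, the names `dominates` and `pareto_front`, and all numbers below are assumptions made purely for illustration.

```python
from typing import List, Tuple

# A candidate model is (accuracy, number_of_rules).
# Rule count is an ASSUMED proxy for comprehensibility; the values are made up.
Candidate = Tuple[float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as accurate and at least as small as b,
    and strictly better on at least one of the two criteria."""
    acc_a, size_a = a
    acc_b, size_b = b
    no_worse = acc_a >= acc_b and size_a <= size_b
    strictly_better = acc_a > acc_b or size_a < size_b
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return the non-dominated candidates, sorted by rule count."""
    front = [c for c in candidates
             if not any(dominates(other, c) for other in candidates)]
    return sorted(set(front), key=lambda c: c[1])


# Hypothetical results, e.g. one model per heuristic parameter setting.
models = [(0.80, 5), (0.85, 12), (0.84, 20), (0.78, 5), (0.90, 30), (0.85, 15)]
front = pareto_front(models)
print(front)  # [(0.8, 5), (0.85, 12), (0.9, 30)]
```

Every model that is not on the front can be replaced by one that is at least as accurate and at least as compact, which is the sense in which gains in comprehensibility and/or accuracy are possible.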

Notes

  1. A more general approach is the weighted covering strategy, in which the weights of covered examples are reduced instead of the examples being removed. Although the separate-and-conquer strategy is a special case of this approach, it is far more widespread.

  2. Several terms are encountered in the literature that refer to heuristic rule evaluation functions. Alternative terminology includes ‘rule learning heuristic’, ‘(rule) evaluation function’, ‘fitness function’ and ‘(rule) quality measure’. For the sake of brevity, the term ‘heuristic’ is used in this text where applicable.

  3. The RIPPER algorithm is an exception as corrections can still be made in the post-processing step.

  4. Negative fitness values are also not allowed in these algorithms. For this reason, a constant was added to each parametrized heuristic described in the previous section so that its lowest possible value is exactly zero.

  5. AntMiner is available at http://sourceforge.net/projects/guiantminer/

  6. AntMiner+ is available at http://www.antminerplus.com

  7. PSO/ACO2 is available at http://sourceforge.net/projects/psoaco2/

  8. An implementation of HIDER is available as part of the KEEL software project (Alcalá-Fdez et al. 2009, 2011) at http://www.keel.es/

  9. The probability that an individual is selected is proportional to its fitness value. This method thus uses the exact values returned by the fitness function.

  10. We use the RIPPER implementation included in the Weka project (Hall et al. 2009) at http://www.cs.waikato.ac.nz/ml/weka/

  11. When using non-gain heuristics in our experiments, the best rule found is returned, because the last rule is suboptimal in that case.

  12. anneal, audiology, breast-cancer, cleveland-heart-disease, contact-lenses, credit, glass2, glass, hepatitis, horse-colic, hypothyroid, iris, krkp, labor, lymphography, monk1, monk2, monk3, mushroom, sick-euthyroid, soybean, tic-tac-toe, titanic, vote-1, vote, vowel, wine

  13. auto-mpg, autos, balance-scale, balloons, breast-w, breast-w-d, bridges2, credit-g, diabetes, echocardiogram, flag, hayes-roth, heart-h, heart-statlog, ionosphere, machine, primarytumor, promoters, segment, solar-flare, sonar, vehicle, zoo

  14. The UCI datasets are available at http://archive.ics.uci.edu/ml/. The Titanic dataset is available as part of the Delve project of the University of Toronto at http://www.cs.toronto.edu/~delve/.

  15. The statistical tool used to perform the Friedman test and advanced post-hoc procedures can be found at http://sci2s.ugr.es/keel/multipleTest.zip

  16. The data of these experiments is available online at http://www.antminerplus.com

  17. The ACO search starts with an environment that is not specialized and has to learn to add specific terms to the rules (top-down). In PSO and genetic algorithms, the initialization of the swarm/population with either general or specific particles/solutions determines the top-down or bottom-up nature. HIDER initializes with very specific instances (bottom-up). PSO/ACO2 adds conditions to rules in two phases (top-down). The first phase is a limited ACO/PSO hybrid search (top-down). The second phase is a limited PSO search with a non-specializing seeding (top-down).

  18. Our tuning data show large and significant positive (Spearman) correlations between the best-performing parameter values observed for the different parametrized heuristics. This indicates that the bias requirements of the datasets are fairly stable.
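
Notes 4 and 9 together describe why heuristic values must be non-negative when selection is proportional to fitness. The following is a minimal sketch of that combination, not code from any of the implementations listed above. The function names, the raw score range of [-1, 1], and the example scores are all hypothetical.

```python
import random
from typing import List, Sequence


def shift_to_non_negative(values: Sequence[float],
                          lowest_possible: float) -> List[float]:
    """Add a constant so that the heuristic's lowest possible value maps to zero."""
    return [v - lowest_possible for v in values]


def selection_probabilities(fitness: Sequence[float]) -> List[float]:
    """Fitness-proportional (roulette-wheel) selection probabilities."""
    if any(f < 0 for f in fitness):
        raise ValueError("fitness-proportional selection needs non-negative values")
    total = sum(fitness)
    if total == 0:
        # All individuals have zero fitness: fall back to a uniform choice.
        return [1.0 / len(fitness)] * len(fitness)
    return [f / total for f in fitness]


def roulette_select(fitness: Sequence[float], rng: random.Random) -> int:
    """Return the index of one individual, drawn proportionally to its fitness."""
    weights = selection_probabilities(fitness)
    return rng.choices(range(len(fitness)), weights=weights, k=1)[0]


# Hypothetical raw scores from a heuristic ASSUMED to range over [-1, 1].
raw_scores = [-0.5, 0.0, 0.5]
fitness = shift_to_non_negative(raw_scores, lowest_possible=-1.0)
probs = selection_probabilities(fitness)
chosen = roulette_select(fitness, random.Random(0))
```

Because the probabilities depend on the exact fitness values, the shift itself changes the selection pressure: after shifting, the three individuals are chosen with probabilities 1/6, 1/3 and 1/2, whereas a rank-based scheme would be unaffected by the added constant.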

References

  • Aguilar-Ruiz J, Riquelme JC, Toro M (2003) Evolutionary learning of hierarchical decision rules. IEEE Trans Syst Man Cybern Part B Cybern 33:324–331

  • Aguilar-Ruiz JS, Giráldez R, Santos JCR (2007) Natural encoding for evolutionary supervised learning. IEEE Trans Evol Comput 11(4):466–479

  • Alcalá-Fdez J, Sánchez L, García S, del Jesus M, Ventura S, Garrell J, Otero J, Romero C, Bacardit J, Rivas V, Fernández J, Herrera F (2009) KEEL: a software tool to assess evolutionary algorithms for data mining problems. Soft Comput 13(3):307–318

  • Alcalá-Fdez J, Fernandez A, Luengo J, Derrac J, García S, Sánchez L, Herrera F (2011) KEEL data-mining software tool: data set repository, integration of algorithms and experimental analysis framework. J Mult-Valued Logic Soft Comput 17(2–3):255–287

  • An A, Cercone N (2000) Rule quality measures improve the accuracy of rule induction: an experimental approach. In: Proceedings of 12th Int Symp Found Intell Syst, ISMIS’00, pp 119–129

  • Andrews R, Diederich J, Tickle AB (1995) Survey and critique of techniques for extracting rules from trained artificial neural networks. Knowl-Based Syst 8(6):373–389

  • Baesens B, Setiono R, Mues C, Vanthienen J (2003a) Using neural network rule extraction and decision tables for credit-risk evaluation. Manag Sci 49(3):312–329

  • Baesens B, Van Gestel T, Viaene S, Stepanova M, Suykens J, Vanthienen J (2003b) Benchmarking state-of-the-art classification algorithms for credit scoring. J Oper Res Soc 54(6):627–635

  • Booker LB, Goldberg DE, Holland JH (1989) Classifier systems and genetic algorithms. Artif Intell 40(1–3):235–282

  • Cestnik B (1990) Estimating probabilities: a crucial task in machine learning. In: 9th Eur Conf, Artif Intell, ECAI’90, pp 147–149

  • Clark P, Niblett T (1989) The CN2 induction algorithm. Mach Learn 3(4):261–283

  • Cohen W (1995) Fast effective rule induction. In: Proc 12th Int Conf, Mach Learn, ICML’95, pp 115–123

  • Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30

  • Dorigo M, Maniezzo V, Colorni A (1996) Ant system: optimization by a colony of cooperating agents. IEEE Trans Syst Man Cybern Part B Cybern 26(1):29–41

  • Fayyad UM, Irani KB (1992) On the handling of continuous-valued attributes in decision tree generation. Mach Learn 8(1):87–102

  • Fernández A, García S, Luengo J, Bernadó-Mansilla E, Herrera F (2010) Genetics-based machine learning for rule induction: state of the art, taxonomy, and comparative study. IEEE Trans Evol Comput 14(6):913–941

  • Freitas AA (2003) A survey of evolutionary algorithms for data mining and knowledge discovery. In: Ghosh A, Tsutsui S (eds) Advances in evolutionary computing: theory and applications. Springer-Verlag New York Inc., New York, NY, USA, pp 819–845

  • Fürnkranz J (1999) Separate-and-conquer rule learning. Artif Intell Rev 13(1):3–54

  • Fürnkranz J, Flach P (2003) An analysis of rule evaluation metrics. In: Proc 20th Int Conf, Mach Learn, ICML’03, pp 202–209

  • Fürnkranz J, Flach P (2005) ROC ‘n’ rule learning—towards a better understanding of covering algorithms. Mach Learn 58(1):39–77

  • García S, Herrera F (2008) An extension on statistical comparisons of classifiers over multiple data sets for all pairwise comparisons. J Mach Learn Res 9:2677–2694

  • Greene DP, Smith SF (1993) Competition-based induction of decision models from examples. Mach Learn 13(2–3):229–257

  • Hall M, Holmes G (2003) Benchmarking attribute selection techniques for discrete class data mining. IEEE Trans Knowl Data Eng 15(6):1437–1447

  • Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software: an update. SIGKDD Explor Newsl 11:10–18

  • Hettich S, Bay SD (1996) The UCI KDD archive [http://kdd.ics.uci.edu]

  • Holden N, Freitas A (2005) A hybrid particle swarm/ant colony algorithm for the classification of hierarchical biological data. In: Proc IEEE Swarm Intell Symp, SIS’05, pp 100–107

  • Holden N, Freitas AA (2008) A hybrid PSO/ACO algorithm for discovering classification rules in data mining. J Artif Evol App 2008:1–12

  • Holte RC, Acker LE, Porter BW (1989) Concept learning and the problem of small disjuncts. In: Proc 11th Int Joint Conf, Artif Intell, IJCAI’89, pp 813–818

  • Janssen F, Fürnkranz J (2009) A re-evaluation of the over-searching phenomenon in inductive rule learning. In: Proc SIAM Int Conf Data Min, SDM’09, pp 329–340

  • Janssen F, Fürnkranz J (2010) On the quest for optimal rule learning heuristics. Mach Learn 78(3):343–379

  • Kira K, Rendell L (1992) The feature selection problem: traditional methods and a new algorithm. In: Proc 10th Natl Conf, Artif Intell, AAAI’92, pp 129–134

  • Klösgen W (1992) Problems for knowledge discovery in databases and their treatment in the statistics interpreter EXPLORA. Int J Intell Syst 7(7):649–673

  • Kononenko I (1994) Estimating attributes: analysis and extensions of RELIEF. In: Eur Conf, Mach Learn, ECML’94, pp 171–182

  • Liu B, Abbass HA, McKay B (2002a) Density-based heuristic for rule discovery with ant-miner. In: Proc 6th Australasia-Japan Joint Workshop Intell Evol Syst, AJWIS’02, pp 180–184

  • Liu H, Hussain F, Tan CL, Dash M (2002b) Discretization: an enabling technique. Data Min Knowl Discov 6(4):393–423

  • Liu B, Abbass H, McKay B (2003) Classification rule discovery with ant colony optimization. In: Proc IEEE/WIC Int Conf Intell Agent Tech, IAT’03, pp 83–88

  • Lopes HS, Coutinho MS, Lima WC (1997) An evolutionary approach to simulate cognitive feedback learning in medical domain. In: Sanchez E, Shibata T, Zadeh L (eds) Genetic algorithms and fuzzy logic systems: soft computing perspectives. World Scientific, Singapore, pp 193–207

  • Martens D, Baesens B, Van Gestel T, Vanthienen J (2007a) Comprehensible credit scoring models using rule extraction from support vector machines. Eur J Oper Res 183(3):1466–1476

  • Martens D, De Backer M, Haesen R, Vanthienen J, Snoeck M, Baesens B (2007b) Classification with ant colony optimization. IEEE Trans Evol Comput 11(5):651–665

  • Martens D, Baesens B, Fawcett T (2011) Editorial survey: swarm intelligence for data mining. Mach Learn 82(1):1–42

  • Novak PK, Lavrač N, Webb GI (2009) Supervised descriptive rule discovery: a unifying survey of contrast set, emerging pattern and subgroup mining. J Mach Learn Res 10:377–403

  • Montes de Oca M, Stützle T, Birattari M, Dorigo M (2009) Frankenstein’s PSO: a composite particle swarm optimization algorithm. IEEE Trans Evol Comput 13(5):1120–1132

  • Otero FEB, Freitas AA, Johnson CG (2009) Handling continuous attributes in ant colony classification algorithms. In: Proc IEEE Symp Comput Intell Data Min, IEEE, CIDM’09, pp 225–231

  • Pappa G, Freitas A (2009) Evolving rule induction algorithms with multi-objective grammar-based genetic programming. Knowl Inf Syst 19(3):283–309

  • Parpinelli RS, Lopes HS, Freitas AA (2001) An ant colony based system for data mining: Applications to medical data. In: Proc Genet Evol Comput Conf, GECCO’01, pp 791–797

  • Parpinelli R, Lopes H, Freitas A (2002) Data mining with an ant colony optimization algorithm. IEEE Trans Evol Comput 6(4):321–332

  • Pazzani M, Mani S, Shankle W (2001) Acceptance by medical experts of rules generated by machine learning. Methods Inf Med 40(5):380–385

  • Quinlan JR (1990) Learning logical definitions from relations. Mach Learn 5(3):239–266

  • Quinlan JR (1993) C4.5 programs for machine learning. Morgan Kaufmann Publishers Inc., San Francisco, CA

  • Salama K, Abdelbar A (2011) Exploring different rule quality evaluation functions in ACO-based classification algorithms. In: IEEE Symp Swarm Intell, SIS’11, pp 1–8

  • Smith SF (1980) A learning system based on genetic adaptive algorithms. PhD thesis, Pittsburgh, PA

  • Sousa T, Silva A, Neves A (2004) Particle swarm based data mining algorithms for classification tasks. Parallel Comput 30(5–6):767–783

  • Steuer R (1986) Multiple criteria optimization: Theory, computation and application. Wiley, New York, NY

  • Stützle T, Hoos HH (2000) MAX-MIN ant system. Future Generat Comput Syst 16(9):889–914

  • Suykens JAK, Van Gestel T, Brabanter JD, De Moor B, Vandewalle J (2002) Least squares support vector machines. World Scientific, Singapore

  • Tan PN, Kumar V, Srivastava J (2002) Selecting the right interestingness measure for association patterns. In: Proc 8th ACM SIGKDD Int Conf Knowl Discov Data Min, KDD’02, pp 32–41

  • Tan PN, Steinbach M, Kumar V (2005) Introduction to data mining. Addison Wesley, Boston, MA

  • Van Gestel T, Suykens J, Baesens B, Viaene S, Vanthienen J, Dedene G, De Moor B, Vandewalle J (2004) Benchmarking least squares support vector machine classifiers. Mach Learn 54(1):5–32

  • van Rijsbergen CJ (1979) Information retrieval. Butterworth-Heinemann, Newton, MA

  • Venturini G (1993) SIA: a supervised inductive algorithm with genetic search for learning attributes based concepts. In: Proc Eur Conf, Mach Learn, ECML’93, pp 280–296

  • Verbeke W, Martens D, Mues C, Baesens B (2011) Building comprehensible customer churn prediction models with advanced rule induction techniques. Expert Syst Appl 38(3):2354–2364

  • Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco, CA

  • Wong ML (1998) An adaptive knowledge-acquisition system using generic genetic programming. Expert Syst Appl 15(1):47–58

  • Wrobel S (1997) An algorithm for multi-relational discovery of subgroups. In: Proc 1st Eur Symp Princ Data Min Knowl Discov, PKDD’97, pp 78–87

Acknowledgments

This work was carried out using the Stevin Supercomputer Infrastructure at Ghent University. We would like to thank the Flemish Research Council for the financial support (FWO Odysseus grant B.0915.09). We are also grateful to the creators of the open source implementations used in this work.

Author information

Corresponding author

Correspondence to Bart Minnaert.

Additional information

Responsible editor: Johannes Fürnkranz.

About this article

Cite this article

Minnaert, B., Martens, D., De Backer, M. et al. To tune or not to tune: rule evaluation for metaheuristic-based sequential covering algorithms. Data Min Knowl Disc 29, 237–272 (2015). https://doi.org/10.1007/s10618-013-0339-5

  • DOI: https://doi.org/10.1007/s10618-013-0339-5
