Copyright © 1997 Published by Elsevier Science B.V.
Wrappers for feature subset selection
Ron Kohavi and George H. John
Available online 12 May 1998.
Abstract
In the feature subset selection problem, a learning algorithm is faced with the problem of selecting a relevant subset of features upon which to focus its attention, while ignoring the rest. To achieve the best possible performance with a particular learning algorithm on a particular training set, a feature subset selection method should consider how the algorithm and the training set interact. We explore the relation between optimal feature subset selection and relevance. Our wrapper method searches for an optimal feature subset tailored to a particular algorithm and a domain. We study the strengths and weaknesses of the wrapper approach and show a series of improved designs. We compare the wrapper approach to induction without feature subset selection and to Relief, a filter approach to feature subset selection. Significant improvement in accuracy is achieved for some datasets for the two families of induction algorithms used: decision trees and Naive-Bayes.
Author Keywords: Classification; Feature selection; Wrapper; Filter
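
As a rough illustration of the wrapper idea described in the abstract, the sketch below scores each candidate feature subset by running the learning algorithm itself and estimating its accuracy with cross-validation. It is not the authors' implementation. The greedy forward search, the 1-nearest-neighbour learner, the 5-fold split, the synthetic data, and every function name are assumptions chosen to keep the example short. The paper's own experiments use decision trees and Naive-Bayes.

```python
import random

def one_nn_predict(train, query, features):
    """Return the label of the training row nearest to `query`,
    measuring squared distance over the chosen `features` only."""
    def dist(row):
        return sum((row[0][f] - query[f]) ** 2 for f in features)
    return min(train, key=dist)[1]

def cv_accuracy(data, features, folds=5):
    """Estimate accuracy of 1-NN restricted to `features` by k-fold CV."""
    correct = 0
    for i in range(folds):
        test = data[i::folds]
        train = [row for j, row in enumerate(data) if j % folds != i]
        correct += sum(one_nn_predict(train, x, features) == y for x, y in test)
    return correct / len(data)

def wrapper_forward_selection(data, n_features, folds=5):
    """Greedy forward search (an assumed, simplified search strategy):
    repeatedly add the feature that most improves the CV estimate and
    stop as soon as no single addition improves it."""
    selected, best = [], 0.0
    while True:
        candidates = [f for f in range(n_features) if f not in selected]
        scored = [(cv_accuracy(data, selected + [f], folds), f) for f in candidates]
        if not scored:
            break
        score, feature = max(scored)
        if score <= best:
            break
        selected.append(feature)
        best = score
    return selected, best

# Synthetic data (hypothetical): feature 0 separates the two classes,
# features 1 and 2 are irrelevant noise on a much larger scale.
rng = random.Random(0)
data = []
for i in range(60):
    y = i % 2
    x = [y + rng.uniform(-0.2, 0.2), rng.uniform(-5, 5), rng.uniform(-5, 5)]
    data.append((x, y))

selected, best = wrapper_forward_selection(data, n_features=3)
print("selected features:", selected, "estimated accuracy:", best)
```

The point of the design is that the subset is judged by the same learner that will later use it, so the search can discard features that hurt this particular algorithm on this particular training set, which a learner-independent filter score cannot see.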










