Abstract
Ensemble methods improve accuracy by combining the predictions of a set of different hypotheses. However, they have two important shortcomings: large amounts of memory are required to store the set of hypotheses and, more importantly, the comprehensibility of a single hypothesis is lost. In this work, we devise a new method to extract a single solution from a hypothesis ensemble without using extra data, based on two main ideas: the selected solution must be semantically similar to the combined solution, and this similarity is evaluated on a random dataset. We have implemented the method using shared ensembles, which allow an exponential number of potential base hypotheses. We include several experiments showing that the new method selects a single hypothesis whose accuracy is reasonably close to that of the combined hypothesis.
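The abstract's core idea can be sketched in a few lines: generate an unlabelled random dataset, compute the ensemble's combined (voted) prediction on it, and keep the single member that agrees most often with that vote. The following Python sketch is only an illustration of this selection criterion, not the authors' shared-ensemble (SMILES) implementation; all names (`select_representative`, `X_random`, the bootstrap loop, the uniform sampling of random points) are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

def select_representative(members, X_random):
    """Return the member most similar to the combined prediction on random inputs."""
    # Combined prediction: majority vote over all members (hypothetical fusion rule).
    votes = np.array([m.predict(X_random) for m in members])
    combined = np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)
    # Semantic similarity proxy: fraction of random points where a member agrees with the vote.
    agreements = [(votes[i] == combined).mean() for i in range(len(members))]
    return members[int(np.argmax(agreements))]

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)

# Train a small ensemble of trees on bootstrap samples (stand-in for a shared ensemble).
members = []
for _ in range(10):
    idx = rng.integers(0, len(X), len(X))
    members.append(DecisionTreeClassifier(max_depth=3).fit(X[idx], y[idx]))

# Random, unlabelled dataset drawn uniformly within the observed feature ranges.
X_random = rng.uniform(X.min(axis=0), X.max(axis=0), size=(1000, X.shape[1]))

best = select_representative(members, X_random)
```

Note that no extra labelled data is used: the random dataset only serves to compare the predictions of each member against the combined prediction.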
This work has been partially supported by CICYT under grant TIC2001-2705-C03-01, Generalitat Valenciana under grant GV00-092-14, and Acción Integrada Hispano-Alemana HA2001-0059.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ferri, C., Hernández-Orallo, J., Ramírez-Quintana, M.J. (2002). From Ensemble Methods to Comprehensible Models. In: Lange, S., Satoh, K., Smith, C.H. (eds) Discovery Science. DS 2002. Lecture Notes in Computer Science, vol 2534. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36182-0_16
DOI: https://doi.org/10.1007/3-540-36182-0_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00188-1
Online ISBN: 978-3-540-36182-4