Abstract
This paper is about the evaluation of the results of clustering algorithms, and the comparison of such algorithms. We propose a new method based on the enrichment of a set of independent labeled datasets by the results of clustering, and the use of a supervised method to evaluate the interest of adding such new information to the datasets.
We thus adapt the cascade generalization [1] paradigm in the case where we combine an unsupervised and a supervised learner. We also consider the case where independent supervised learnings are performed on the different groups of data objects created by the clustering [2].
We then conduct experiments using different supervised algorithms to compare various clustering algorithms. And we thus show that our proposed method exhibits a coherent behavior, pointing out, for example, that the algorithms based on the use of complex probabilistic models outperform algorithms based on the use of simpler models.
Chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Gama, J., Brazdil, P.: Cascade generalization. Machine Learning 41, 315–343 (2000)
Apte, C.V., Natarajan, R., Pednault, E.P.D., Tipu, F.A.: A probabilistic estimaton framework for predictive model analytics. IBM Systems Journal 41 (2002)
Ali, K.M., Pazzani, M.J.: Error reduction through learning multiple descriptions. Machine Learning 24, 173–202 (1996)
Breiman, L.: Bagging predictors. Machine Learning 24, 123–140 (1996)
Freund, Y., Schapire, R.E.: Experiments with a new boosting algorithm. In: Int. Conf. on Machine Learning, pp. 148–156 (1996)
Breiman, L.: Bias, variance, and arcing classifiers, Technical Report 460, Statistics Department, University of California (1996)
Dietterich, T.G.: Ensemble methods in machine learning. In: Kittler, J., Roli, F. (eds.) MCS 2000. LNCS, vol. 1857, pp. 1–15. Springer, Heidelberg (2000)
Meir, R., Rätsch, G.: An introduction to boosting and leveraging. In: Mendelson, S., Smola, A.J. (eds.) Advanced Lectures on Machine Learning. LNCS, vol. 2600, pp. 118–183. Springer, Heidelberg (2003)
Wolpert, D.H.: Stacked generalization. Neural Networks 5, 241–259 (1992)
Parsons, L., Haque, E., Liu, H.: Evaluating subspace clustering algorithms. In: Workshop on Clustering High Dimensional Data and its Applications, SIAM Int. Conf. on Data Mining, pp. 48–56 (2004)
Dietterich, T.G.: Approximate statistical test for comparing supervised classification learning algorithms. Neural Computation 10, 1895–1923 (1998)
Alpaydin, E.: Combined 5x2cv F-test for comparing supervised classification learning algorithms. Neural Computation 11, 1885–1892 (1999)
Domeniconi, C., Papadopoulos, D., Gunopolos, D., Ma, S.: Subspace clustering of high dimensional data. In: SIAM Int. Conf. on Data Mining (2004)
Candillier, L., Tellier, I., Torre, F., Bousquet, O.: SSC: Statistical Subspace Clustering. In: Perner, P., Imiya, A. (eds.) MLDM 2005. LNCS, vol. 3587, pp. 100–109. Springer, Heidelberg (2005)
Dempster, A., Laird, N., Rubin, D.: Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B 39, 1–38 (1977)
Candillier, L., Tellier, I., Torre, F., Bousquet, O.: SuSE: Subspace Selection embedded in an EM algorithm. In: Miclet, L. (ed.) Actes de la huitième Conférence d’Apprentissage (CAp) (2006)
Quinlan, J.R.: C4.5: Programs for Machine Learning. KAUFM (1993)
Quinlan, R.: Data mining tools see5 and c5.0 (2004)
Webb, G.I., Agar, J.W.M.: Inducing diagnostic rules for glomerular disease with the DLG machine learning algorithm. Artificial Intelligence in Medicine 4, 419–430 (1992)
Tsochantaridis, I., Joachims, T., Hofmann, T., Altun, Y.: Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research (JMLR) 6, 1453–1484 (2005)
Blake, C., Merz, C.: UCI repository of machine learning databases (1998), http://www.ics.uci.edu/~mlearn/MLRepository.html
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Candillier, L., Tellier, I., Torre, F., Bousquet, O. (2006). Cascade Evaluation of Clustering Algorithms. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds) Machine Learning: ECML 2006. ECML 2006. Lecture Notes in Computer Science(), vol 4212. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11871842_54
Download citation
DOI: https://doi.org/10.1007/11871842_54
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45375-8
Online ISBN: 978-3-540-46056-5
eBook Packages: Computer ScienceComputer Science (R0)