Cascade Evaluation of Clustering Algorithms

Candillier, Laurent; Tellier, Isabelle; Torre, Fabien; Bousquet, Olivier

doi:10.1007/11871842_54

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4212)

Included in the following conference series:

European Conference on Machine Learning

Abstract

This paper addresses the evaluation of clustering results and the comparison of clustering algorithms. We propose a new method that enriches a set of independent labeled datasets with the results of clustering, then uses a supervised method to evaluate the benefit of adding this new information to the datasets.

We thus adapt the cascade generalization [1] paradigm to the case where an unsupervised learner is combined with a supervised one. We also consider the case where independent supervised learners are trained on the different groups of data objects created by the clustering [2].

We then conduct experiments using different supervised algorithms to compare various clustering algorithms. The results show that the proposed method behaves coherently: for example, it indicates that algorithms based on complex probabilistic models outperform algorithms based on simpler models.
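The enrichment idea described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the toy dataset, the plain k-means clusterer, the decision stump used as the supervised learner, the single hold-out split, and every function name below are assumptions made for illustration only. The sketch clusters the training data, appends a one-hot cluster-membership vector to each object, and compares the supervised error on the raw and the enriched data. It covers only the enrichment variant, not the variant that trains one learner per cluster.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(p, centroids):
    return min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))

def kmeans(points, k, iters=10):
    """Plain Lloyd's k-means (illustrative stand-in for any clusterer)."""
    # Farthest-first initialisation keeps this toy run deterministic.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        centroids = [tuple(sum(col) / len(g) for col in zip(*g)) if g else old
                     for g, old in zip(groups, centroids)]
    return centroids

def enrich(points, centroids):
    """Append a one-hot cluster-membership vector to every object."""
    k = len(centroids)
    return [tuple(p) + tuple(1.0 if i == nearest(p, centroids) else 0.0
                             for i in range(k))
            for p in points]

def fit_stump(X, y):
    """Best single-feature threshold rule (stand-in supervised learner)."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for left, right in ((0, 1), (1, 0)):
                err = sum((left if x[f] <= t else right) != c
                          for x, c in zip(X, y))
                if best is None or err < best[0]:
                    best = (err, f, t, left, right)
    return best[1:]

def error_rate(stump, X, y):
    f, t, left, right = stump
    wrong = sum((left if x[f] <= t else right) != c for x, c in zip(X, y))
    return wrong / len(X)

# Toy data (assumption): three tight groups on a line; the middle one is class 1.
offsets = [-0.4, -0.2, 0.0, 0.2, 0.4]
X, y = [], []
for centre, label in ((0.0, 0), (10.0, 1), (20.0, 0)):
    for dx in offsets:
        for dy in offsets:
            X.append((centre + dx, dy))
            y.append(label)

# Simple hold-out split (assumption): even indices train, odd indices test.
X_train, y_train = X[0::2], y[0::2]
X_test, y_test = X[1::2], y[1::2]

centroids = kmeans(X_train, k=3)  # clustering never sees the labels
raw_err = error_rate(fit_stump(X_train, y_train), X_test, y_test)
enriched_err = error_rate(
    fit_stump(enrich(X_train, centroids), y_train),
    enrich(X_test, centroids), y_test)
print(f"raw error: {raw_err:.3f}, enriched error: {enriched_err:.3f}")
```

On this toy data a single threshold cannot isolate the middle group, so the stump trained on the raw features misclassifies it, while the stump trained on the enriched features can use the cluster indicator directly. Under this scheme, the size of the error reduction serves as the score of the clustering, and repeating the comparison with another clusterer in place of `kmeans` gives a way to rank clustering algorithms.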

References

[1] Gama, J., Brazdil, P.: Cascade generalization. Machine Learning 41, 315–343 (2000)
[2] Apte, C.V., Natarajan, R., Pednault, E.P.D., Tipu, F.A.: A probabilistic estimation framework for predictive model analytics. IBM Systems Journal 41 (2002)
[3] Ali, K.M., Pazzani, M.J.: Error reduction through learning multiple descriptions. Machine Learning 24, 173–202 (1996)
[4] Breiman, L.: Bagging predictors. Machine Learning 24, 123–140 (1996)
[5] Freund, Y., Schapire, R.E.: Experiments with a new boosting algorithm. In: Int. Conf. on Machine Learning, pp. 148–156 (1996)
[6] Breiman, L.: Bias, variance, and arcing classifiers. Technical Report 460, Statistics Department, University of California (1996)
[7] Dietterich, T.G.: Ensemble methods in machine learning. In: Kittler, J., Roli, F. (eds.) MCS 2000. LNCS, vol. 1857, pp. 1–15. Springer, Heidelberg (2000)
[8] Meir, R., Rätsch, G.: An introduction to boosting and leveraging. In: Mendelson, S., Smola, A.J. (eds.) Advanced Lectures on Machine Learning. LNCS, vol. 2600, pp. 118–183. Springer, Heidelberg (2003)
[9] Wolpert, D.H.: Stacked generalization. Neural Networks 5, 241–259 (1992)
[10] Parsons, L., Haque, E., Liu, H.: Evaluating subspace clustering algorithms. In: Workshop on Clustering High Dimensional Data and its Applications, SIAM Int. Conf. on Data Mining, pp. 48–56 (2004)
[11] Dietterich, T.G.: Approximate statistical test for comparing supervised classification learning algorithms. Neural Computation 10, 1895–1923 (1998)
[12] Alpaydin, E.: Combined 5x2cv F-test for comparing supervised classification learning algorithms. Neural Computation 11, 1885–1892 (1999)
[13] Domeniconi, C., Papadopoulos, D., Gunopulos, D., Ma, S.: Subspace clustering of high dimensional data. In: SIAM Int. Conf. on Data Mining (2004)
[14] Candillier, L., Tellier, I., Torre, F., Bousquet, O.: SSC: Statistical Subspace Clustering. In: Perner, P., Imiya, A. (eds.) MLDM 2005. LNCS, vol. 3587, pp. 100–109. Springer, Heidelberg (2005)
[15] Dempster, A., Laird, N., Rubin, D.: Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B 39, 1–38 (1977)
[16] Candillier, L., Tellier, I., Torre, F., Bousquet, O.: SuSE: Subspace Selection embedded in an EM algorithm. In: Miclet, L. (ed.) Actes de la huitième Conférence d’Apprentissage (CAp) (2006)
[17] Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann (1993)
[18] Quinlan, R.: Data mining tools See5 and C5.0 (2004)
[19] Webb, G.I., Agar, J.W.M.: Inducing diagnostic rules for glomerular disease with the DLG machine learning algorithm. Artificial Intelligence in Medicine 4, 419–430 (1992)
[20] Tsochantaridis, I., Joachims, T., Hofmann, T., Altun, Y.: Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research (JMLR) 6, 1453–1484 (2005)
[21] Blake, C., Merz, C.: UCI repository of machine learning databases (1998), http://www.ics.uci.edu/~mlearn/MLRepository.html

Author information

Authors and Affiliations

GRAppA, Charles de Gaulle University, Lille 3
Laurent Candillier, Isabelle Tellier & Fabien Torre
Pertinence, 32 rue des Jeûneurs, 75002, Paris
Laurent Candillier & Olivier Bousquet

Editor information

Editors and Affiliations

Knowledge Engineering Group, Technische Universität Darmstadt
Johannes Fürnkranz
Max Planck Institute for Computer Science, Saarbrücken, Germany
Tobias Scheffer
Faculty of Computer Science, Otto-von-Guericke-University Magdeburg, Germany
Myra Spiliopoulou

Cite this paper

Candillier, L., Tellier, I., Torre, F., Bousquet, O. (2006). Cascade Evaluation of Clustering Algorithms. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds) Machine Learning: ECML 2006. ECML 2006. Lecture Notes in Computer Science, vol 4212. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11871842_54

DOI: https://doi.org/10.1007/11871842_54
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45375-8
Online ISBN: 978-3-540-46056-5
eBook Packages: Computer Science, Computer Science (R0)
