Abstract
In this work we analyse the relation between hierarchical distance-based clustering and the concepts that can be obtained from the hierarchy by generalisation. Many inconsistencies may arise, because the distance and the conceptual generalisation operator are usually incompatible. To overcome this, we propose an algorithm which integrates distance-based and conceptual clustering. The new dendrograms show whether an element has been integrated into a cluster because it is near in the metric space or because it is covered by the concept. In this way, the new clustering may differ from the original one, but its metric traceability remains clear. We introduce three different levels of agreement between the clustering hierarchy obtained from the linkage distance and the new hierarchy, and we define the properties that generalisation operators should satisfy in order to produce distance-consistent dendrograms.
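The idea of recording why an element joins a cluster can be illustrated with a minimal sketch. It is not the algorithm of the paper: it assumes 2-D points, Euclidean single linkage, and a hypothetical generalisation operator (`bounding_box`) that returns the smallest axis-parallel rectangle covering a cluster. All function names are illustrative. At each step the two closest clusters are merged by distance, and any remaining cluster that falls entirely inside the resulting concept is absorbed by coverage.

```python
from itertools import combinations
from math import dist


def single_linkage(a, b):
    """Smallest Euclidean distance between a point of a and a point of b."""
    return min(dist(p, q) for p in a for q in b)


def bounding_box(cluster):
    """Assumed generalisation operator: smallest axis-parallel rectangle."""
    xs, ys = zip(*cluster)
    return (min(xs), max(xs), min(ys), max(ys))


def covers(box, cluster):
    """True if every point of the cluster lies inside the rectangle."""
    x0, x1, y0, y1 = box
    return all(x0 <= x <= x1 and y0 <= y <= y1 for x, y in cluster)


def conceptual_dendrogram(points):
    """Return the merge history, one dictionary per level of the hierarchy."""
    clusters = [(p,) for p in points]
    history = []
    while len(clusters) > 1:
        # Distance step: pick the closest pair under the linkage distance.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: single_linkage(*pair))
        concept = bounding_box(a + b)
        others = [c for c in clusters if c not in (a, b)]
        # Conceptual step: absorb every cluster covered by the new concept.
        absorbed = [c for c in others if covers(concept, c)]
        kept = [c for c in others if not covers(concept, c)]
        merged = a + b + tuple(p for c in absorbed for p in c)
        history.append({"by_distance": (a, b),
                        "by_coverage": absorbed,
                        "concept": concept})
        clusters = kept + [merged]
    return history


# An L-shaped chain of points plus one isolated point at (0, 2).
points = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (0, 2)]
history = conceptual_dendrogram(points)
for step in history:
    print(step)
```

In this example the point (0, 2) is at distance 2 from its nearest neighbours, so a purely distance-based dendrogram would attach it last. Here it is never merged by distance: it is absorbed by coverage at the step where the rectangle of the completed chain grows to include it, and the history keeps the two kinds of merge apart.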
This work has been partially supported by the EU (FEDER) and the Spanish MEC/MICINN under grant TIN2007-68093-C02 and the Spanish project "Agreement Technologies" (Consolider Ingenio CSD2007-00022). A. Funes was supported by a grant from the Alfa Lernet project and the UNSL.
© 2008 Springer-Verlag Berlin Heidelberg
Cite this paper
Funes, A.M., Ferri, C., Hernández-Orallo, J., Ramírez-Quintana, M.J. (2008). Hierarchical Distance-Based Conceptual Clustering. In: Daelemans, W., Goethals, B., Morik, K. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2008. Lecture Notes in Computer Science, vol 5211. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87479-9_41
DOI: https://doi.org/10.1007/978-3-540-87479-9_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87478-2
Online ISBN: 978-3-540-87479-9