Abstract
In this work we analyse the relationship between distances and generalisation operators for real numbers, nominal data and tuples in the context of hierarchical distance-based conceptual clustering (HDCC). HDCC is a general approach to conceptual clustering that extends the traditional hierarchical clustering algorithm by producing conceptual generalisations of the discovered clusters. This makes it possible to combine the flexibility of changing distances to suit different clustering problems with the advantage of having concepts, which are crucial for tasks such as summarisation and for descriptive data mining in general. We propose a set of generalisation operators and distances for these data types and analyse the properties they satisfy on the basis of three different levels of agreement between the clustering hierarchy obtained from the linkage distance and the hierarchy obtained by using generalisation operators.
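The general idea of attaching a generalisation to every node of a distance-based hierarchy can be illustrated with a minimal sketch for real numbers. The choices below are assumptions made for illustration only and are not taken from the paper: the absolute difference as the distance, single linkage as the linkage distance, and the smallest enclosing interval as the generalisation operator. The function names (`distance`, `single_linkage`, `generalise`, `hdcc_sketch`) are hypothetical.

```python
from itertools import combinations


def distance(a, b):
    """Assumed distance between two real numbers: absolute difference."""
    return abs(a - b)


def single_linkage(c1, c2):
    """Assumed linkage distance: smallest pairwise distance between clusters."""
    return min(distance(a, b) for a in c1 for b in c2)


def generalise(cluster):
    """Assumed generalisation operator: smallest interval covering the cluster."""
    return (min(cluster), max(cluster))


def hdcc_sketch(points):
    """Agglomerative clustering that records a concept for every merge.

    Returns a list of (members, interval, linkage_distance) tuples,
    one per merge, in the order the merges happen.
    """
    clusters = [(p,) for p in points]
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        d = single_linkage(clusters[i], clusters[j])
        merged = tuple(sorted(clusters[i] + clusters[j]))
        # Attach the conceptual generalisation to the new node.
        merges.append((merged, generalise(merged), d))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


if __name__ == "__main__":
    for members, interval, d in hdcc_sketch([1, 2, 10, 12, 30]):
        print(members, interval, d)
```

Each node of the resulting hierarchy carries both its members and an interval describing them, so the hierarchy induced by the linkage distance can be compared with the one induced by the generalisations.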
This work has been partially supported by the EU (FEDER) and the Spanish MEC/MICINN under grant TIN2007-68093-C02 and the Spanish project "Agreement Technologies" (Consolider Ingenio CSD2007-00022). A. Funes was supported by a grant from the Alfa Lernet project and the UNSL.
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Funes, A., Ferri, C., Hernández-Orallo, J., Ramírez-Quintana, M.J. (2009). An Instantiation of Hierarchical Distance-Based Conceptual Clustering for Propositional Learning. In: Theeramunkong, T., Kijsirikul, B., Cercone, N., Ho, TB. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2009. Lecture Notes in Computer Science, vol 5476. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01307-2_63
DOI: https://doi.org/10.1007/978-3-642-01307-2_63
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01306-5
Online ISBN: 978-3-642-01307-2
eBook Packages: Computer Science, Computer Science (R0)