Abstract
This paper proposes an efficient Hybrid Hierarchical Agglomerative Clustering (HHAC) technique for clustering and prototype selection in pattern classification. It combines the characteristics of a partitional (incremental) scheme and of Hierarchical Agglomerative Clustering (HAC). First, an incremental partitional clustering algorithm, the leader algorithm, finds the subgroups (subclusters). This reduces the time and space needed to form the subclusters compared with conventional hierarchical agglomerative schemes or other methods. Then only the subcluster representatives are merged by a hierarchical agglomerative scheme until the required number of clusters is obtained, which requires less space and time than applying HAC to the entire training set. The hybrid scheme is therefore suitable for clustering large data sets, and it yields a hierarchical structure of clusters and subclusters. The subcluster representatives of a cluster can also capture its arbitrary (non-spherical) shape. Experimental results for the proposed algorithm, measured as Classification Accuracy (CA) using the obtained prototypes and as computation time, are promising.
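The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Euclidean distance, the single-link merge criterion, the distance threshold parameter, and the function names `leader_clustering`, `single_link_hac`, and `hhac` are all assumptions made for illustration, since the abstract does not specify them.

```python
import math


def leader_clustering(points, threshold):
    """Single-pass leader algorithm (assumed form): a point joins the first
    leader within `threshold`; otherwise it becomes a new leader."""
    leaders = []      # subcluster representatives
    assignment = []   # index of the leader chosen for each point
    for p in points:
        for i, leader in enumerate(leaders):
            if math.dist(p, leader) <= threshold:
                assignment.append(i)
                break
        else:
            # No existing leader is close enough: p starts a new subcluster.
            leaders.append(p)
            assignment.append(len(leaders) - 1)
    return leaders, assignment


def single_link_hac(reps, k):
    """Naive agglomeration of the representatives into k groups.
    Single-link is an assumption; the abstract does not name the linkage."""
    groups = [[i] for i in range(len(reps))]
    while len(groups) > k:
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                # Single-link distance: closest pair across the two groups.
                d = min(math.dist(reps[i], reps[j])
                        for i in groups[a] for j in groups[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        groups[a].extend(groups[b])
        del groups[b]
    return groups


def hhac(points, threshold, k):
    """Hybrid scheme: leaders first, then HAC on the leaders only."""
    leaders, assignment = leader_clustering(points, threshold)
    groups = single_link_hac(leaders, k)
    # Map every leader index to the cluster that contains it.
    leader_to_cluster = {i: c for c, g in enumerate(groups) for i in g}
    labels = [leader_to_cluster[a] for a in assignment]
    return labels, leaders, groups
```

Calling `hhac(points, threshold, k)` on a list of coordinate tuples returns a cluster label per point, the leaders, and the grouping of leaders into clusters. Because the quadratic merge step runs only over the leaders, its cost depends on the number of subclusters rather than on the size of the training set, which is the saving the abstract describes.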
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Vijaya, P.A., Murty, M.N., Subramanian, D.K. (2005). An Efficient Hybrid Hierarchical Agglomerative Clustering (HHAC) Technique for Partitioning Large Data Sets. In: Pal, S.K., Bandyopadhyay, S., Biswas, S. (eds) Pattern Recognition and Machine Intelligence. PReMI 2005. Lecture Notes in Computer Science, vol 3776. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11590316_92
DOI: https://doi.org/10.1007/11590316_92
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30506-4
Online ISBN: 978-3-540-32420-1
eBook Packages: Computer Science, Computer Science (R0)