Avalanche: A Hierarchical, Divisive Clustering Algorithm

Amalaman, Paul K.; Eick, Christoph F.

doi:10.1007/978-3-319-21024-7_20

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9166)

Included in the following conference series:

International Workshop on Machine Learning and Data Mining in Pattern Recognition

Abstract

Hierarchical clustering has been successfully used in many applications, such as bioinformatics and the social sciences. In this paper, we introduce Avalanche, a new top-down hierarchical clustering approach that takes a dissimilarity matrix as its input. Such a tool can be used for applications where the dataset is partitioned based on pairwise distances among the examples, such as taxonomy generation tools and molecular biology applications in which dissimilarities among gene sequences are used as inputs, as opposed to flat-file attribute/value pair datasets. The proposed algorithm uses local as well as global information to recursively split the data associated with a tree node into two sub-nodes until some predefined termination condition is met. To split a node, the example that is furthest away from the other examples (the anti-medoid) is first assigned to the right sub-node; additional examples, each the nearest neighbor of the previously added example, are then progressively assigned to this node as long as a given objective function improves. Experimental evaluations on artificial and real-world datasets show that the new approach improves speed and obtains clustering results comparable to those of the well-known UPGMA algorithm on all datasets used in the experiments.
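
The split step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not state the objective function, so the sketch assumes one (the average dissimilarity between the two sub-nodes), and it assumes "furthest away" means the largest total dissimilarity to the other examples. The names `anti_medoid`, `mean_between`, and `split_node` are hypothetical.

```python
import numpy as np

def anti_medoid(D, idx):
    """Return the example in idx with the largest total dissimilarity to the rest.

    Assumption: 'furthest away from the other examples' is read as the
    largest row sum of the dissimilarity sub-matrix.
    """
    sub = D[np.ix_(idx, idx)]
    return idx[int(np.argmax(sub.sum(axis=1)))]

def mean_between(D, left, right):
    """Assumed objective (not from the paper): mean dissimilarity between sub-nodes."""
    return float(D[np.ix_(left, right)].mean())

def split_node(D, idx, objective=mean_between):
    """Split the examples in idx into (left, right) sub-nodes.

    The right sub-node starts with the anti-medoid and grows by repeatedly
    adding the nearest neighbor of the most recently added example, for as
    long as the objective strictly improves.
    """
    idx = list(idx)
    if len(idx) < 2:
        return idx, []
    last = anti_medoid(D, idx)
    right = [last]
    left = [i for i in idx if i != last]
    best = objective(D, left, right)
    while len(left) > 1:
        # Nearest remaining neighbor of the previously added example.
        cand = min(left, key=lambda i: D[last, i])
        new_left = [i for i in left if i != cand]
        new_right = right + [cand]
        score = objective(D, new_left, new_right)
        if score <= best:
            break  # the objective no longer improves, so stop growing
        left, right, best, last = new_left, new_right, score, cand
    return left, right

# Usage: five points on a line, with absolute differences as dissimilarities.
points = np.array([0.0, 1.0, 2.0, 10.0, 11.0])
D = np.abs(points[:, None] - points[None, :])
left, right = split_node(D, range(len(points)))
print(sorted(left), sorted(right))  # [0, 1, 2] [3, 4]
```

A full divisive clustering would apply `split_node` recursively to each sub-node until a termination condition is met; that condition is also left unspecified here.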

Author information

Authors and Affiliations

Department of Computer Science, University of Houston, Houston, TX, 77204-3010, USA
Paul K. Amalaman & Christoph F. Eick

Corresponding author

Correspondence to Paul K. Amalaman.

Editor information

Editors and Affiliations

IBaI, Leipzig, Germany
Petra Perner

About this paper

Cite this paper

Amalaman, P.K., Eick, C.F. (2015). Avalanche: A Hierarchical, Divisive Clustering Algorithm. In: Perner, P. (eds) Machine Learning and Data Mining in Pattern Recognition. MLDM 2015. Lecture Notes in Computer Science, vol 9166. Springer, Cham. https://doi.org/10.1007/978-3-319-21024-7_20

DOI: https://doi.org/10.1007/978-3-319-21024-7_20
Published: 01 July 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-21023-0
Online ISBN: 978-3-319-21024-7
eBook Packages: Computer Science, Computer Science (R0)
