Avalanche: A Hierarchical, Divisive Clustering Algorithm

  • Conference paper
Machine Learning and Data Mining in Pattern Recognition (MLDM 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9166)

Abstract

Hierarchical clustering has been successfully used in many applications, such as bioinformatics and the social sciences. In this paper, we introduce Avalanche, a new top-down hierarchical clustering approach that takes a dissimilarity matrix as its input. Such a tool can be used in applications where the dataset is partitioned based on pairwise distances among the examples, such as taxonomy generation tools and molecular biology applications in which dissimilarities among gene sequences are used as inputs, as opposed to flat-file attribute/value pair datasets. The proposed algorithm uses local as well as global information to recursively split the data associated with a tree node into two sub-nodes until some predefined termination condition is met. To split a node, the example that is furthest away from the other examples (the anti-medoid) is first assigned to the right sub-node; additional examples, each the nearest neighbor of the previously added example, are then progressively assigned to this sub-node as long as a given objective function improves. Experimental evaluations on artificial and real-world datasets show that the new approach improves speed and obtains clustering results comparable to those of the well-known UPGMA algorithm on all datasets used in the experiments.
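
The split step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not state the objective function, so the sketch assumes one (the mean dissimilarity between the two sub-nodes, higher being better). It also assumes that the anti-medoid is the example with the largest total dissimilarity to the others, and that nearest-neighbor candidates are drawn from the examples still in the left sub-node. The names `anti_medoid`, `mean_between`, and `split_node` are hypothetical.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def anti_medoid(D: Matrix, members: List[int]) -> int:
    """Return the member with the largest total dissimilarity to the others.

    Assumption: this is how "furthest away from the other examples" is measured.
    """
    return max(members, key=lambda i: sum(D[i][j] for j in members))


def mean_between(D: Matrix, left: List[int], right: List[int]) -> float:
    """Assumed objective (not from the paper): mean dissimilarity between sub-nodes."""
    if not left or not right:
        return 0.0
    total = sum(D[i][j] for i in left for j in right)
    return total / (len(left) * len(right))


def split_node(
    D: Matrix,
    members: Iterable[int],
    objective: Callable[[Matrix, List[int], List[int]], float] = mean_between,
) -> Tuple[List[int], List[int]]:
    """Split one tree node into (left, right) sub-nodes.

    The right sub-node is seeded with the anti-medoid, then grown by repeatedly
    adding the nearest neighbor of the most recently added example, for as long
    as the objective strictly improves.
    """
    left = list(members)
    seed = anti_medoid(D, left)
    left.remove(seed)
    right = [seed]
    best = objective(D, left, right)
    last = seed

    while len(left) > 1:  # never empty the left sub-node
        # Nearest neighbor (among remaining left examples) of the last added example.
        candidate = min(left, key=lambda i: D[last][i])
        trial_left = [i for i in left if i != candidate]
        trial_right = right + [candidate]
        score = objective(D, trial_left, trial_right)
        if score <= best:  # objective no longer improves: stop growing
            break
        left, right, best, last = trial_left, trial_right, score, candidate

    return left, right
```

For five points on a line at positions 0, 1, 2, 10, and 11, with absolute differences as dissimilarities, this sketch seeds the right sub-node with the point at 11, adds the point at 10, and stops before adding the point at 2, because doing so would lower the assumed objective. A full tree would apply `split_node` recursively to each sub-node until a termination condition is met. The abstract leaves that condition unspecified.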

Author information

Corresponding author

Correspondence to Paul K. Amalaman.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Amalaman, P.K., Eick, C.F. (2015). Avalanche: A Hierarchical, Divisive Clustering Algorithm. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2015. Lecture Notes in Computer Science, vol 9166. Springer, Cham. https://doi.org/10.1007/978-3-319-21024-7_20

  • DOI: https://doi.org/10.1007/978-3-319-21024-7_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-21023-0

  • Online ISBN: 978-3-319-21024-7

  • eBook Packages: Computer Science, Computer Science (R0)
