Mining Arbitrary Shaped Clusters and Outputting a High Quality Dendrogram

  • Conference paper
Database and Expert Systems Applications (DEXA 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9827)

Abstract

Hierarchical clustering (HC for short) outputs a dendrogram that offers more topological information than flat clustering (e.g., k-means). However, existing HC algorithms focus on either the quality of the dendrogram or the ability to mine arbitrarily shaped clusters, but not both. To address both aspects simultaneously, we present HICMEN, which adopts (1) the classic agglomerative clustering framework, which generates a complete dendrogram, and (2) a novel similarity measure based on mutual k-nearest neighbors, which captures the connectivity of data points and helps merge each arbitrarily shaped cluster properly, piece by piece. More importantly, we prove that the similarity measure has a property called weak monotonicity, which guarantees the quality of the dendrogram generated by HICMEN. Extensive experimental results show that HICMEN mines arbitrarily shaped clusters effectively while outputting a high-quality dendrogram.
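
The abstract does not spell out HICMEN's similarity measure, so the following is only a minimal sketch of the underlying notion of mutual k-nearest neighbors: two points are mutual neighbors when each appears in the other's k-nearest-neighbor set. The function names (`knn_sets`, `mutual_knn_pairs`, `cross_links`) and the link-counting similarity are assumptions made for illustration and are not the paper's definitions.

```python
import math

def knn_sets(points, k):
    """For each point index, return the set of indices of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        # Sort all other points by Euclidean distance; ties are broken by index.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: (math.dist(p, points[j]), j),
        )
        neighbors.append(set(others[:k]))
    return neighbors

def mutual_knn_pairs(points, k):
    """Return pairs (i, j), i < j, where each point is among the other's k nearest neighbors."""
    nn = knn_sets(points, k)
    return {
        (i, j)
        for i in range(len(points))
        for j in nn[i]
        if i < j and i in nn[j]
    }

def cross_links(cluster_a, cluster_b, pairs):
    """Hypothetical stand-in similarity: number of mutual-kNN pairs joining two clusters."""
    return sum(
        1
        for i in cluster_a
        for j in cluster_b
        if (min(i, j), max(i, j)) in pairs
    )

# Two well-separated groups of three collinear points each.
points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
pairs = mutual_knn_pairs(points, k=2)
print(sorted(pairs))
print(cross_links({0, 1, 2}, {3, 4, 5}, pairs))
```

In this toy input, mutual links appear only inside each group, so an agglomerative procedure that merges the most strongly linked clusters first would join points within a group before joining the two groups. How HICMEN actually scores and orders merges is defined in the paper itself.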

Acknowledgements

This work was supported in part by NSFC Grants (61502347, 61502504, 61522208, 61572376, 61472359, 61379033, 61373038, and 61364025), the Fundamental Research Funds for the Central Universities (2015XZZX005-07, 2015XZZX004-18, and 2042015kf0038), and the Research Funds for Introduced Talents of WHU.

Author information

Corresponding author

Correspondence to Wei Lu.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Huang, H. et al. (2016). Mining Arbitrary Shaped Clusters and Outputting a High Quality Dendrogram. In: Hartmann, S., Ma, H. (eds) Database and Expert Systems Applications. DEXA 2016. Lecture Notes in Computer Science, vol 9827. Springer, Cham. https://doi.org/10.1007/978-3-319-44403-1_10

  • DOI: https://doi.org/10.1007/978-3-319-44403-1_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-44402-4

  • Online ISBN: 978-3-319-44403-1

  • eBook Packages: Computer Science, Computer Science (R0)
