Mining Arbitrary Shaped Clusters and Outputting a High Quality Dendrogram

Huang, Hao; Wang, Song; Wu, Shuangke; Gao, Yunjun; Lu, Wei; He, Qinming; Ying, Shi

doi:10.1007/978-3-319-44403-1_10

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9827)

Included in the following conference series:

International Conference on Database and Expert Systems Applications

Abstract

Hierarchical clustering (HC for short) outputs a dendrogram that offers more topological information than flat clustering (e.g., k-means). However, existing HC algorithms focus on either the quality of the dendrogram or the ability to mine arbitrary shaped clusters, but not both. To address these two aspects simultaneously, we present HICMEN, which adopts (1) the classic agglomerative clustering framework, which can generate a complete dendrogram, and (2) a novel similarity measure based on mutual k-nearest neighbors, which captures the connectivity of data points and helps merge each arbitrary shaped cluster properly, piece by piece. More importantly, we prove that the similarity measure has a property called weak monotonicity, which guarantees the quality of the dendrogram generated by HICMEN. Extensive experimental results show that HICMEN mines arbitrary shaped clusters effectively while also outputting a high quality dendrogram.
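
The abstract does not define HICMEN's similarity measure, so the following is only a minimal sketch of the underlying notion of mutual k-nearest neighbors: two points are mutual k-nearest neighbors when each appears in the other's k-nearest-neighbor list. The function names `knn_indices` and `mutual_knn_pairs`, the brute-force search, the Euclidean distance, and the toy data are all assumptions made for illustration and are not taken from the paper.

```python
from math import dist


def knn_indices(points, k):
    """Return, for each point, the set of indices of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        # Brute force: rank every other point by Euclidean distance to p.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(p, points[j]),
        )
        neighbors.append(set(others[:k]))
    return neighbors


def mutual_knn_pairs(points, k):
    """Return index pairs (i, j) with i < j that are mutual k-nearest neighbors."""
    nn = knn_indices(points, k)
    return {
        (i, j)
        for i in range(len(points))
        for j in nn[i]
        if i < j and i in nn[j]
    }


# Hypothetical toy data: two tight groups and one isolated point (index 6).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
          (5.0, 5.0)]
pairs = mutual_knn_pairs(points, k=2)
print(sorted(pairs))
```

In this toy example the isolated point has nearest neighbors, but none of them lists it in return, so it forms no mutual pair. This asymmetry is what lets a mutual k-nearest-neighbor relation reflect connectivity rather than plain proximity.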

Acknowledgements

This work was supported in part by NSFC Grants (61502347, 61502504, 61522208, 61572376, 61472359, 61379033, 61373038, and 61364025), the Fundamental Research Funds for the Central Universities (2015XZZX005-07, 2015XZZX004-18, and 2042015kf0038), and the Research Funds for Introduced Talents of WHU.

Author information

Authors and Affiliations

State Key Laboratory of Software Engineering, Wuhan University, Wuhan, People’s Republic of China
Hao Huang, Song Wang, Shuangke Wu & Shi Ying
College of Computer Science, Zhejiang University, Hangzhou, People’s Republic of China
Yunjun Gao & Qinming He
Key Laboratory of Data Engineering and Knowledge Engineering, Renmin University of China, MOE, Beijing, People’s Republic of China
Wei Lu

Corresponding author

Correspondence to Wei Lu.

Editor information

Editors and Affiliations

Clausthal University of Technology, Clausthal-Zellerfeld, Germany
Sven Hartmann
Victoria University of Wellington, Wellington, New Zealand
Hui Ma

About this paper

Cite this paper

Huang, H. et al. (2016). Mining Arbitrary Shaped Clusters and Outputting a High Quality Dendrogram. In: Hartmann, S., Ma, H. (eds) Database and Expert Systems Applications. DEXA 2016. Lecture Notes in Computer Science, vol 9827. Springer, Cham. https://doi.org/10.1007/978-3-319-44403-1_10

DOI: https://doi.org/10.1007/978-3-319-44403-1_10
Published: 06 August 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44402-4
Online ISBN: 978-3-319-44403-1
eBook Packages: Computer Science (R0)
