Abstract
The CURE algorithm is an efficient hierarchical clustering algorithm for large data sets. This paper presents an improved CURE algorithm, named ISE-RS-CURE. The algorithm adopts a statistically motivated sample extraction method that selects sample points according to local data density, which makes the sample set more representative of the data. The data set is partitioned while the sample set is extracted, which reduces the time needed to assign non-sample points to clusters. For choosing representative points, a selection strategy based on a partition influence factor is proposed; it takes into account the overall correlation among the data in the region where a representative point lies, so that the chosen representative points are more reasonable. Experiments show that the improved algorithm preserves the accuracy of the clustering results while improving running efficiency.
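The abstract does not give the formula for the partition influence factor, so the sketch below illustrates only the baseline step that ISE-RS-CURE modifies: representative-point selection as described for the original CURE algorithm (pick well-scattered points in a cluster, then shrink them toward the centroid). The function names, the parameter names `c` and `alpha`, and their default values are assumptions made for illustration, not taken from the paper.

```python
import math


def centroid(points):
    """Arithmetic mean of a list of equal-length coordinate tuples."""
    n = len(points)
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / n for d in range(dim))


def select_representatives(points, c=4, alpha=0.3):
    """Baseline CURE-style representative selection for one cluster.

    points: list of coordinate tuples belonging to the cluster
    c:      maximum number of representative points (assumed default)
    alpha:  shrink factor in [0, 1] toward the centroid (assumed default)
    """
    center = centroid(points)
    # Work on distinct points so duplicates cannot be picked twice.
    distinct = list(dict.fromkeys(points))

    # Start with the point farthest from the centroid.
    scattered = [max(distinct, key=lambda p: math.dist(p, center))]

    # Repeatedly add the point whose nearest chosen point is farthest away.
    while len(scattered) < min(c, len(distinct)):
        candidate = max(
            (p for p in distinct if p not in scattered),
            key=lambda p: min(math.dist(p, s) for s in scattered),
        )
        scattered.append(candidate)

    # Shrink each scattered point toward the centroid by the factor alpha.
    return [
        tuple(x + alpha * (m - x) for x, m in zip(p, center))
        for p in scattered
    ]


if __name__ == "__main__":
    cluster = [(0, 0), (4, 0), (0, 4), (4, 4), (2, 2)]
    print(select_representatives(cluster, c=4, alpha=0.5))
```

A density- or correlation-aware strategy such as the one the abstract describes would presumably replace the purely distance-based choice of `candidate`; how it does so is not specified here.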
© 2018 IFIP International Federation for Information Processing
About this paper
Cite this paper
Cai, M., Liang, Y. (2018). An Improved CURE Algorithm. In: Shi, Z., Pennartz, C., Huang, T. (eds) Intelligence Science II. ICIS 2018. IFIP Advances in Information and Communication Technology, vol 539. Springer, Cham. https://doi.org/10.1007/978-3-030-01313-4_11
DOI: https://doi.org/10.1007/978-3-030-01313-4_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01312-7
Online ISBN: 978-3-030-01313-4
eBook Packages: Computer Science (R0)