Abstract
In this paper we propose a novel method that uses fuzzy entropy to identify relevant subspaces and then performs clustering in them. Fuzzy entropy discriminates the real distribution better because it uses membership functions to measure class match degrees, and it therefore reflects more information about the actual distribution of patterns in the subspaces. We use a heuristic procedure based on the silhouette criterion to find the number of clusters. The proposed theory and algorithms are evaluated experimentally on a collection of benchmark data sets. The empirical results show that the method performs favorably in comparison with several other clustering algorithms.
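To make the idea of an entropy computed from membership degrees concrete, the following is a minimal sketch and not the authors' formulation, which the abstract does not spell out. It assumes one attribute scaled to [0, 1], evenly spaced triangular fuzzy sets, and a Shannon-style entropy over the share of membership mass each fuzzy set collects. The names `triangular_membership`, `fuzzy_entropy` and `n_sets` are hypothetical and chosen only for illustration.

```python
import math


def triangular_membership(x, center, width):
    """Degree (0..1) to which x belongs to a triangular fuzzy set at `center`."""
    return max(0.0, 1.0 - abs(x - center) / width)


def fuzzy_entropy(values, n_sets=5):
    """Shannon-style entropy over fuzzy match degrees for one attribute.

    Assumption: `values` are already scaled to [0, 1] and n_sets >= 2.
    """
    # Evenly spaced centers; width equals the spacing, so the memberships
    # of any point in [0, 1] sum to 1 across the fuzzy sets.
    centers = [j / (n_sets - 1) for j in range(n_sets)]
    width = 1.0 / (n_sets - 1)

    # Total membership mass collected by each fuzzy set.
    mass = [sum(triangular_membership(x, c, width) for x in values)
            for c in centers]
    total = sum(mass)

    entropy = 0.0
    for m in mass:
        if m > 0.0:
            degree = m / total  # match degree of this fuzzy set
            entropy -= degree * math.log2(degree)
    return entropy


# Illustrative data only: one tightly grouped attribute, one spread out.
clustered = [0.0, 0.01, 0.02, 0.0, 0.01, 0.02]
spread = [0.0, 0.25, 0.5, 0.75, 1.0, 0.5]

low = fuzzy_entropy(clustered)
high = fuzzy_entropy(spread)
```

Under these assumptions an attribute whose values concentrate in a few fuzzy sets yields a lower entropy than one whose values are spread evenly, which is the kind of signal an entropy-based subspace search could use to rank dimensions.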
Cite this article
Palanisamy, C., Selvan, S. Efficient subspace clustering for higher dimensional data using fuzzy entropy. J. Syst. Sci. Syst. Eng. 18, 95–110 (2009). https://doi.org/10.1007/s11518-009-5097-y
DOI: https://doi.org/10.1007/s11518-009-5097-y