Efficient subspace clustering for higher dimensional data using fuzzy entropy

Palanisamy, C.; Selvan, S.

doi:10.1007/s11518-009-5097-y

Published: 03 March 2009

Volume 18, pages 95–110 (2009)

Journal of Systems Science and Systems Engineering

C. Palanisamy¹ & S. Selvan²

Abstract

In this paper we propose a novel method that uses fuzzy entropy to identify relevant subspaces and then performs clustering in them. Fuzzy entropy discriminates the real distribution better because it uses membership functions to measure class match degrees, and it therefore reflects more of the information in the actual distribution of patterns in the subspaces. We use a heuristic procedure based on the silhouette criterion to find the number of clusters. The proposed theory and algorithms are evaluated through experiments on a collection of benchmark data sets. The empirical results show favorable performance in comparison with several other clustering algorithms.
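
The abstract does not spell out the silhouette-based heuristic, so the following is only a minimal sketch of one common way to pick the number of clusters with the silhouette criterion: cluster the data for several candidate values of k and keep the k with the highest mean silhouette width. Everything here is an assumption made for illustration and not the authors' procedure: the function names (`kmeans_labels`, `mean_silhouette`, `choose_k`) are hypothetical, plain k-means stands in for whatever clustering step the paper uses, the farthest-first initialisation is chosen only to keep the sketch deterministic, and the fuzzy-entropy subspace selection is not modelled at all.

```python
import math


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans_labels(points, k, iters=100):
    """Plain Lloyd's k-means (an assumption, not the paper's algorithm)."""
    # Deterministic farthest-first initialisation: start at the first point,
    # then repeatedly add the point farthest from all chosen centers.
    centers = [tuple(points[0])]
    while len(centers) < k:
        centers.append(
            tuple(max(points, key=lambda p: min(dist(p, c) for c in centers)))
        )
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centers[c])  # keep an empty cluster's center
        if updated == centers:
            break
        centers = updated
    return labels


def mean_silhouette(points, labels):
    """Mean of s(i) = (b - a) / max(a, b) over all points."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        return 0.0  # silhouette is undefined for a single cluster
    total = 0.0
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # by convention s(i) = 0 for singleton clusters
        # a: mean distance to the other members of the point's own cluster.
        a = sum(dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(p, points[j]) for j in idxs) / len(idxs)
            for lab, idxs in clusters.items()
            if lab != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def choose_k(points, k_values):
    """Return the candidate k with the highest mean silhouette, plus all scores."""
    scores = {k: mean_silhouette(points, kmeans_labels(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores
```

Calling `choose_k(points, range(2, 6))` on a list of coordinate tuples returns the selected k together with the per-k scores, so the scores can be inspected before committing to a choice. In a subspace setting the same call would presumably be made on the data projected onto each selected subspace, but that step is not described in the abstract.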

Author information

Authors and Affiliations

Department of Information Technology, Bannari Amman Institute of Technology, Sathyamangalam, TN, India
C. Palanisamy
Department of Computer Science, St. Peters Engineering College, Chennai, TN, India
S. Selvan

Corresponding author

Correspondence to C. Palanisamy.

About this article

Cite this article

Palanisamy, C., Selvan, S. Efficient subspace clustering for higher dimensional data using fuzzy entropy. J. Syst. Sci. Syst. Eng. 18, 95–110 (2009). https://doi.org/10.1007/s11518-009-5097-y

Issue Date: March 2009
DOI: https://doi.org/10.1007/s11518-009-5097-y
