Efficient subspace clustering for higher dimensional data using fuzzy entropy

Journal of Systems Science and Systems Engineering

Abstract

In this paper we propose a novel method that uses fuzzy entropy to identify relevant subspaces and then performs clustering. Because fuzzy entropy uses membership functions to measure class match degrees, it discriminates the real distribution better and reflects more information about the actual distribution of patterns in the subspaces. We use a heuristic procedure based on the silhouette criterion to find the number of clusters. The proposed theory and algorithms are evaluated through experiments on a collection of benchmark data sets. The empirical results show that the method performs favorably in comparison with several other clustering algorithms.
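The two ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not give the paper's fuzzy entropy definition, so the sketch substitutes the standard De Luca-Termini fuzzy entropy of a vector of membership degrees, and it assumes Euclidean distance for the silhouette. The function names `fuzzy_entropy`, `mean_silhouette` and `pick_number_of_clusters` are hypothetical, as is the idea of passing in precomputed candidate labelings.

```python
import numpy as np


def fuzzy_entropy(memberships):
    """De Luca-Termini fuzzy entropy, averaged so the result lies in [0, 1].

    Assumption: a standard stand-in, not the measure defined in the paper.
    Memberships near 0 or 1 (crisp) give low entropy; 0.5 gives the maximum.
    """
    mu = np.clip(np.asarray(memberships, dtype=float), 1e-12, 1 - 1e-12)
    h = -(mu * np.log2(mu) + (1 - mu) * np.log2(1 - mu))
    return float(h.mean())


def mean_silhouette(X, labels):
    """Mean silhouette width of a labeling (needs at least two clusters).

    X is an (n_samples, n_features) array; Euclidean distance is assumed.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    clusters = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # singleton cluster: silhouette is taken to be 0
        # a: mean distance to the other members of the point's own cluster
        a = dist[i, same].sum() / (n_same - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return float(scores.mean())


def pick_number_of_clusters(X, labelings):
    """Return the candidate k whose labeling has the highest mean silhouette.

    labelings maps each candidate k to a label array produced by any
    clustering routine (hypothetical input format).
    """
    return max(labelings, key=lambda k: mean_silhouette(X, labelings[k]))
```

Under these assumptions, a dimension whose membership degrees are close to 0 or 1 scores a low entropy, and a labeling that separates compact, well-spaced groups scores a silhouette close to 1, so the heuristic prefers the number of clusters that yields such a labeling.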

Author information

Corresponding author

Correspondence to C. Palanisamy.

About this article

Cite this article

Palanisamy, C., Selvan, S. Efficient subspace clustering for higher dimensional data using fuzzy entropy. J. Syst. Sci. Syst. Eng. 18, 95–110 (2009). https://doi.org/10.1007/s11518-009-5097-y

  • DOI: https://doi.org/10.1007/s11518-009-5097-y