Variable Selection in Cluster Analysis: An Approach Based on a New Index

  • Conference paper
Classification and Data Mining

Abstract

In cluster analysis, the inclusion of unnecessary variables may mask the true group structure. To select the best subset of variables, we suggest two overall indices. The first is a distance between two hierarchical clusterings; the second is a similarity index obtained as the complement to one of that distance. Both criteria can be used to measure the similarity between clusterings obtained with different subsets of variables. An application to a real data set on the economic welfare of the Italian regions shows the benefits of the suggested procedure.
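The abstract does not define the two indices, so the following is only a minimal sketch of the general idea and not the authors' index. It assumes a simple stand-in distance: the share of (object pair, cut level) combinations on which two hierarchies disagree about whether the pair is in the same cluster. The similarity is then the complement to one of that distance. The function names and the toy label lists are hypothetical.

```python
from itertools import combinations


def co_membership(labels):
    """Return the set of object pairs (i, j) that share a cluster label."""
    return {(i, j) for i, j in combinations(range(len(labels)), 2)
            if labels[i] == labels[j]}


def hierarchy_distance(levels_a, levels_b):
    """Stand-in distance (an assumption, not the paper's formula):
    fraction of (pair, level) combinations on which the hierarchies disagree."""
    if len(levels_a) != len(levels_b):
        raise ValueError("hierarchies must be cut at the same levels")
    n = len(levels_a[0])
    n_pairs = n * (n - 1) // 2
    disagreements = 0
    for labels_a, labels_b in zip(levels_a, levels_b):
        # Symmetric difference: pairs joined in one partition but not the other.
        disagreements += len(co_membership(labels_a) ^ co_membership(labels_b))
    return disagreements / (n_pairs * len(levels_a))


def hierarchy_similarity(levels_a, levels_b):
    """Similarity as the complement to one of the distance."""
    return 1.0 - hierarchy_distance(levels_a, levels_b)


# Hypothetical toy data: 4 objects, each hierarchy cut into 2 and 3 clusters.
# full_set: clustering on all variables; subset: clustering on a variable subset.
full_set = [[0, 0, 1, 1], [0, 0, 1, 2]]
subset = [[0, 0, 1, 1], [0, 1, 2, 2]]

d = hierarchy_distance(full_set, subset)
s = hierarchy_similarity(full_set, subset)
print(f"distance = {d:.4f}, similarity = {s:.4f}")
```

Under this stand-in, a variable subset whose hierarchy has similarity close to one with the hierarchy from the full variable set reproduces nearly the same group structure, which is the kind of comparison the abstract describes.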

Corresponding author

Correspondence to Isabella Morlini.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Morlini, I., Zani, S. (2013). Variable Selection in Cluster Analysis: An Approach Based on a New Index. In: Giusti, A., Ritter, G., Vichi, M. (eds) Classification and Data Mining. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28894-4_9
