Abstract
In cluster analysis, the inclusion of unnecessary variables may mask the true group structure. To select the best subset of variables, we suggest two overall indices. The first is a distance between two hierarchical clusterings; the second is a similarity index obtained as the complement to one of that distance (i.e., one minus the distance). Both criteria can be used to measure the similarity between clusterings obtained with different subsets of variables. An application to a real data set on the economic welfare of the Italian regions shows the benefits of the suggested procedure.
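The abstract does not give the formula of the proposed distance, so the sketch below is only an illustration of the general workflow under stated assumptions, not the authors' index. It builds a naive single-linkage hierarchy for a chosen subset of variables, then uses a swapped-in stand-in distance: the average of (1 − Rand index) over the cut levels shared by two hierarchies. The similarity is one minus that distance, mirroring the complement relationship described above. All function names (`single_linkage_partitions`, `hierarchy_distance`, `hierarchy_similarity`, `subset`) and the toy data are hypothetical.

```python
import math
from itertools import combinations


def rand_index(a, b):
    """Rand index: share of object pairs on which partitions a and b agree."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def single_linkage_partitions(rows):
    """Naive single-linkage agglomeration (Euclidean distance).

    Returns {k: labels} for every number of groups k = n-1 .. 1.
    """
    n = len(rows)
    clusters = [{i} for i in range(n)]
    partitions = {}
    while len(clusters) > 1:
        # Single linkage: cluster distance = minimum pairwise object distance.
        best = None
        for x, y in combinations(range(len(clusters)), 2):
            d = min(math.dist(rows[p], rows[q])
                    for p in clusters[x] for q in clusters[y])
            if best is None or d < best[0]:
                best = (d, x, y)
        _, x, y = best
        clusters[x] |= clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
        labels = [0] * n
        for c, members in enumerate(clusters):
            for m in members:
                labels[m] = c
        partitions[len(clusters)] = labels
    return partitions


def hierarchy_distance(h1, h2):
    """Hypothetical stand-in distance: mean of (1 - Rand) over shared levels k >= 2."""
    levels = [k for k in h1 if k in h2 and k >= 2]
    return sum(1 - rand_index(h1[k], h2[k]) for k in levels) / len(levels)


def hierarchy_similarity(h1, h2):
    """Similarity as the complement to one of the distance."""
    return 1 - hierarchy_distance(h1, h2)


def subset(rows, cols):
    """Keep only the selected variables (columns)."""
    return [[r[c] for c in cols] for r in rows]


# Toy data (hypothetical): columns 0 and 1 carry two groups, column 2 is noise.
X = [
    [0.0, 0.1, 5.0],
    [0.2, 0.0, 1.0],
    [0.1, 0.2, 3.0],
    [5.0, 5.1, 2.0],
    [5.2, 5.0, 4.0],
    [5.1, 5.2, 0.0],
]
h_all = single_linkage_partitions(X)
h_01 = single_linkage_partitions(subset(X, [0, 1]))
h_2 = single_linkage_partitions(subset(X, [2]))
print(hierarchy_similarity(h_all, h_01), hierarchy_similarity(h_all, h_2))
```

Comparing the hierarchy from each candidate subset with a reference hierarchy in this way is one possible use of such an index; the actual selection rule and index used in the paper may differ.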
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Morlini, I., Zani, S. (2013). Variable Selection in Cluster Analysis: An Approach Based on a New Index. In: Giusti, A., Ritter, G., Vichi, M. (eds) Classification and Data Mining. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28894-4_9
DOI: https://doi.org/10.1007/978-3-642-28894-4_9
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28893-7
Online ISBN: 978-3-642-28894-4
eBook Packages: Mathematics and Statistics (R0)