Variable Selection in Cluster Analysis: An Approach Based on a New Index

Morlini, Isabella; Zani, Sergio

doi:10.1007/978-3-642-28894-4_9


Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)


Abstract

In cluster analysis, including unnecessary variables may mask the true group structure. To select the best subset of variables, we suggest two overall indices. The first is a distance between two hierarchical clusterings; the second is a similarity index obtained as the complement to one of that distance. Both criteria can be used to measure the similarity between clusterings obtained with different subsets of variables. An application to a real data set on the economic welfare of the Italian regions shows the benefits of the suggested procedure.
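The abstract does not define the two indices, so the following is only a rough sketch of the general idea, not the authors' formulas. It assumes that each hierarchy is cut at the same sequence of levels, that each cut is given as a vector of cluster labels, and that similarity is measured as a Dice-type overlap of co-clustered object pairs pooled over all levels. The function names `co_membership`, `overall_similarity` and `overall_distance` are hypothetical.

```python
from itertools import combinations


def co_membership(labels):
    """Return the set of object pairs (i, j), i < j, placed in the same cluster."""
    return {
        (i, j)
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    }


def overall_similarity(levels_a, levels_b):
    """Dice-type overlap of co-clustered pairs, pooled over all cut levels.

    Assumption (not from the chapter): levels_a and levels_b are lists of
    label vectors, one per cut, aligned so that entry k of both lists
    corresponds to the same number of clusters.
    """
    if len(levels_a) != len(levels_b):
        raise ValueError("both hierarchies must be cut at the same levels")
    shared = 0
    total = 0
    for labels_a, labels_b in zip(levels_a, levels_b):
        pairs_a = co_membership(labels_a)
        pairs_b = co_membership(labels_b)
        shared += len(pairs_a & pairs_b)
        total += len(pairs_a) + len(pairs_b)
    # If no pair is ever co-clustered, the two hierarchies trivially agree.
    return 1.0 if total == 0 else 2.0 * shared / total


def overall_distance(levels_a, levels_b):
    """Distance taken as the complement to one of the similarity."""
    return 1.0 - overall_similarity(levels_a, levels_b)


# Toy data: five objects, each hierarchy cut into 2 and then 3 clusters.
all_variables = [[0, 0, 0, 1, 1], [0, 0, 1, 2, 2]]
variable_subset = [[0, 0, 1, 1, 1], [0, 0, 1, 2, 2]]

print(overall_similarity(all_variables, variable_subset))
print(overall_distance(all_variables, variable_subset))
```

In a variable-selection setting, a subset whose hierarchy stays close to the reference hierarchy (small distance) would be a candidate for retaining the group structure with fewer variables; how the chapter actually ranks subsets is not stated in the abstract.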



Author information

Authors and Affiliations

Department of Economics, University of Modena and Reggio Emilia, Via Berengario 51, 41100, Modena, Italy
Isabella Morlini
Department of Economics, University of Parma, Via Kennedy 6, 43100, Parma, Italy
Sergio Zani


Corresponding author

Correspondence to Isabella Morlini.

Editor information

Editors and Affiliations

Department of Statistics, Università degli Studi di Firenze, Viale G.B. Morgagni 59, Firenze, 50134, Italy
Antonio Giusti
Fakultät für Informatik und Mathematik, Universität Passau, Innstr. 33, Passau, 94030, Germany
Gunter Ritter
Department of Statistics, University of Rome "La Sapienza", Piazzale Aldo Moro 5, Rome, 00185, Italy
Maurizio Vichi


Morlini, I., Zani, S. (2013). Variable Selection in Cluster Analysis: An Approach Based on a New Index. In: Giusti, A., Ritter, G., Vichi, M. (eds) Classification and Data Mining. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28894-4_9


DOI: https://doi.org/10.1007/978-3-642-28894-4_9
Published: 06 September 2012
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28893-7
Online ISBN: 978-3-642-28894-4
eBook Packages: Mathematics and Statistics (R0)
