Determining the Number of Components in Mixture Models for Hierarchical Data

Lukočienė, Olga; Vermunt, Jeroen K.

doi:10.1007/978-3-642-01044-6_22

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

Recently, various types of mixture models have been developed for data sets having a hierarchical or multilevel structure (see, e.g., Vermunt, Sociological Methodology 33:213–239, 2003; Computational Statistics and Data Analysis 51:5368–5376, 2007). Most of these models include finite mixture distributions at multiple levels of a hierarchical structure. In these multilevel mixture models, selecting the number of mixture components is more complex than in standard mixture models because the number of components has to be determined at multiple levels.
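As a rough illustration of what "mixture distributions at multiple levels" means, a two-level mixture can be written as below. The notation is an assumption made here for illustration and is not taken from the chapter: $y_{ij}$ is the response of individual $i$ in group $j$, $n_j$ is the size of group $j$, $M$ and $T$ are the numbers of higher-level and lower-level components, $\pi_m$ is the proportion of higher-level component $m$, and $\pi_{t \mid m}$ is the proportion of lower-level component $t$ within higher-level component $m$. The sketch shows the simplified case in which group-level membership affects only the lower-level component proportions.

```latex
f(\mathbf{y}_j) \;=\; \sum_{m=1}^{M} \pi_m \prod_{i=1}^{n_j}
  \left[ \sum_{t=1}^{T} \pi_{t \mid m} \, f(y_{ij} \mid t) \right]
```

In this form, both $M$ (higher level) and $T$ (lower level) have to be chosen, which is the selection problem described above.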

In this study, the performance of various model selection methods was investigated in the context of multilevel mixture models. We focus on determining the number of mixture components at the higher level. We consider the information criteria BIC, AIC, AIC3, and CAIC, as well as ICOMP and the validation log-likelihood. A specific difficulty in applying BIC and CAIC to multilevel models is that their formulas contain the sample size, and it is not clear which sample size should be used. Candidates are the number of groups, the number of individuals, or whichever of the two corresponds to the level at which the number of components is being determined (groups for the higher level, individuals for the lower level).
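The sample-size question can be made concrete with a small sketch. It uses the standard textbook definitions of AIC, AIC3, BIC, and CAIC (ICOMP and the validation log-likelihood are omitted). All log-likelihoods, parameter counts, and sample sizes below are made-up values chosen for illustration; they are not results from the chapter, and the function names are hypothetical.

```python
import math


def information_criteria(log_lik, n_params, sample_size):
    """Standard penalized-likelihood criteria for one fitted model.

    Only BIC and CAIC depend on sample_size; AIC and AIC3 do not.
    """
    deviance = -2.0 * log_lik
    return {
        "AIC": deviance + 2 * n_params,
        "AIC3": deviance + 3 * n_params,
        "BIC": deviance + n_params * math.log(sample_size),
        "CAIC": deviance + n_params * (math.log(sample_size) + 1),
    }


def select_components(candidates, criterion, sample_size):
    """Return the number of components that minimizes the given criterion."""
    scores = {
        m: information_criteria(log_lik, n_params, sample_size)[criterion]
        for m, (log_lik, n_params) in candidates.items()
    }
    return min(scores, key=scores.get)


# Hypothetical fits: number of higher-level components -> (log-likelihood, parameters).
candidates = {
    1: (-5300.0, 9),
    2: (-5250.0, 14),
    3: (-5235.0, 19),
    4: (-5231.0, 24),
}

n_groups = 50          # hypothetical number of higher-level units
n_individuals = 1000   # hypothetical total number of lower-level units

choice_by_groups = select_components(candidates, "BIC", n_groups)
choice_by_individuals = select_components(candidates, "BIC", n_individuals)
print("BIC with N = groups:     ", choice_by_groups)
print("BIC with N = individuals:", choice_by_individuals)
```

With these invented numbers, the two sample-size definitions lead BIC to different models, because the larger sample size penalizes each additional parameter more heavily. AIC and AIC3 are unaffected by the choice.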

Our simulation study showed that when one wishes to determine the number of mixture components at the higher level, the most appropriate sample size for BIC and CAIC is the number of groups (higher-level units). Moreover, we found that BIC, CAIC, and ICOMP detect the true number of mixture components very well when both the separation between components and the group-level sample size are large enough. AIC performs best with low separation levels and small group-level sample sizes.

References

Aitkin, M. (1999). A general maximum likelihood analysis of variance components in generalized linear models. Biometrics, 55, 218–234.
Bozdogan, H. (1993). Choosing the number of component clusters in the mixture-model using a new informational complexity criterion of the inverse-Fisher information matrix. In O. Opitz, B. Lausen, & R. Klar (Eds.), Information and classification (pp. 218–234). Heidelberg: Springer.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood estimation from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 39(1), 1–38.
Dias, J. G. (2006). Model selection for the binary latent class model: A Monte Carlo simulation. In V. Batagelj, H.-H. Bock, A. Ferligoj, & A. Žiberna (Eds.), Data science and classification (pp. 91–99). Berlin: Springer.
Hagenaars, J. A., & McCutcheon, A. L. (2002). Applied latent class analysis models. Cambridge: Cambridge University Press.
Leroux, B. G. (1992). Consistent estimation of a mixing distribution. Annals of Statistics, 20, 1350–1360.
Nylund, K. L., Muthen, B. O., & Asparouhov, T. (2003). Deciding on the number of classes in latent class analysis and growth mixture modeling: A Monte Carlo simulation study. Structural Equation Modeling: A Multidisciplinary Journal, 14, 535–569.
Smyth, D. (2000). Model selection for probabilistic clustering using cross-validated likelihood. Statistics and Computing, 9, 63–72.
Vermunt, J. K. (2003). Multilevel latent class models. Sociological Methodology, 33, 213–239.
Vermunt, J. K. (2004). An EM algorithm for the estimation of parametric and nonparametric hierarchical nonlinear models. Statistica Neerlandica, 58, 220–233.
Vermunt, J. K. (2005). Mixed-effects logistic regression models for indirectly observed outcome variables. Multivariate Behavioral Research, 40, 281–301.
Vermunt, J. K. (2007). A hierarchical mixture model for clustering three-way data sets. Computational Statistics and Data Analysis, 51, 5368–5376.
Vermunt, J. K. (2008). Latent class and finite mixture models for multilevel data sets. Statistical Methods in Medical Research, 17, 33–51.
Vermunt, J. K., & Magidson, J. (2008). LG-syntax user’s guide: Manual for latent GOLD 4.5 syntax module. Belmont, MA: Statistical Innovations.

Author information

Authors and Affiliations

Department of Methodology and Statistics, Tilburg University, 90153, 5000 LE, Tilburg, The Netherlands
Olga Lukočienė

Corresponding author

Correspondence to Olga Lukočienė.

Editor information

Editors and Affiliations

Universität der Bundeswehr, Fak. Wirtschafts-/Sozialwissenschaften, Helmut-Schmidt-Universität, Holstenhofweg 85, Hamburg, 22043, Germany
Andreas Fink
Dept. Mathematical Sciences, University of Essex, Wivenhoe Park, Colchester, CO4 3SQ, United Kingdom
Berthold Lausen
Universität der Bundeswehr, Fak. Wirtschafts-/Sozialwissenschaften, Helmut-Schmidt-Universität, Holstenhofweg 85, Hamburg, 22043, Germany
Wilfried Seidel
FB 12 Mathematik und Informatik, Datenbionik AG, Universität Marburg, Hans-Meerwein-Straße, Marburg, 35032, Germany
Alfred Ultsch

Cite this paper

Lukočienė, O., Vermunt, J.K. (2009). Determining the Number of Components in Mixture Models for Hierarchical Data. In: Fink, A., Lausen, B., Seidel, W., Ultsch, A. (eds) Advances in Data Analysis, Data Handling and Business Intelligence. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01044-6_22

DOI: https://doi.org/10.1007/978-3-642-01044-6_22
Published: 31 July 2009
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01043-9
Online ISBN: 978-3-642-01044-6
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
