Abstract
Recently, various types of mixture models have been developed for data sets having a hierarchical or multilevel structure (see, e.g., Vermunt, Sociological Methodology 33:213–239, 2003; Computational Statistics and Data Analysis 51:5368–5376, 2007). Most of these models include finite mixture distributions at multiple levels of a hierarchical structure. In these multilevel mixture models, selecting the number of mixture components is more complex than in standard mixture models because one has to determine the number of mixture components at multiple levels.
In this study, we investigate the performance of various model selection methods in the context of multilevel mixture models, focusing on determining the number of mixture components at the higher level. We consider the information criteria BIC, AIC, AIC3, and CAIC, as well as ICOMP and the validation log-likelihood. A specific difficulty in applying BIC and CAIC to multilevel models is that their formulas contain the sample size as a term, and it is not clear which sample size should be used. This could be the number of groups, the number of individuals, or either of the two, depending on whether one wishes to determine the number of components at the higher or at the lower level.
Our simulation study showed that when one wishes to determine the number of mixture components at the higher level, the most appropriate sample size for BIC and CAIC is the number of groups (higher-level units). Moreover, we found that BIC, CAIC, and ICOMP detect the true number of mixture components very well when both the separation between components and the group-level sample size are large enough. AIC performs best when the separation is low and the group-level sample size is small.
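The sample-size question can be made concrete with a minimal sketch. It uses the standard textbook definitions of AIC, AIC3, BIC, and CAIC, not code from the chapter, and the log-likelihood, parameter count, and sample sizes are hypothetical values chosen only for illustration. ICOMP and the validation log-likelihood are left out because they require more than these three inputs.

```python
import math


def information_criteria(log_lik, n_params, n):
    """Return AIC, AIC3, BIC and CAIC (standard definitions) for one fitted model.

    log_lik  : maximized log-likelihood of the model
    n_params : number of free parameters
    n        : sample size entering the BIC and CAIC penalties
    """
    deviance = -2.0 * log_lik
    return {
        "AIC": deviance + 2 * n_params,
        "AIC3": deviance + 3 * n_params,
        "BIC": deviance + n_params * math.log(n),
        "CAIC": deviance + n_params * (math.log(n) + 1),
    }


# Hypothetical fitted model (illustrative numbers, not results from the study).
log_lik = -5210.4
n_params = 17
n_groups = 50          # higher-level units
n_individuals = 1000   # lower-level units

by_groups = information_criteria(log_lik, n_params, n_groups)
by_individuals = information_criteria(log_lik, n_params, n_individuals)

for name in ("AIC", "AIC3", "BIC", "CAIC"):
    print(f"{name:5s} N=groups: {by_groups[name]:10.2f}   "
          f"N=individuals: {by_individuals[name]:10.2f}")
```

AIC and AIC3 do not depend on the sample size, so only BIC and CAIC change between the two columns. Using the number of groups gives a smaller penalty per parameter than using the number of individuals.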
References
Aitkin, M. (1999). A general maximum likelihood analysis of variance components in generalized linear models. Biometrics, 55, 218–234.
Bozdogan, H. (1993). Choosing the number of component clusters in the mixture-model using a new informational complexity criterion of the inverse-Fisher information matrix. In O. Opitz, B. Lausen, & R. Klar (Eds.), Information and classification (pp. 218–234). Heidelberg: Springer.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood estimation from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1), 1–38.
Dias, J. G. (2006). Model selection for the binary latent class model: A Monte Carlo simulation. In V. Batagelj, H.-H. Bock, A. Ferligoj, & A. Žiberna (Eds.), Data science and classification (pp. 91–99). Berlin: Springer.
Hagenaars, J. A., & McCutcheon, A. L. (2002). Applied latent class analysis models. Cambridge: Cambridge University Press.
Leroux, B. G. (1992). Consistent estimation of a mixing distribution. Annals of Statistics, 20, 1350–1360.
Nylund, K. L., Asparouhov, T., & Muthén, B. O. (2007). Deciding on the number of classes in latent class analysis and growth mixture modeling: A Monte Carlo simulation study. Structural Equation Modeling: A Multidisciplinary Journal, 14, 535–569.
Smyth, P. (2000). Model selection for probabilistic clustering using cross-validated likelihood. Statistics and Computing, 10, 63–72.
Vermunt, J. K. (2003). Multilevel latent class models. Sociological Methodology, 33, 213–239.
Vermunt, J. K. (2004). An EM algorithm for the estimation of parametric and nonparametric hierarchical nonlinear models. Statistica Neerlandica, 58, 220–233.
Vermunt, J. K. (2005). Mixed-effects logistic regression models for indirectly observed outcome variables. Multivariate Behavioral Research, 40, 281–301.
Vermunt, J. K. (2007). A hierarchical mixture model for clustering three-way data sets. Computational Statistics and Data Analysis, 51, 5368–5376.
Vermunt, J. K. (2008). Latent class and finite mixture models for multilevel data sets. Statistical Methods in Medical Research, 17, 33–51.
Vermunt, J. K., & Magidson, J. (2008). LG-Syntax user’s guide: Manual for Latent GOLD 4.5 syntax module. Belmont, MA: Statistical Innovations.
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lukočienė, O., Vermunt, J.K. (2009). Determining the Number of Components in Mixture Models for Hierarchical Data. In: Fink, A., Lausen, B., Seidel, W., Ultsch, A. (eds) Advances in Data Analysis, Data Handling and Business Intelligence. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01044-6_22
DOI: https://doi.org/10.1007/978-3-642-01044-6_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01043-9
Online ISBN: 978-3-642-01044-6
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)