Determining the Number of Components in Mixture Models for Hierarchical Data

  • Conference paper
Advances in Data Analysis, Data Handling and Business Intelligence

Abstract

Recently, various types of mixture models have been developed for data sets with a hierarchical or multilevel structure (see, e.g., Vermunt, Sociological Methodology 33:213–239, 2003; Computational Statistics and Data Analysis 51:5368–5376, 2007). Most of these models include finite mixture distributions at multiple levels of a hierarchical structure. In these multilevel mixture models, selecting the number of mixture components is more complex than in standard mixture models because one has to determine the number of mixture components at multiple levels.

In this study, the performance of various model selection methods was investigated in the context of multilevel mixture models. We focus on determining the number of mixture components at the higher level. We consider the information criteria BIC, AIC, AIC3, and CAIC, as well as ICOMP and the validation log-likelihood. A specific difficulty in applying BIC and CAIC to multilevel models is that both contain the sample size as one of their terms, and it is not clear which sample size should be used in their formulas. This could be the number of groups, the number of individuals, or either of the two, depending on whether one wishes to determine the number of components at the higher or at the lower level.
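To make the sample-size ambiguity concrete, the following is a minimal sketch in Python that uses the standard textbook definitions of AIC, AIC3, BIC, and CAIC. It is not the authors' implementation. The function name, argument names, and the numeric values are hypothetical and chosen only for illustration. ICOMP and the validation log-likelihood are omitted because they need more than the log-likelihood and the parameter count.

```python
import math


def information_criteria(log_lik, n_params, n_groups, n_individuals):
    """Compute AIC, AIC3, and both sample-size variants of BIC and CAIC.

    Standard definitions (lower values indicate a preferred model):
      AIC  = -2 logL + 2k
      AIC3 = -2 logL + 3k
      BIC  = -2 logL + k ln(n)
      CAIC = -2 logL + k (ln(n) + 1)
    Here n is either the number of groups or the number of individuals.
    """
    deviance = -2.0 * log_lik
    crit = {
        "AIC": deviance + 2 * n_params,
        "AIC3": deviance + 3 * n_params,
    }
    # BIC and CAIC depend on which sample size is plugged in.
    for label, n in (("groups", n_groups), ("individuals", n_individuals)):
        crit[f"BIC({label})"] = deviance + n_params * math.log(n)
        crit[f"CAIC({label})"] = deviance + n_params * (math.log(n) + 1)
    return crit


# Hypothetical values, for illustration only (not taken from the paper).
result = information_criteria(
    log_lik=-1000.0, n_params=10, n_groups=50, n_individuals=1000
)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Because the number of individuals is at least as large as the number of groups, the individual-based variants of BIC and CAIC always penalize each parameter at least as heavily as the group-based variants, so the two choices can point to different numbers of components.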

Our simulation study showed that when one wishes to determine the number of mixture components at the higher level, the most appropriate sample size for BIC and CAIC is the number of groups (higher-level units). Moreover, we found that BIC, CAIC, and ICOMP detect the true number of mixture components very well when both the separation between components and the group-level sample size are large enough. AIC performs best with low separation levels and small sample sizes at the group level.

References

  • Aitkin, M. (1999). A general maximum likelihood analysis of variance components in generalized linear models. Biometrics, 55, 218–234.

  • Bozdogan, H. (1993). Choosing the number of component clusters in the mixture-model using a new informational complexity criterion of the inverse-Fisher information matrix. In O. Opitz, B. Lausen, & R. Klar (Eds.), Information and classification (pp. 218–234). Heidelberg: Springer.

  • Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood estimation from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 39(1), 1–38.

  • Dias, J. G. (2006). Model selection for the binary latent class model: A Monte Carlo simulation. In V. Batagelj, H.-H. Bock, A. Ferligoj, & A. Žiberna (Eds.), Data science and classification (pp. 91–99). Berlin: Springer.

  • Hagenaars, J. A., & McCutcheon, A. L. (2002). Applied latent class analysis models. Cambridge: Cambridge University Press.

  • Leroux, B. G. (1992). Consistent estimation of a mixing distribution. Annals of Statistics, 20, 1350–1360.

  • Nylund, K. L., Muthen, B. O., & Asparouhov, T. (2003). Deciding on the number of classes in latent class analysis and growth mixture modeling: A Monte Carlo simulation study. Structural Equation Modeling: A Multidisciplinary Journal, 14, 535–569.

  • Smyth, D. (2000). Model selection for probabilistic clustering using cross-validated likelihood. Statistics and Computing, 9, 63–72.

  • Vermunt, J. K. (2003). Multilevel latent class models. Sociological Methodology, 33, 213–239.

  • Vermunt, J. K. (2004). An EM algorithm for the estimation of parametric and nonparametric hierarchical nonlinear models. Statistica Neerlandica, 58, 220–233.

  • Vermunt, J. K. (2005). Mixed-effects logistic regression models for indirectly observed outcome variables. Multivariate Behavioral Research, 40, 281–301.

  • Vermunt, J. K. (2007). A hierarchical mixture model for clustering three-way data sets. Computational Statistics and Data Analysis, 51, 5368–5376.

  • Vermunt, J. K. (2008). Latent class and finite mixture models for multilevel data sets. Statistical Methods in Medical Research, 17, 33–51.

  • Vermunt, J. K., & Magidson, J. (2008). LG-syntax user’s guide: Manual for Latent GOLD 4.5 syntax module. Belmont, MA: Statistical Innovations.

Author information

Correspondence to Olga Lukočienė.

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

Cite this paper

Lukočienė, O., Vermunt, J.K. (2009). Determining the Number of Components in Mixture Models for Hierarchical Data. In: Fink, A., Lausen, B., Seidel, W., Ultsch, A. (eds) Advances in Data Analysis, Data Handling and Business Intelligence. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01044-6_22
