Skip to main content

Abstract

Model based clustering and classification are often based on a finite mixture distribution. The most popular choice for the mixture component distribution is the Gaussian distribution (Fraley and Raftery, J Stat Softw 18(6):1–13, 2007). Many tests, for example those based on goodness of fit measures, focus on detecting the order of the mixture. However what is often neglected are diagnostic tests to confirm the distributional assumptions. This may lead to the cluster analysis having invalid conclusions.

Smooth tests (Rayner et al., Smooth tests of goodness of fit: using R, 2nd edn. Wiley, Singapore, 2009) can be used to test the distributional assumptions against the so-called general smooth alternatives in the sense of Neyman (Skandinavisk Aktuarietidskr 20:150–99, 1937). To test for a mixture distribution we present smooth tests that have the additional advantage that they permit the testing of sub-hypotheses using components. These test statistics are asymptotically chi-squared distributed. Results of the simulation study show that bootstrapping needs to be applied for small to medium sample sizes to maintain the P(type I error) at the nominal level and that the proposed tests have high power against various alternatives. Lastly the tests are illustrated on a data set on the average amount of precipitation in inches for each of 70 United States and Puerto Rico cities (Mcneil, Interactive data analysis. Wiley, New York, 1977).

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 129.00
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 169.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  • Chen, J., & Li, P. (2009). Hypothesis test for normal mixture models: The EM approach. The Annals of Statistics, 37, 2523–2542.

    Article  MATH  MathSciNet  Google Scholar 

  • Fraley, C., & Raftery, A. E. (2007). Model-based methods of classification: Using the mclust software in chemometrics. Journal of Statistical Software, 18(6), 1–13. http://www.jstatsoft.org/.

  • Li, P., & Chen, J. (2010). Testing the order of a finite mixture. Journal of the American Statistical Association, 105(491), 1084–1092.

    Article  MathSciNet  Google Scholar 

  • Li, P., Chen, J., & Marriott, P. (2009). Non-finite fisher information and homogeneity: An EM approach. Biometrika, 96(2), 411–426.

    Article  MATH  MathSciNet  Google Scholar 

  • Lo, Y., Mendell, N. R., & Rubin, D. B. (2001). Testing the number of components in a normal mixture. Biometrika, 88(3), 767–778.

    Article  MathSciNet  Google Scholar 

  • Mcneil, D. R. (1977). Interactive data analysis. New York: Wiley.

    Google Scholar 

  • Neyman, J. (1937). Smooth test for goodness of fit. Skandinavisk Aktuarietidskr, 20, 150–99.

    Google Scholar 

  • Rayner, J. C. W., Thas, O., & Best, D. J. (2009). Smooth tests of goodness of fit: Using R (2nd ed.). Singapore: Wiley.

    Book  Google Scholar 

  • Thas, O. (2010). Comparing distributions. New York: Springer.

    MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Thomas Suesse .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Suesse, T., Rayner, J., Thas, O. (2015). Smooth Tests of Fit for Gaussian Mixtures. In: Lausen, B., Krolak-Schwerdt, S., Böhmer, M. (eds) Data Science, Learning by Latent Structures, and Knowledge Discovery. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44983-7_12

Download citation

Publish with us

Policies and ethics