Smooth Tests of Fit for Gaussian Mixtures

Suesse, Thomas; Rayner, John; Thas, Olivier

doi:10.1007/978-3-662-44983-7_12

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

Model-based clustering and classification are often based on a finite mixture distribution. The most popular choice for the mixture component distribution is the Gaussian distribution (Fraley and Raftery, J Stat Softw 18(6):1–13, 2007). Many tests, for example those based on goodness-of-fit measures, focus on detecting the order of the mixture. Diagnostic tests to confirm the distributional assumptions, however, are often neglected, which may lead the cluster analysis to invalid conclusions.
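For orientation, a finite Gaussian mixture can be written as below. The notation is an assumption made here for illustration (the abstract fixes no symbols), and the univariate case is shown only for simplicity: g is the order of the mixture, the π_j are the mixing proportions, and φ is the Gaussian density.

```latex
f(x) \;=\; \sum_{j=1}^{g} \pi_j \, \phi\!\left(x;\, \mu_j, \sigma_j^2\right),
\qquad \pi_j > 0, \quad \sum_{j=1}^{g} \pi_j = 1,
\qquad
\phi\!\left(x;\, \mu, \sigma^2\right)
  = \frac{1}{\sqrt{2\pi\sigma^2}}
    \exp\!\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}.
```

In these terms, tests for the order concern g, whereas the diagnostic tests discussed here concern whether the components are adequately described by φ.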

Smooth tests (Rayner et al., Smooth tests of goodness of fit: using R, 2nd edn. Wiley, Singapore, 2009) can be used to test the distributional assumptions against the so-called general smooth alternatives in the sense of Neyman (Skandinavisk Aktuarietidskr 20:150–99, 1937). To test for a mixture distribution, we present smooth tests that have the additional advantage of permitting sub-hypotheses to be tested using components. These test statistics are asymptotically chi-squared distributed. The results of a simulation study show that bootstrapping needs to be applied for small to medium sample sizes to maintain the type I error rate at the nominal level, and that the proposed tests have high power against various alternatives. Lastly, the tests are illustrated on a data set on the average amount of precipitation in inches for each of 70 United States and Puerto Rico cities (McNeil, Interactive data analysis. Wiley, New York, 1977).
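To make the idea of a smooth test and its components concrete, the following is a minimal sketch of the classical Neyman-type smooth test for a fully specified null distribution, applied to a two-component Gaussian mixture. It is not the chapter's procedure: the chapter's tests deal with mixtures whose parameters are estimated, which changes the null distribution of the statistic. The function names, the choice of order 4, and the use of normalized shifted Legendre polynomials on the probability-integral-transformed data are all assumptions made for illustration. The Monte Carlo p-value mimics the role of bootstrapping at small sample sizes, again with the parameters held fixed.

```python
import numpy as np
from scipy import stats
from scipy.special import eval_legendre


def mixture_cdf(x, weights, means, sds):
    """CDF of a univariate Gaussian mixture with known (assumed) parameters."""
    x = np.asarray(x, dtype=float)[:, None]
    weights, means, sds = map(np.asarray, (weights, means, sds))
    return (weights * stats.norm.cdf(x, loc=means, scale=sds)).sum(axis=1)


def sample_mixture(n, weights, means, sds, rng):
    """Draw n observations from the mixture."""
    weights, means, sds = map(np.asarray, (weights, means, sds))
    comp = rng.choice(len(weights), size=n, p=weights)
    return rng.normal(means[comp], sds[comp])


def smooth_test(x, weights, means, sds, order=4):
    """Smooth test of a fully specified mixture null.

    Returns the statistic, its asymptotic chi-squared p-value, and the
    individual components V_1, ..., V_order.
    """
    u = mixture_cdf(x, weights, means, sds)  # probability integral transform
    r = np.arange(1, order + 1)
    # Normalized shifted Legendre polynomials: orthonormal on [0, 1].
    h = np.sqrt(2 * r + 1) * eval_legendre(r, 2 * u[:, None] - 1)
    v = h.sum(axis=0) / np.sqrt(len(u))      # components
    stat = float(np.sum(v ** 2))             # sum of squared components
    p_value = float(stats.chi2.sf(stat, df=order))
    return stat, p_value, v


def monte_carlo_pvalue(x, weights, means, sds, order=4, n_sim=500, seed=0):
    """Simulation-based p-value, resampling from the fixed null mixture."""
    rng = np.random.default_rng(seed)
    observed = smooth_test(x, weights, means, sds, order)[0]
    sims = np.array([
        smooth_test(sample_mixture(len(x), weights, means, sds, rng),
                    weights, means, sds, order)[0]
        for _ in range(n_sim)
    ])
    return (1 + np.sum(sims >= observed)) / (n_sim + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    w, m, s = [0.5, 0.5], [0.0, 4.0], [1.0, 1.0]  # hypothetical null mixture
    x = sample_mixture(70, w, m, s, rng)
    stat, p_asym, comps = smooth_test(x, w, m, s)
    print("statistic:", stat, "asymptotic p:", p_asym)
    print("components:", comps)
    print("Monte Carlo p:", monte_carlo_pvalue(x, w, m, s))
```

Each squared component is, under this simple null, asymptotically chi-squared with one degree of freedom, so a large individual component points to the kind of departure (location, spread, skewness, and so on) that the data show. That is the sense in which components allow sub-hypotheses to be examined.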

References

Chen, J., & Li, P. (2009). Hypothesis test for normal mixture models: The EM approach. The Annals of Statistics, 37, 2523–2542.
Fraley, C., & Raftery, A. E. (2007). Model-based methods of classification: Using the mclust software in chemometrics. Journal of Statistical Software, 18(6), 1–13. http://www.jstatsoft.org/.
Li, P., & Chen, J. (2010). Testing the order of a finite mixture. Journal of the American Statistical Association, 105(491), 1084–1092.
Li, P., Chen, J., & Marriott, P. (2009). Non-finite Fisher information and homogeneity: An EM approach. Biometrika, 96(2), 411–426.
Lo, Y., Mendell, N. R., & Rubin, D. B. (2001). Testing the number of components in a normal mixture. Biometrika, 88(3), 767–778.
McNeil, D. R. (1977). Interactive data analysis. New York: Wiley.
Neyman, J. (1937). Smooth test for goodness of fit. Skandinavisk Aktuarietidskrift, 20, 150–99.
Rayner, J. C. W., Thas, O., & Best, D. J. (2009). Smooth tests of goodness of fit: Using R (2nd ed.). Singapore: Wiley.
Thas, O. (2010). Comparing distributions. New York: Springer.

Author information

Authors and Affiliations

National Institute for Applied Statistics Research Australia, University of Wollongong, Wollongong, NSW, 2522, Australia
Thomas Suesse, John Rayner & Olivier Thas
School of Mathematical and Physical Sciences, University of Newcastle, Callaghan, NSW, 2308, Australia
John Rayner
Department of Applied Mathematics, Biometrics and Process Control, Ghent University, 9000, Gent, Belgium
Olivier Thas

Corresponding author

Correspondence to Thomas Suesse.

Editor information

Editors and Affiliations

University of Essex, Colchester, United Kingdom
Berthold Lausen
University of Luxembourg, Walferdange, Luxembourg
Sabine Krolak-Schwerdt
University of Luxembourg, Walferdange, Luxembourg
Matthias Böhmer

About this paper

Cite this paper

Suesse, T., Rayner, J., Thas, O. (2015). Smooth Tests of Fit for Gaussian Mixtures. In: Lausen, B., Krolak-Schwerdt, S., Böhmer, M. (eds) Data Science, Learning by Latent Structures, and Knowledge Discovery. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44983-7_12

DOI: https://doi.org/10.1007/978-3-662-44983-7_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-44982-0
Online ISBN: 978-3-662-44983-7
eBook Packages: Mathematics and Statistics (R0)
