
General location model with factor analyzer covariance matrix structure and its applications

  • Regular Article
Advances in Data Analysis and Classification

Abstract

The general location model (GLOM) is a well-known model for analyzing mixed data. In the GLOM, the joint distribution of the variables is decomposed into the conditional distribution of the continuous variables given the categorical outcomes and the marginal distribution of the categorical variables. The original version of the GLOM assumes that the covariance matrices of the continuous multivariate distributions across cells, which are formed by the different combinations of the categorical variables, are equal. In this paper, GLOMs are considered under both equal and unequal covariance matrices. Three covariance structures are used across cells: the same factor analyzer, factor analyzers with unequal specific variance matrices (in general and parsimonious forms), and factor analyzers with common factor loadings. These structures serve both to model the covariance structure and to reduce the number of parameters. Maximum likelihood estimates of the parameters are computed via the EM algorithm. As an application of these models, we investigate the classification of continuous variables within cells. Based on these models, classification is carried out for usual as well as high-dimensional data sets. Finally, to show the applicability of the proposed models for classification, results from analyzing three real data sets are presented.
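A minimal sketch, in Python with NumPy, of what a factor-analyzer covariance buys inside a GLOM. It assumes the usual factor-analyzer form Sigma = Lambda Lambda^T + Psi with Psi diagonal, and counts free covariance parameters for three illustrative structures. The structure labels and the counting convention (subtracting q(q-1)/2 for the rotational indeterminacy of Lambda) are assumptions rather than the paper's notation, and the parsimonious and common-loadings variants are not reproduced. It also reads "classification within cells" as assigning an observation x to a cell c through the posterior pi_c N(x; mu_c, Sigma_c) / sum_k pi_k N(x; mu_k, Sigma_k). All function names are hypothetical.

```python
import numpy as np


def fa_covariance(loadings, specific_var):
    """Factor-analyzer covariance: Sigma = Lambda Lambda^T + diag(psi)."""
    return loadings @ loadings.T + np.diag(specific_var)


def n_cov_params(p, q, n_cells, structure):
    """Count free covariance parameters under a few hypothetical structures.

    'full_unequal': an unrestricted p x p covariance matrix in every cell.
    'same_fa':      one Lambda (p x q) and one diagonal Psi shared by all cells.
    'unequal_psi':  a shared Lambda with a cell-specific diagonal Psi_c.
    Rotational indeterminacy of Lambda removes q(q-1)/2 parameters.
    """
    full = p * (p + 1) // 2
    lam = p * q - q * (q - 1) // 2
    if structure == "full_unequal":
        return n_cells * full
    if structure == "same_fa":
        return lam + p
    if structure == "unequal_psi":
        return lam + n_cells * p
    raise ValueError(f"unknown structure: {structure}")


def log_mvn(x, mean, cov):
    """Log-density of a multivariate normal, computed via slogdet and solve."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = d @ np.linalg.solve(cov, d)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + maha)


def classify(x, cell_probs, cell_means, cell_covs):
    """Posterior over cells: P(c | x) proportional to pi_c * N(x; mu_c, Sigma_c)."""
    logs = np.array([np.log(pi) + log_mvn(x, mu, S)
                     for pi, mu, S in zip(cell_probs, cell_means, cell_covs)])
    logs -= logs.max()  # stabilise before exponentiating
    post = np.exp(logs)
    return post / post.sum()
```

For example, with p = 10 continuous variables, q = 2 factors and 4 cells, `n_cov_params` gives 220 for an unrestricted covariance in each cell, 29 for one shared factor analyzer, and 59 for shared loadings with cell-specific specific variances, which illustrates the parameter reduction the abstract refers to.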
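The EM estimation step can be sketched for the simplest structure, one factor analyzer shared by all cells. This is an illustration under stated assumptions, not the authors' algorithm: it assumes every observation's cell is observed, so pi_c and mu_c have closed-form maximum likelihood estimates (cell proportions and cell means), and it applies the standard EM updates for maximum likelihood factor analysis to the pooled within-cell scatter matrix S. The function name `fit_glom_same_fa`, the random initialisation and the fixed iteration count are hypothetical choices.

```python
import numpy as np


def fit_glom_same_fa(X, cells, q, n_iter=200, seed=0):
    """EM sketch for a GLOM whose cells share one factor analyzer.

    Assumes cell labels are fully observed, so pi_c and mu_c have closed-form
    ML estimates; EM is only needed for the loadings Lambda and diagonal Psi.
    """
    X = np.asarray(X, dtype=float)
    cells = np.asarray(cells)
    n, p = X.shape
    labels = np.unique(cells)  # sorted, so searchsorted maps cells to indices
    probs = np.array([np.mean(cells == c) for c in labels])
    means = np.array([X[cells == c].mean(axis=0) for c in labels])

    # Pooled within-cell scatter matrix with ML divisor n.
    R = X - means[np.searchsorted(labels, cells)]
    S = R.T @ R / n

    rng = np.random.default_rng(seed)
    Lam = rng.normal(scale=0.1, size=(p, q))
    psi = np.diag(S).copy()
    for _ in range(n_iter):
        Sigma = Lam @ Lam.T + np.diag(psi)
        # beta = Lambda^T Sigma^{-1} (q x p); Sigma is symmetric.
        beta = np.linalg.solve(Sigma, Lam).T
        # E-step summary: average of E[z z^T | x] over observations.
        Ezz = np.eye(q) - beta @ Lam + beta @ S @ beta.T
        # M-step: joint update of the loadings and the specific variances.
        Lam = S @ beta.T @ np.linalg.inv(Ezz)
        psi = np.maximum(np.diag(S - Lam @ beta @ S), 1e-8)
    return labels, probs, means, Lam, psi
```

Because the cell means are unrestricted, they remain the maximum likelihood estimates of mu_c for any fixed covariance matrix, so only Lambda and Psi require iteration; each EM iteration cannot decrease the likelihood.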



Acknowledgments

The authors are grateful for the helpful comments and valuable suggestions given by the referees, which have greatly improved the quality of the paper.

Author information

Correspondence to Mojtaba Khazaei.


Cite this article

Amiri, L., Khazaei, M. & Ganjali, M. General location model with factor analyzer covariance matrix structure and its applications. Adv Data Anal Classif 11, 593–609 (2017). https://doi.org/10.1007/s11634-016-0258-6


  • DOI: https://doi.org/10.1007/s11634-016-0258-6
