Abstract
The general location model (GLOM) is a well-known model for analyzing mixed data. In the GLOM, the joint distribution of the variables is decomposed into the conditional distribution of the continuous variables given the categorical outcomes and the marginal distribution of the categorical variables. The first version of the GLOM assumes that the covariance matrices of the continuous multivariate distributions are equal across cells, where the cells are defined by the different combinations of the categorical variables. In this paper, GLOMs are considered both when these covariance matrices are equal and when they are unequal. Three covariance structures are used across cells: the same factor analyzer, factor analyzers with unequal specific variance matrices (in general and parsimonious forms), and factor analyzers with common factor loadings. These structures serve both to model the covariance structure and to reduce the number of parameters. The maximum likelihood estimates of the parameters are computed via the EM algorithm. As an application of these models, we investigate the classification of continuous variables within cells. Based on these models, classification is carried out for conventional as well as high-dimensional data sets. Finally, to show the applicability of the proposed models for classification, results from analyzing three real data sets are presented.
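To make the parameter-reduction argument concrete, the following minimal sketch compares the number of free parameters in an unrestricted covariance matrix with the number in a factor analyzer covariance of the standard form ΛΛᵀ + Ψ, for a single cell. The symbols p (number of continuous variables), q (number of factors), Λ (loadings) and Ψ (diagonal specific variances), along with all function names, are illustrative assumptions and are not taken from the paper. The count pq + p − q(q − 1)/2 is the usual one for a factor analysis model after removing rotational indeterminacy.

```python
import numpy as np


def full_cov_params(p: int) -> int:
    """Free parameters in an unrestricted p x p covariance matrix."""
    return p * (p + 1) // 2


def factor_cov_params(p: int, q: int) -> int:
    """Free parameters in Lambda Lambda^T + Psi with q factors.

    pq loadings plus p specific variances, minus q(q-1)/2 constraints
    that remove the rotational indeterminacy of the loadings.
    """
    return p * q + p - q * (q - 1) // 2


def factor_covariance(loadings, specific_variances):
    """Build the covariance matrix Lambda Lambda^T + diag(psi)."""
    lam = np.asarray(loadings, dtype=float)
    psi = np.asarray(specific_variances, dtype=float)
    return lam @ lam.T + np.diag(psi)


# Hypothetical sizes: 50 continuous variables, 3 latent factors.
print(full_cov_params(50))       # 1275
print(factor_cov_params(50, 3))  # 197
```

When the covariance matrices differ across cells, an unrestricted model pays the larger count once per cell, which is why a shared or partly shared factor structure becomes attractive for high-dimensional data.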
Acknowledgments
The authors are grateful for the helpful comments and valuable suggestions given by the referees, which have greatly improved the quality of the paper.
Cite this article
Amiri, L., Khazaei, M. & Ganjali, M. General location model with factor analyzer covariance matrix structure and its applications. Adv Data Anal Classif 11, 593–609 (2017). https://doi.org/10.1007/s11634-016-0258-6
DOI: https://doi.org/10.1007/s11634-016-0258-6