An efficient model order selection for PCA mixture model

https://doi.org/10.1016/S0167-8655(02)00379-3

Abstract

This paper proposes a fast, sub-optimal method for selecting the model order of a PCA mixture model, i.e., the number of mixture components and the number of PCA bases, where the model consists of a combination of many PCAs. Once the model order is determined, the parameters of the model can be estimated easily by expectation maximization (EM) learning, exploiting the decorrelatedness of the feature data in the PCA-transformed space. The conventional model order selection method takes a long processing time because it must perform the time-consuming EM learning for every possible model order. We simplify the selection as follows. First, the time-consuming EM learning over the training data set is performed only once for a given number of mixture components, with all PCA bases kept. Second, by virtue of the ordering property of PCA bases, the evaluation step that measures the fitness of the model selection criterion over the validation data set is performed sequentially, pruning the less significant PCA bases one by one, starting from the most insignificant one. The pair of the number of mixture components and the number of PCA bases that best satisfies the model selection criterion is selected as the optimal model order for the given problem. Simulation results on synthetic data classification and on a practical alphabet recognition problem show that the proposed method determines the model order appropriately and improves classification and detection performance.

Introduction

Principal component analysis (PCA) (Jolliffe, 1986) is a well-known technique of multivariate linear data analysis. The central idea of PCA is to reduce the dimensionality of a data set while retaining as much as possible of the variation in the data set. Using this idea, PCA has been applied to many fields including data compression, image analysis, visualization, pattern recognition, regression, and time-series prediction.
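As a concrete illustration of this idea, ordinary PCA reduction to m dimensions can be sketched as follows; this is a minimal sketch via eigendecomposition of the sample covariance, and the function and variable names are ours, not the paper's notation:

```python
import numpy as np

def pca_reduce(X, m):
    """Project n-dimensional data X (N x n) onto its top-m principal axes."""
    mean = X.mean(axis=0)
    Xc = X - mean                            # center the data
    cov = Xc.T @ Xc / (len(X) - 1)           # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # reorder by descending variance
    W = eigvecs[:, order[:m]]                # top-m PCA bases (n x m)
    return Xc @ W, W, mean                   # scores, bases, mean

# usage on random correlated data
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))
Z, W, mu = pca_reduce(X, 2)
print(Z.shape)  # (500, 2)
```

The retained bases `W` are orthonormal, which is what later makes the transformed coordinates decorrelated.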

PCA can also be used to estimate the data distribution as a simple Gaussian probability density function, because the mean vector and the covariance matrix can be estimated from the training data. However, a single PCA is not effective for estimating a complicated data distribution with a multi-modal probability density function. To overcome this limitation, we propose a PCA mixture model that consists of a combination of PCAs. The idea of the PCA mixture model is motivated by the mixture-of-experts framework, which models a nonlinear global density function as a combination, or mixture, of many simple local density functions (Jacobs et al., 1991; Jordan and Jacobs, 1994).
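The single-Gaussian density estimate mentioned above is particularly convenient in the PCA-rotated space, where the covariance becomes diagonal and the density factorizes per dimension. A minimal sketch under that assumption (names are illustrative, not the paper's):

```python
import numpy as np

def fit_gaussian_pca(X):
    """Fit a Gaussian via the eigendecomposition PCA provides."""
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # PCA bases and their variances
    return mean, eigvals, eigvecs

def gaussian_logpdf(X, mean, eigvals, eigvecs):
    """Gaussian log-density evaluated in the decorrelated (PCA-rotated) space,
    where the covariance is diagonal and the density factorizes per dimension."""
    Y = (X - mean) @ eigvecs                 # decorrelated coordinates
    n = X.shape[1]
    return (-0.5 * n * np.log(2 * np.pi)
            - 0.5 * np.log(eigvals).sum()
            - 0.5 * (Y**2 / eigvals).sum(axis=1))

# usage
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))
mean, eigvals, eigvecs = fit_gaussian_pca(X)
logp = gaussian_logpdf(X, mean, eigvals, eigvecs)
```

This is numerically equivalent to the usual multivariate Gaussian formula, since the log-determinant is the sum of the log eigenvalues and the quadratic form diagonalizes in the eigenbasis.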

The PCA mixture model has been introduced by several researchers. Hinton et al. (1997) proposed a PCA mixture model based on the reconstruction error, whose parameters were determined by the expectation maximization (EM) algorithm. However, their approach lacks a probability model for PCA, which requires additional undetermined parameters, and its EM algorithm maximizes only a pseudo-likelihood, not the true likelihood. The model was applied to modeling the manifold of handwritten digits. Tipping and Bishop (1999) proposed probabilistic PCA (PPCA), derived from the perspective of density estimation, and a mixture model of PPCA components with a Gaussian error term. They formulated the estimation of the PPCA mixture model with an EM algorithm that was proven to maximize the likelihood, and applied the model to image compression and handwritten digit recognition.

Our PCA mixture model is a simplified version of the Tipping and Bishop (1999) model: it has no error term for each mixture component, a simpler formulation of the density model, and an efficient model selection method. Usually, the model order for a specific application is given by a domain expert in advance; for practical applications, however, it is crucial to determine an accurate order for the PCA mixture model for the given problem. Once the model order is determined, the parameters of the model can be estimated easily by EM learning, exploiting the decorrelatedness of the feature data in the PCA-transformed space. Some previous work addressed how to obtain the optimal orders for the PCA mixture model, such as the number of mixture components and the number of PCA bases (Bishop, 1999); it is a kind of Bayesian model selection that requires evaluating the evidence for every model structure.

Determining the optimal model order usually takes a long processing time, because the time-consuming EM learning must be performed over the training data set for every combination of the number of mixture components and the number of PCA bases, and the fitness of each model order to the model selection criterion must be measured over the validation data set. We propose a fast, sub-optimal model order selection method for the PCA mixture model as follows. First, for a given number of mixture components, the EM learning over the training data set is performed only once, with all PCA bases kept, to estimate parameters such as the mean, covariance, and posterior probability of each mixture component. Second, by virtue of the ordering property of PCA bases, the degree of fitness to the model selection criterion, such as the classification error over the validation data set, is computed for each number of PCA bases by discarding the less significant PCA bases one by one, starting from the most insignificant one. These two procedures are repeated, adding one mixture component at a time, until the number of mixture components reaches a predetermined maximum. Finally, the pair of the number of mixture components and the number of PCA bases that yields the smallest classification error is chosen as the model order of the PCA mixture model.
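The search procedure described above can be sketched as a small driver loop. Here `fit_em` and `classify_error` are placeholder callables standing in for the EM learning and the criterion evaluation described in the text; they are our assumptions, not the paper's interfaces:

```python
import numpy as np

def select_model_order(fit_em, classify_error, K_max, n_dims, train, valid):
    """Sketch of the proposed sub-optimal model order search.

    fit_em(train, K)                -> model trained by EM with K mixture
                                       components, all n_dims PCA bases kept
    classify_error(model, valid, m) -> validation error when only the m
                                       leading PCA bases are retained
    """
    best = (None, None, np.inf)              # (K, m, error)
    for K in range(1, K_max + 1):            # add one mixture component at a time
        model = fit_em(train, K)             # EM runs only once per K
        for m in range(n_dims, 0, -1):       # prune the least significant base
            err = classify_error(model, valid, m)
            if err < best[2]:
                best = (K, m, err)
    return best
```

The key saving is visible in the loop structure: EM runs `K_max` times rather than `K_max * n_dims` times, and each pruning step only re-evaluates the criterion.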

This paper is organized as follows. Section 2 describes the PCA mixture model and the EM learning algorithm for learning the PCA mixture model. Section 3 explains the proposed fast and sub-optimal model order selection method. Section 4 presents the simulation results of applying the proposed PCA mixture model to the synthetic data classification and the alphabet recognition. Finally, a conclusion is drawn.

Section snippets

Theoretical backgrounds

The PCA mixture model proposed here is a density estimation model; nevertheless, it is applied to pattern classification problems in Section 4. Its central idea comes from combining mixture models and PCA.

In a mixture model (Jacobs et al., 1991; Jordan and Jacobs, 1994), a class is partitioned into a number of clusters and its density function of the n-dimensional observed data x={x1,…,xn} is represented by a linear combination of component
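The snippet above breaks off at the mixture density; for reference, the standard form it refers to is (the symbols K and π_k are our notation, not necessarily the paper's):

```latex
p(\mathbf{x}) \;=\; \sum_{k=1}^{K} \pi_k \, p(\mathbf{x}\mid k),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1,
```

where p(x|k) is the density of the k-th cluster and π_k its mixing weight; in the PCA mixture model, each p(x|k) is a Gaussian estimated in the PCA-transformed space of its cluster.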

Fast model order selection

The above EM learning assumes that the model order, i.e., the number of mixture components and the number of PCA bases, is given before the learning starts. However, we have no prior knowledge of the optimal model order for a given problem. We would therefore need to perform the time-consuming EM learning and the evaluation of fitness to the model selection criterion for all combinations of the number of mixture components and the number of PCA bases and select the model order

Classification of synthetic data

We simulated a classification problem using two different sets of synthetic data to verify that the proposed model selection method for the PCA mixture model is valid. First, we take synthetic data consisting of 10,000 randomly generated samples with two-dimensional Gaussian distributions. They form two classes with five clusters, and each cluster has 1000 samples. The whole data set was randomly divided into three parts: a training set of 4428 samples, a validation set of 2253, and

Conclusion

We proposed an efficient model order selection method for the PCA mixture model. The model orders, i.e., the number of mixture components and the number of PCA bases, are determined as follows. First, for a given number of mixture components, the parameters of each mixture component are learned by EM over the training data set with all PCA bases kept. Then, the goodness of the model selection criterion, such as the posterior probability or the reconstruction error, over the validation data

Acknowledgements

The authors would like to thank the Ministry of Education of Korea for its financial support of the Electrical and Computer Engineering Division at POSTECH through its BK21 program. This research was also supported by the Brain Science and Engineering Research Program of the Ministry of Science and Technology of Korea, and by project no. 1NH0030103 of the Korea Research Foundation.

References (10)

  • C. Bishop, Bayesian PCA, Adv. Neural Inform. Proc. Sys. (1999)
  • A.P. Dempster et al., Maximum likelihood from incomplete data via the EM algorithm, J. R. Stat. Soc.: Ser.-B (1977)
  • ETL Character Database, Image Understanding Section, Electrotechnical Laboratory, 1-1-4, Umezono, Tsukuba, Ibaraki,...
  • G. Hinton et al., Modeling the manifolds of images of handwritten digits, IEEE Trans. Neural Network (1997)
  • H. Hotelling, Analysis of a complex of statistical variables into principal components, J. Educ. Psychol. (1933)
