Abstract
The Expectation-Maximization (EM) algorithm is a broadly applicable approach to the iterative computation of maximum likelihood estimates in a wide variety of incomplete-data problems. It has a number of desirable properties, including numerical stability, reliable global convergence, and simplicity of implementation. The basic EM algorithm has, however, two main drawbacks: the lack of an in-built procedure for computing the covariance matrix of the parameter estimates, and slow convergence. In addition, some complex problems lead to intractable E-steps and M-steps. The first edition of this chapter, published in 2004, covered the basic theoretical framework of the EM algorithm and discussed extensions of the algorithm for handling complex problems. This second edition captures more recent advances in EM methodology, especially in its applications to the related fields of biomedical and health sciences.
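To make the E- and M-steps concrete, below is a minimal sketch (not code from the chapter) of EM for a two-component univariate normal mixture, a canonical incomplete-data problem in which the unobserved component labels play the role of the missing data. Given current estimates, the E-step computes each observation's posterior probability of belonging to the first component, and the M-step updates the mixing proportion, means, and standard deviations by weighted maximum likelihood. The variable names and simulated data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate data from the mixture 0.4 * N(-2, 1) + 0.6 * N(3, 1.5^2);
# the component labels z are discarded, making this an incomplete-data problem.
n = 500
z = rng.random(n) < 0.4
y = np.where(z, rng.normal(-2.0, 1.0, n), rng.normal(3.0, 1.5, n))

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Crude starting values for (pi, mu1, mu2, sigma1, sigma2) -- illustrative only.
pi_, mu1, mu2, s1, s2 = 0.5, y.min(), y.max(), y.std(), y.std()

for _ in range(200):
    # E-step: posterior probability tau_i that observation i came from component 1.
    w1 = pi_ * normal_pdf(y, mu1, s1)
    w2 = (1.0 - pi_) * normal_pdf(y, mu2, s2)
    tau = w1 / (w1 + w2)

    # M-step: weighted maximum likelihood updates of all five parameters.
    pi_ = tau.mean()
    mu1 = np.sum(tau * y) / np.sum(tau)
    mu2 = np.sum((1.0 - tau) * y) / np.sum(1.0 - tau)
    s1 = np.sqrt(np.sum(tau * (y - mu1) ** 2) / np.sum(tau))
    s2 = np.sqrt(np.sum((1.0 - tau) * (y - mu2) ** 2) / np.sum(1.0 - tau))

print(f"pi={pi_:.3f}, mu=({mu1:.2f}, {mu2:.2f}), sigma=({s1:.2f}, {s2:.2f})")
```

Each iteration is guaranteed not to decrease the observed-data log likelihood, which is the source of the numerical stability noted above; the slow-convergence drawback is also visible in such examples, as the number of iterations needed grows when the components overlap heavily.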
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
Cite this chapter
Ng, S.K., Krishnan, T., McLachlan, G.J. (2012). The EM Algorithm. In: Gentle, J., Härdle, W., Mori, Y. (eds) Handbook of Computational Statistics. Springer Handbooks of Computational Statistics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21551-3_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21550-6
Online ISBN: 978-3-642-21551-3
eBook Packages: Mathematics and Statistics (R0)