ABSTRACT

Despite a few notable uses of simulation of random processes in the pre-computer era (Hammersley and Handscomb, 1964, Section 1.2; Stigler, 2002, Chapter 7), practical widespread use of simulation had to await the invention of computers. Almost as soon as computers were invented, they were used for simulation (Hammersley and Handscomb, 1964, Section 1.2). The name “Monte Carlo” started as cuteness (gambling was then, around 1950, illegal in most places, and the casino at Monte Carlo was the most famous in the world), but it soon became a colorless technical term for simulation of random processes. Markov chain Monte Carlo (MCMC) was invented soon after ordinary Monte Carlo at Los Alamos, one of the few places where computers were available at the time. Metropolis et al. (1953) simulated a liquid in equilibrium with its gas phase. The obvious way to find out about the thermodynamic equilibrium is to simulate the dynamics of the system and let it run until it reaches equilibrium. The tour de force was their realization that they did not need to simulate the exact dynamics; they only needed to simulate some Markov chain having the same equilibrium distribution. Simulations following the scheme of Metropolis et al. (1953) are said to use the Metropolis algorithm. As computers became more widely available, the Metropolis algorithm was widely used by chemists and physicists, but it did not become widely known among statisticians until after 1990. Hastings (1970) generalized the Metropolis algorithm, and simulations following his scheme are said to use the Metropolis-Hastings algorithm. A special case of the Metropolis-Hastings algorithm was introduced by Geman and Geman (1984), apparently without knowledge of earlier work. Simulations following their scheme are said to use the Gibbs sampler. Much of Geman and Geman (1984) discusses optimization to find the posterior mode rather than simulation, and it took some time for the spatial statistics community to understand that the Gibbs sampler simulated the posterior distribution, thus enabling full Bayesian inference of all kinds. A methodology that was later seen to be very similar to the Gibbs sampler was introduced by Tanner and Wong (1987), again apparently without knowledge of earlier work. To this day, some refer to the Gibbs sampler as “data augmentation” following these authors. Gelfand and Smith (1990) made the wider Bayesian community aware of the Gibbs sampler, which up to that time had been known only in the spatial statistics community. Then it took off; as of this writing, a search for Gelfand and Smith (1990) on Google Scholar yields 4003 links to other works.
It was rapidly realized that most Bayesian inference could be done by MCMC. It took a while for researchers to properly understand the theory of MCMC (Geyer, 1992; Tierney, 1994) and that all of the aforementioned work was a special case of the notion of MCMC. Green (1995) generalized the Metropolis-Hastings algorithm as much as it can be generalized. Although this terminology is not widely used, we say that simulations following his scheme use the Metropolis-Hastings-Green algorithm. MCMC is not used only for Bayesian inference. Likelihood inference in cases where the likelihood cannot be calculated explicitly, due to missing data or complex dependence, can also use MCMC (Geyer, 1994, 1999; Geyer and Thompson, 1992, 1995, and references cited therein).
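The key idea above, that one need only simulate some Markov chain whose equilibrium distribution is the target, can be sketched in a few lines. The following is a minimal illustrative random-walk Metropolis sampler for a standard normal target; it is not the Metropolis et al. (1953) liquid simulation, and the function and parameter names are our own.

```python
import math
import random

random.seed(42)  # seeded only so the illustration is reproducible

def metropolis(log_density, x0, step, n_samples):
    """Random-walk Metropolis sampler (illustrative sketch).

    log_density: log of the (possibly unnormalized) target density
    x0: starting state of the chain
    step: standard deviation of the Gaussian random-walk proposal
    """
    x = x0
    samples = []
    for _ in range(n_samples):
        proposal = x + random.gauss(0.0, step)
        # Accept with probability min(1, pi(proposal) / pi(x)),
        # computed on the log scale for numerical stability.
        log_ratio = log_density(proposal) - log_density(x)
        if random.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
        samples.append(x)
    return samples

# Target: standard normal, unnormalized log density -x^2 / 2.
draws = metropolis(lambda x: -0.5 * x * x, x0=0.0, step=1.0, n_samples=50000)
```

Because the chain has the target as its equilibrium distribution, long-run averages of `draws` approximate expectations under the target, even though successive draws are dependent.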