A multi-point Metropolis scheme with generic weight functions

https://doi.org/10.1016/j.spl.2012.04.008

Abstract

The multi-point Metropolis algorithm is an advanced MCMC technique based on drawing several correlated samples at each step and choosing one of them according to some normalized weights. We propose a variation of this technique where the weight functions are not specified, i.e., the analytic form can be chosen arbitrarily. This has the advantage of greater flexibility in the design of high-performance MCMC samplers. We prove that our method fulfills the balance condition, and provide a numerical simulation. We also give new insight into the functionality of different MCMC algorithms, and the connections between them.

Introduction

Monte Carlo statistical methods are powerful tools for numerical inference and stochastic optimization (see Robert and Casella (2004), for instance). Markov chain Monte Carlo (MCMC) methods are classical Monte Carlo techniques that generate samples from a target probability density function (pdf) by drawing from a simpler proposal pdf, usually in order to approximate an integral that cannot be computed analytically (Liu, 2004, Liang et al., 2010). MCMC algorithms produce a Markov chain whose stationary distribution coincides with the target pdf.

The Metropolis–Hastings (MH) algorithm (Metropolis et al., 1953, Hastings, 1970) is the most famous MCMC technique. It can be applied to almost any target distribution. In practice, however, finding a “good” proposal pdf can be difficult. In some applications, the Markov chain generated by the MH algorithm can remain trapped almost indefinitely in a local mode, meaning that, in practice, convergence may not be reached.

The Multiple-Try Metropolis (MTM) method of Liu et al. (2000) is an extension of the MH algorithm in which the next state of the chain is selected among a set of independent and identically distributed (i.i.d.) samples. This enables the MCMC sampler to make large step-size jumps without lowering the acceptance rate; thus, MTM can explore a larger portion of the sample space in fewer iterations.
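The MTM step with i.i.d. candidates can be sketched as follows. This is a minimal illustration, not the paper's scheme: the Gaussian proposal, the function names, and the choice of weights proportional to the target (valid for a symmetric proposal) are our own illustrative assumptions.

```python
import numpy as np

def mtm_step(x, log_p, N=10, sigma=2.0, rng=None):
    """One Multiple-Try Metropolis step with i.i.d. candidates from a
    symmetric Gaussian proposal and weights w(y, x) = p(y) (the special
    case of weights proportional to the target). Illustrative sketch."""
    rng = np.random.default_rng() if rng is None else rng
    # Draw N i.i.d. candidates from the symmetric proposal q(.|x).
    ys = x + sigma * rng.standard_normal(N)
    wy = np.exp(log_p(ys))
    # Select one candidate with probability proportional to its weight.
    y = rng.choice(ys, p=wy / wy.sum())
    # Draw N-1 reference points from q(.|y); the N-th one is the current x.
    xs = np.append(y + sigma * rng.standard_normal(N - 1), x)
    wx = np.exp(log_p(xs))
    # Generalized MH test: accept y with probability min(1, sum w_y / sum w_x).
    return y if rng.uniform() < min(1.0, wy.sum() / wx.sum()) else x

# Usage on a bimodal target p(x) proportional to exp{-(x^2 - 4)^2 / 4}.
log_p = lambda x: -(x**2 - 4.0) ** 2 / 4.0
rng = np.random.default_rng(0)
chain = [0.0]
for _ in range(5000):
    chain.append(mtm_step(chain[-1], log_p, rng=rng))
```

With several candidates per step, the chain can jump between well-separated modes far more often than a single-proposal MH sampler with the same step size.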

An interesting special case of the MTM, well known in the molecular simulation field, is the orientational-bias Monte Carlo, described in Chapter 13 of Frenkel and Smit (1996) and Chapter 5 of Liu (2004), where i.i.d. candidates are drawn from a symmetric proposal pdf and one of them is chosen according to weights directly proportional to the target pdf. Here, however, the analytic form of the weight functions is fixed and unalterable.

Casarin et al. (in press) introduced an MTM scheme using different proposal pdfs. In this case, the samples produced are independent but not identically distributed. In Qin and Liu (2001), another generalization of the MTM (called the multi-point Metropolis method) is proposed, using correlated candidates at each step. Clearly, the proposal pdfs are also different in this case.

Moreover, in Pandolfi et al. (2010) an extension of the classical MTM technique is introduced where the analytic form of the weights is not specified. In Pandolfi et al. (2010), the same proposal pdf is used to draw samples, so that the candidates generated in each step of the algorithm are i.i.d. Further interesting and related considerations about the use of auxiliary variables for building acceptance probabilities within an MH approach can be found in Storvik (2011).

In this paper, we draw from the two approaches (Qin and Liu, 2001, Pandolfi et al., 2010) to create a novel algorithm that selects a new state of the chain among correlated samples using generic weight functions, i.e., the analytic form of the weights can be chosen arbitrarily. Furthermore, we formulate the algorithm and the acceptance rule in order to fulfill the detailed balance condition.

Our method allows more flexibility in the design of efficient MCMC samplers with larger coverage and faster exploration of the sample space. In fact, we can choose any bounded and positive weight functions to either improve performance or reduce computational complexity, independently of the chosen proposal pdf. Moreover, since in our approach the proposal pdfs are different, adaptive or interacting techniques can be applied, such as those introduced by Andrieu and Moulines (2006) and Casarin et al. (in press). An important advantage of our procedure is that each new candidate is drawn from a conditional pdf depending on the samples generated earlier during the same time step, so that the scheme automatically builds an improved proposal from the information contained in those samples.

The rest of the paper is organized as follows. In Section 2 we recall the standard multi-point Metropolis algorithm. In Section 3 we introduce our novel scheme with generic weight functions and correlated samples. Section 4 provides a rigorous proof that the novel scheme satisfies the detailed balance condition. A numerical simulation is provided in Section 5 and finally, in Section 6, we discuss the advantages of our proposed technique and provide insight into the relationships among different MTM schemes in the literature.

Section snippets

Multi-point Metropolis algorithm

In the classical MH algorithm, a new possible state is drawn from the proposal pdf and the movement is accepted with a suitable decision rule. In the multi-point approach, several correlated samples are generated and, from these, a “good” one is chosen.

Specifically, consider a target pdf p_o(x) known up to a normalizing constant (hence, we can evaluate p(x) ∝ p_o(x)). Given a current state x ∈ ℝ (we assume scalar values only for simplicity in the treatment), at each step we draw N correlated samples from a sequence
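The sequence of conditional proposals can take many forms; one minimal illustrative choice (our own assumption, not the paper's specification) is a random-walk path where each candidate is proposed from the previous one:

```python
import numpy as np

def draw_correlated_candidates(x, N=4, sigma=1.0, rng=None):
    """Draw N correlated candidates: y_1 from pi_1(.|x) and each subsequent
    y_j from pi_j(.|x, y_{1:j-1}). Here the conditionals form a Gaussian
    random walk started at the current state x (an illustrative choice)."""
    rng = np.random.default_rng() if rng is None else rng
    ys = np.empty(N)
    prev = x
    for j in range(N):
        ys[j] = prev + sigma * rng.standard_normal()  # y_j given y_{j-1}
        prev = ys[j]
    return ys
```

Because each y_j depends on the points drawn before it, the candidates trace a path away from the current state rather than forming an i.i.d. cloud around it.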

Extension with generic weight functions

Now, we consider generic weight functions ω_j(z_1, …, z_{j+1}): ℝ^{j+1} → ℝ_+, which have to be (a) bounded and (b) positive. In this case, the algorithm can be described as follows.

  • 1.

    Draw N samples y_{1:N} = [y_1, y_2, …, y_N] from the joint pdf q_N(y_{1:N}|x) = π_1(y_1|x) ∏_{j=2}^{N} π_j(y_j|x, y_{1:j−1}); namely, draw y_j from π_j(·|x, y_{1:j−1}), with j = 1, …, N.

  • 2.

    Choose some suitable (bounded and positive) weight functions. Then, calculate each weight ω_j(y_{j:1}, x), and normalize them to obtain ω̄_j, j = 1, …, N.

  • 3.

    Draw y = y_k ∈ {y_1, …, y_N} according to ω̄_1, …, ω̄_N,
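The three steps above can be sketched as follows. This is a minimal sketch under our own assumptions: the random-walk conditionals π_j and the particular bounded weight function are illustrative choices (the scheme permits any bounded, positive weights), and the acceptance test that completes the step is omitted here, as in the excerpt above.

```python
import numpy as np

def select_candidate(x, log_p, N=4, sigma=1.0, rng=None):
    """Steps 1-3 of the scheme: draw N correlated candidates, weight them
    with a generic bounded positive weight function, and select one."""
    rng = np.random.default_rng() if rng is None else rng
    # Step 1: y_1 from pi_1(.|x), then y_j from pi_j(.|x, y_{1:j-1}) --
    # here a random walk along the path of previously drawn candidates.
    ys = np.empty(N)
    prev = x
    for j in range(N):
        ys[j] = prev + sigma * rng.standard_normal()
        prev = ys[j]
    # Step 2: generic weights omega_j(y_{j:1}, x), bounded in (0, 1) and
    # positive (a logistic transform of the log-target), then normalized.
    w = 1.0 / (1.0 + np.exp(-log_p(ys)))
    w_bar = w / w.sum()
    # Step 3: choose y = y_k according to the normalized weights.
    k = rng.choice(N, p=w_bar)
    return ys[k], ys, w_bar

# Usage on the bimodal target of the paper's toy example.
log_p = lambda x: -(x**2 - 4.0) ** 2 / 4.0
y, ys, w_bar = select_candidate(0.0, log_p, rng=np.random.default_rng(2))
```

Note that the weight function here is decoupled from the proposal: any other bounded positive choice could be substituted without touching the sampling of the candidates, which is exactly the flexibility the generic-weight formulation provides.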

Proof of the detailed balance condition

To guarantee that a Markov chain generated by an MCMC method converges to the target distribution p(x) ∝ p_o(x), the kernel A(y|x) of the corresponding algorithm fulfills the following detailed balance condition: p(x)A(y|x) = p(y)A(x|y). First of all, we have to
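To make the condition concrete, here is a small numerical check of detailed balance for an ordinary discrete-state Metropolis–Hastings kernel (an illustrative stand-in of our own; the proof for the multi-point kernel itself is the subject of this section):

```python
import numpy as np

# Target pmf on 5 states and a symmetric (uniform) proposal.
p = np.array([0.10, 0.30, 0.20, 0.25, 0.15])
n = len(p)
q = np.full((n, n), 1.0 / n)

# Build the MH kernel A[y, x] = Pr(move to y | current state x).
A = np.zeros((n, n))
for x in range(n):
    for y in range(n):
        if y != x:
            A[y, x] = q[y, x] * min(1.0, p[y] / p[x])
    A[x, x] = 1.0 - A[:, x].sum()   # rejected mass stays at x

# Detailed balance: p(x) A(y|x) = p(y) A(x|y) for every pair (x, y),
# i.e. the probability-flux matrix is symmetric.
flux = p[None, :] * A               # flux[y, x] = p(x) A(y|x)
assert np.allclose(flux, flux.T)
# Consequence: p is a stationary distribution of the kernel, A p = p.
assert np.allclose(A @ p, p)
```

The last assertion illustrates why detailed balance suffices: summing the balance condition over x immediately gives stationarity of the target under the kernel.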

Toy example

Now we provide a simple numerical simulation to show an example of the multi-point scheme with generic weight functions and to compare it with the technique in Pandolfi et al. (2010). Let X ∈ ℝ be a random variable with bimodal pdf p_o(x) ∝ p(x) = exp{−(x² − 4)²/4}. Our goal is to draw samples from p_o(x) using our proposed multi-point technique.

We
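The bimodal shape of the toy target p(x) = exp{−(x² − 4)²/4} can be verified directly; a minimal sketch locating its modes on a grid (the grid resolution is our own choice):

```python
import numpy as np

# Toy target: p(x) proportional to exp{-(x^2 - 4)^2 / 4}, modes at x = +/-2.
p = lambda x: np.exp(-(x**2 - 4.0) ** 2 / 4.0)

xs = np.linspace(-4.0, 4.0, 8001)   # grid with step 0.001
px = p(xs)
# Interior grid points larger than both neighbours are local maxima.
is_max = (px[1:-1] > px[:-2]) & (px[1:-1] > px[2:])
modes = xs[1:-1][is_max]
print(np.round(modes, 3))           # the two modes, near -2 and 2
```

The deep valley between the two modes (p(0) = e⁻⁴ ≈ 0.018) is what makes this target a useful stress test: a single-proposal sampler with small steps rarely crosses it.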

Discussion

In this work, we have introduced a Metropolis scheme with multiple correlated points where the weight functions are not defined specifically, i.e., the analytic form can be chosen arbitrarily. We proved that our novel scheme satisfies the detailed balance condition.

Our approach draws from two different approaches (Pandolfi et al., 2010, Qin and Liu, 2001) to form a novel efficient and flexible multi-point scheme.

The multi-point approach with correlated samples provides different advantages over

Acknowledgments

We would like to thank the Reviewer, whose comments have helped us to improve the first version of the manuscript. Moreover, this work has been partially supported by the Ministerio de Ciencia e Innovación of Spain (project MONIN, ref. TEC-2006-13514-C02-01/TCM, Program Consolider-Ingenio 2010, ref. CSD2008-00010 COMONSENS, and Distributed Learning, Communication and Information Processing (DEIPRO), ref. TEC2009-14504-C02-01) and the Comunidad Autónoma de Madrid (project PROMULTIDIS-CM, ref.

References (12)

  • C. Andrieu et al.

    On the ergodicity properties of some adaptive MCMC algorithms

    The Annals of Applied Probability

    (2006)
  • Casarin, R., Craiu, R., Leisen, F., 2011. Interacting multiple try algorithms with different proposal distributions....
  • D. Frenkel et al.

    Understanding Molecular Simulation: From Algorithms to Applications

    (1996)
  • W.K. Hastings

    Monte Carlo sampling methods using Markov chains and their applications

    Biometrika

    (1970)
  • Liang, F., Liu, C., Carroll, R., 2010. Advanced Markov Chain Monte Carlo Methods: Learning From Past Samples. In: Wiley...
  • J.S. Liu

    Monte Carlo Strategies in Scientific Computing

    (2004)
