GPU accelerated MCMC for modeling terrorist activity

https://doi.org/10.1016/j.csda.2013.03.027

Abstract

The use of graphical processing unit (GPU) parallel processing is becoming a part of mainstream statistical practice. The reliance of Bayesian statistics on Markov Chain Monte Carlo (MCMC) methods makes the applicability of parallel processing not immediately obvious. It is illustrated that there are substantial reductions in computational time for MCMC and other methods of evaluation when the likelihood is computed using GPU parallel processing. Examples use data from the Global Terrorism Database to model terrorist activity in Colombia from 2000 through 2010 and a likelihood based on the explicit convolution of two negative-binomial processes. Results show decreases in computational time by a factor of over 200. Factors influencing these improvements and guidelines for programming parallel implementations of the likelihood are discussed.

Introduction

Parallel processing in statistics is used with increasing frequency. The confluence of an increasing abundance of large datasets for analysis, the desire to implement more sophisticated models, and the advent of powerful, inexpensive graphical processing units (GPUs) with more accessible programming tools creates an opportunity for making significant gains in effective computing capacity and for developing new computational methods. In Bayesian statistics, the serial nature of Markov Chain Monte Carlo (MCMC) appears to limit the applicability of parallel processing methods, placing limits on the dimension and complexity of models fit using MCMC. Non-MCMC methods for computation such as Approximate Bayesian Computation (Plagnol and Tavaré, 2004), direct sampling (Walker et al., 2010), or the use of integrated nested Laplace approximations (Rue et al., 2009) show great promise, but are not as universally applicable as MCMC methods. It is likely that MCMC will remain a standard tool for Bayesian statistics for the foreseeable future.

The benefit of GPU parallel processing for accelerating MCMC is demonstrated with a general approach: the GPU computes the likelihood as part of generating samples from the target posterior. While almost all methods that generate samples from a posterior using MCMC require evaluating the product of the likelihood and the prior, the focus here is on the likelihood, which is typically far more computationally intensive than the prior. This approach is most useful for large datasets and for models where the computational burden of the likelihood is high.

Parallel processing initially required expensive specialized hardware, limiting its availability for statistical modeling. Later, as “Beowulf” clusters built from networks of affordable desktop computers became common, parallel processing moved outside the confines of specialized research facilities and became available to smaller research groups. The cost of purchasing and housing the required hardware, and the need for considerable technical knowledge, still limit their availability to individual researchers. Their application to Bayesian models is also hampered by the time it takes to pass data between processors, restricting their use mostly to the parallel computation of multiple MCMC chains (Rosenthal, 2000) or to “pre-fetching”, in which potential future acceptance decisions are computed in advance (Brockwell, 2006, Strid, 2010).

The increasing availability of GPUs and their application to scientific computation puts massive-scale parallel programming within easy reach of the individual researcher. For a Bayesian or other statistician, GPU parallel processing has many advantages over CPU parallel processing. A multi-core CPU may be able to execute up to 12 operations in parallel, while a GPU is capable of thousands, which can greatly reduce the time needed to perform large computational operations. The GPU’s advantage over Beowulf and similar clusters lies in the rapid data transfer and sharing between the processors on the GPU and the main CPU, which makes significant reductions in processing time feasible. Examples of these advantages are evident in recent papers on a variety of applications, including versions of the multivariate slice sampler (Tibbits et al., 2010), large-scale sequence alignment (Manavski and Valle, 2008), stochastic chemical models (Niemi and Wheeler, 2012), and phylogenetic models (Silberstein et al., 2008). More generally, the computational capabilities of the GPU show great potential for accelerating each iteration of an MCMC sampling scheme (see Suchard et al. (2010) for an excellent discussion).

Calculating the likelihood, or more often the log-likelihood, is a ubiquitous procedure in data analysis and parameter estimation. The computations for sampling from a target posterior using MCMC typically require the evaluation of the product of the likelihood and the prior, or more commonly, the sum of the log-likelihood and the log-prior (Neal, 2003, Gilks and Wild, 1992, Gilks et al., 1995, Metropolis et al., 1953). It is evident that for n observations and p parameters to sample, if n ≫ p, then the likelihood is the most computationally intensive part of the computation. In cases where the log-likelihood is the sum of independent individual terms for each observation, the individual terms can be evaluated in parallel and then summed, as sketched below. This makes the likelihood computation an excellent candidate for evaluation using the GPU for both MCMC and large-scale maximum likelihood estimation problems.
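As a concrete illustration of this pattern (a minimal sketch, not the code used in the paper; an illustrative Poisson likelihood is assumed, and identifiers such as loglik_terms and gpu_loglik are hypothetical), one CUDA thread can be assigned to each observation, with the per-observation terms combined by a parallel reduction:

    #include <cstdio>
    #include <cuda_runtime.h>
    #include <thrust/device_ptr.h>
    #include <thrust/reduce.h>

    // One thread per observation: each thread computes the log-likelihood
    // term for its own observation (here an illustrative Poisson model).
    __global__ void loglik_terms(const int *y, double lambda, double *out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            out[i] = y[i] * log(lambda) - lambda - lgamma((double)y[i] + 1.0);
    }

    // Launch the kernel, then sum the independent terms with a parallel
    // reduction on the device; only the scalar total returns to the host.
    double gpu_loglik(const int *d_y, double lambda, double *d_terms, int n)
    {
        int threads = 256, blocks = (n + threads - 1) / threads;
        loglik_terms<<<blocks, threads>>>(d_y, lambda, d_terms, n);
        thrust::device_ptr<double> p(d_terms);
        return thrust::reduce(p, p + n, 0.0);
    }

    int main()
    {
        const int n = 5;
        int h_y[n] = {2, 0, 3, 1, 4};
        int *d_y; double *d_terms;
        cudaMalloc(&d_y, n * sizeof(int));
        cudaMalloc(&d_terms, n * sizeof(double));
        cudaMemcpy(d_y, h_y, n * sizeof(int), cudaMemcpyHostToDevice);
        printf("log-likelihood at lambda = 1.5: %f\n", gpu_loglik(d_y, 1.5, d_terms, n));
        cudaFree(d_y); cudaFree(d_terms);
        return 0;
    }

Because the terms are mutually independent given the parameters, the only inherently serial step is the final reduction, which the Thrust library performs in logarithmic parallel depth.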

The motivating example is the Bayesian implementation of a novel self-exciting model for terrorist activity. A self-exciting point process is one where the realization of events increases the short-term probability of observing future events, much in the same manner that one contagious individual can infect others (while he or she is still infectious), or that major earthquakes lead to aftershocks. These models have a stochastic mean equal to the sum of a baseline term and a self-exciting term, where the self-exciting term is determined by the previous event history. There are several examples of using the Hawkes self-exciting model (Hawkes, 1971) to account for the clustering patterns present in terrorist event data. Holden (1986, 1987) assumes the daily number of US aircraft hijackings arises from a Poisson distribution with mean equal to a constant plus a self-exciting term. Lewis et al. (2011) model the number of civilian deaths in Iraq assuming a Poisson distribution, but allow the mean to vary according to a non-constant background process in addition to self-excitation. Kaplan et al. (2006) use a similar excitation term or “discounted rate” to model the rate of suicide bombings in Israel. Negative binomial distributions (White et al., 2012) and self-exciting hurdle models (Porter and White, 2012) have been employed to account for days with a comparatively large number of terrorist incidents. All of these examples estimate parameters via maximum likelihood. Very recently, Rasmussen (in press) illustrated a Bayesian approach to estimating a Hawkes process using a Metropolis-within-Gibbs algorithm, applied to a very small set of simulated data.

Inspired by the clustering representation of the Hawkes process (Hawkes and Oakes, 1974), a new class of self-exciting models for terrorist activity is proposed by assuming that the daily number of terrorist incidents results from the convolution of two processes. In this flexible model, each process has its own distribution, allowing a non-constant baseline for the resulting model. Parameter estimation is carried out in a Bayesian framework using MCMC methods. While the self-exciting convolution models add distributional flexibility, the nature of the convolution process and of Bayesian inference introduces computational challenges. An example using data on terrorist attacks in Colombia illustrates how these challenges can be overcome using GPU calculations.
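To make the computational burden concrete: if a day’s count is the sum N = X + Y of two independent components, the likelihood contribution of a day with count y requires the convolution sum P(N = y) = Σ_{j=0}^{y} P(X = j) P(Y = y − j). A minimal CUDA sketch of evaluating these sums in parallel, one thread per day, follows; the negative-binomial parameterization and all names (nb_logpmf, conv_loglik, r1, p1, r2, p2) are hypothetical stand-ins, and in the actual model the self-exciting component’s parameters would additionally depend on the event history rather than being constant:

    #include <cuda_runtime.h>

    // Log pmf of a negative binomial with size r and success probability p
    // (one common parameterization; the paper's may differ).
    __device__ double nb_logpmf(int k, double r, double p)
    {
        return lgamma(k + r) - lgamma(r) - lgamma((double)k + 1.0)
             + r * log(p) + k * log(1.0 - p);
    }

    // One thread per day t: evaluate the log of the convolution pmf
    // P(N_t = y_t) = sum_{j=0}^{y_t} P(X = j) * P(Y = y_t - j).
    __global__ void conv_loglik(const int *y, double r1, double p1,
                                double r2, double p2, double *out, int n)
    {
        int t = blockIdx.x * blockDim.x + threadIdx.x;
        if (t >= n) return;
        double s = 0.0;
        for (int j = 0; j <= y[t]; ++j)
            s += exp(nb_logpmf(j, r1, p1) + nb_logpmf(y[t] - j, r2, p2));
        out[t] = log(s);
    }

Each day’s convolution is O(y_t) work, so a serial evaluation of the full log-likelihood costs on the order of Σ_t y_t operations per MCMC iteration; distributing the days across thousands of GPU threads is what makes the Bayesian fit practical.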

Section 2 contains a brief discussion of the data and presents the model; Section 3 describes the GPU implementation of the model, discussing factors influencing the reduction in computational time and some guidelines for programming; Section 4 presents the results, including a comparison between CPU and GPU performance, and the model’s findings; Section 5 presents some general suggestions for computations on the GPU and a discussion of further areas for research.

Data from the Global Terrorism Database

A standard Hawkes model (Hawkes, 1971) is a continuous-time Poisson point process with stochastic intensity λ(t) = μ + α ∑_{i : t_i < t} g(t − t_i), where μ is a constant baseline parameter, α ≥ 0 is the magnitude parameter, g(u) ≥ 0 is the decay function, and {t_i} are the event times. The magnitude parameter α is usually restricted to be less than 1 so the process is not explosive (Daley and Vere-Jones, 2003). The decay function is restricted to be a probability density function on the positive real line (i.e., g(u) ≥ 0 and ∫₀^∞ g(u) du = 1).
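Evaluating this intensity parallelizes naturally, since each λ(t_j) depends only on the fixed event history. A minimal CUDA sketch (hypothetical names; an exponential decay g(u) = ω e^{−ωu} is assumed here as one common choice, not necessarily the paper’s):

    // Evaluate lambda(t) = mu + alpha * sum_{t_i < t} g(t - t_i) at m query
    // points, one thread per point; event_times must be sorted ascending.
    __global__ void hawkes_intensity(const double *event_times, int n_events,
                                     const double *t, double *lambda, int m,
                                     double mu, double alpha, double omega)
    {
        int j = blockIdx.x * blockDim.x + threadIdx.x;
        if (j >= m) return;
        double s = 0.0;
        // Exponential decay g(u) = omega * exp(-omega * u); the early exit
        // from the loop relies on the events being in increasing time order.
        for (int i = 0; i < n_events && event_times[i] < t[j]; ++i)
            s += omega * exp(-omega * (t[j] - event_times[i]));
        lambda[j] = mu + alpha * s;
    }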

The model for terrorist activity

Despite the existence of alternatives (OpenCL, OpenGL), NVIDIA’s CUDA programming language is typically the preferred language of the non-expert user writing code for execution on a GPU. CUDA is a proprietary language for NVIDIA GPUs derived from the C language. There is an active online community, and numerous examples and training materials are available for the novice user. What follows is a brief description of GPU parallel processing, followed by a presentation of the algorithms in this paper.
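To illustrate the overall structure of such an implementation (a sketch only, reusing the hypothetical gpu_loglik evaluator from the earlier example; propose, log_prior, and runif are placeholder helpers, not the paper’s code): the data are copied to the device once before the chain starts, and each Metropolis–Hastings iteration then launches the likelihood kernel for the proposed parameters, so only a scalar crosses the CPU–GPU bus per iteration:

    #include <cmath>
    #include <cstdlib>
    #include <cuda_runtime.h>

    // Defined in the earlier sketch; evaluates the log-likelihood on the GPU.
    double gpu_loglik(const int *d_y, double lambda, double *d_terms, int n);

    static double runif() { return rand() / (double)RAND_MAX; }               // placeholder RNG
    static double propose(double th) { return th + 0.1 * (runif() - 0.5); }   // random-walk proposal
    static double log_prior(double th) { return th > 0 ? -th : -INFINITY; }   // e.g. Exp(1) prior

    void run_chain(const int *h_y, int n, int n_iter)
    {
        int *d_y; double *d_terms;
        cudaMalloc(&d_y, n * sizeof(int));
        cudaMalloc(&d_terms, n * sizeof(double));
        // Copy the data to the device once; it stays resident for the whole chain.
        cudaMemcpy(d_y, h_y, n * sizeof(int), cudaMemcpyHostToDevice);

        double theta = 1.0;
        double ll_cur = gpu_loglik(d_y, theta, d_terms, n);
        for (int iter = 0; iter < n_iter; ++iter) {
            double th_prop = propose(theta);                        // cheap, on the CPU
            if (log_prior(th_prop) == -INFINITY) continue;          // outside support: reject
            double ll_prop = gpu_loglik(d_y, th_prop, d_terms, n);  // expensive, on the GPU
            double log_a = ll_prop + log_prior(th_prop) - ll_cur - log_prior(theta);
            if (log(runif()) < log_a) { theta = th_prop; ll_cur = ll_prop; }  // MH accept/reject
        }
        cudaFree(d_y); cudaFree(d_terms);
    }

The design point is that device memory is allocated and filled outside the sampling loop; per-iteration traffic between host and device is limited to the proposed parameter values and the returned log-likelihood.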

Results

The model is implemented in R version 2.15.1 (R Development Core Team, 2012) under Ubuntu 10.10, running on an Intel i7 985X CPU and an NVIDIA Tesla C2050 GPU. Benchmark tests are used to extrapolate approximate MCMC run times for a serial version of the code from the execution time of the MCMC code using GPU parallel processing.

Conclusion

The results in Section 4 illuminate both the utility of using the GPU to calculate the likelihood or log-likelihood as part of an MCMC scheme, and the insight into terrorist activity provided by the convolution model. This approach can help provide answers in a more timely fashion and allows the implementation of models previously considered impractical. The model, algorithms, and CUDA code in this paper may not be optimal; there are likely more efficient ways of implementing this model.

References (35)

  • I. Strid, Efficient parallelisation of Metropolis–Hastings algorithms using a prefetching approach, Computational Statistics and Data Analysis (2010)
  • A.E. Brockwell, Parallel Markov chain Monte Carlo simulation by pre-fetching, Journal of Computational and Graphical Statistics (2006)
  • D.J. Daley et al., An Introduction to the Theory of Point Processes, Vol. I (2003)
  • R.P. Feynman et al., “Surely You’re Joking, Mr. Feynman”: The Adventures of a Curious Character (1985)
  • W. Gilks et al., Adaptive rejection Metropolis sampling within Gibbs sampling, Journal of the Royal Statistical Society, Series C (Applied Statistics) (1995)
  • W. Gilks et al., Adaptive rejection sampling for Gibbs sampling, Applied Statistics (1992)
  • T. Hastie et al., The Elements of Statistical Learning: Data Mining, Inference and Prediction (2001)
  • A.G. Hawkes, Spectra of some self-exciting and mutually exciting point processes, Biometrika (1971)
  • A.G. Hawkes et al., A cluster process representation of a self-exciting process, Journal of Applied Probability (1974)
  • R. Holden, The contagiousness of aircraft hijacking, American Journal of Sociology (1986)
  • R. Holden, Time series analysis of a contagious process, Journal of the American Statistical Association (1987)
  • E.H. Kaplan et al., Tactical prevention of suicide bombings in Israel, Interfaces (2006)
  • M. Kempenaar, M. Dijkstra, CUDA enabled processing in R with R/GPU… (2010)
  • G. LaFree et al., Introducing the Global Terrorism Database, Terrorism and Political Violence (2007)
  • A. Lee et al., On the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods, Journal of Computational and Graphical Statistics (2010)
  • E. Lewis et al., Self-exciting point process models of civilian deaths in Iraq, Security Journal (2011)
  • Y. Liu et al., GPU accelerated Smith–Waterman
