GPU accelerated MCMC for modeling terrorist activity
Introduction
Parallel processing in statistics is used with increasing frequency. The confluence of an increasing abundance of large datasets for analysis, the desire to implement more sophisticated models, and the advent of powerful, inexpensive graphical processing units (GPUs) and more accessible programming tools creates an opportunity for making significant gains in effective computing capacity and developing new computational methods. In Bayesian statistics, the serial nature of Markov chain Monte Carlo (MCMC) appears to limit the applicability of parallel processing methods, placing limits on the dimension and complexity of models fit using MCMC. Non-MCMC methods for computation such as Approximate Bayesian Computation (Plagnol and Tavaré, 2004), direct sampling (Walker et al., 2010), or the use of integrated nested Laplace approximations (Rue et al., 2009) show great promise, but are not as universally applicable as MCMC methods. It is likely that MCMC will remain a standard tool for Bayesian statistics for the foreseeable future.
The benefit of GPU parallel processing for accelerating MCMC is demonstrated with a general approach. This acceleration is accomplished by using the GPU to compute the likelihood as part of generating samples from the target posterior. While almost all methods to generate samples from a posterior using MCMC require the evaluation of the product of the likelihood and prior, the focus here is on the likelihood, which is typically far more computationally intensive than the prior. This approach is most useful for large datasets and instances where the computational burden of the likelihood is high.
Parallel processing initially required expensive specialized hardware, limiting its availability for statistical modeling. Later, as “Beowulf” clusters based on networks of more affordable desktop computers became more common, parallel processing outside the confines of specialized research facilities became more widely available to smaller research groups. Still, the cost of purchasing and housing the required hardware, and the need for considerable technical knowledge, limited their availability to individual researchers. Their application to Bayesian models is limited by the excessive time it can take to pass data between processors, restricting their use mostly to the parallel computation of multiple MCMC chains (Rosenthal, 2000), or to “pre-fetching” schemes that calculate potential acceptance ratios in parallel (Brockwell, 2006; Strid, 2010).
The increasing availability of GPUs and their application to scientific computation puts massive-scale parallel programming within easy reach of the individual researcher. For a Bayesian or other statistician, GPU parallel processing has many advantages over CPU parallel processing. A multi-core CPU may be able to execute up to 12 parallel operations, while a GPU is capable of thousands. This can greatly reduce the time needed to perform large computational operations. The GPU’s advantage over Beowulf or similar clusters is the rapid data transfer and sharing between processors on the GPU and the main CPU. This capacity makes it feasible to see significant reductions in processing time by using the GPU’s parallel processing capabilities. Examples of these advantages are evident in recent papers on a variety of applications, including versions of the multivariate slice sampler (Tibbits et al., 2010), genetic sequence alignment (Manavski and Valle, 2008), stochastic chemical models (Niemi and Wheeler, 2012), and phylogenetic models (Silberstein et al., 2008). More generally, the computational capabilities of the GPU show great potential for use within each iteration of an MCMC sampling scheme, to accelerate the scheme (see Suchard et al. (2010) for an excellent discussion of this).
Calculating the likelihood, or more often the log-likelihood, is a ubiquitous procedure in data analysis and parameter estimation. The computations for sampling from a target posterior using MCMC typically require the evaluation of the product of the likelihood and the prior, or more commonly, the sum of the log-likelihood and the log-prior (Neal, 2003; Gilks and Wild, 1992; Gilks et al., 1995; Metropolis et al., 1953). It is evident that for n observations and p parameters to sample, if n ≫ p, then the likelihood is the most computationally intensive part of the computation. In cases where the log-likelihood is the sum of independent individual terms for each observation, the individual terms can be evaluated in parallel and then summed. This makes the likelihood computation an excellent candidate for evaluation on the GPU, for both MCMC and large-scale maximum likelihood estimation problems.
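In this additive case the log-likelihood evaluation is a parallel map over observations followed by a single reduction (a sum). A minimal sketch of that structure, using a Poisson model purely for illustration (on a GPU each per-observation term would be computed by its own thread; NumPy vectorisation stands in for that here):

```python
import numpy as np
from math import lgamma

def poisson_loglik_terms(y, mu):
    """One independent log-likelihood term per observation:
    y_i * log(mu) - mu - log(y_i!).
    Each term depends only on its own observation, so all terms
    can be evaluated in parallel (one GPU thread per term)."""
    log_fact = np.array([lgamma(k + 1.0) for k in y])  # log(y!) without overflow
    return y * np.log(mu) - mu - log_fact

def poisson_loglik(y, mu):
    # Parallel map over observations, then a single reduction.
    return poisson_loglik_terms(np.asarray(y, dtype=float), mu).sum()
```

The same map-then-reduce pattern applies to any model whose log-likelihood is a sum of per-observation terms; only the term function changes.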
The motivating example is the Bayesian implementation of a novel self-exciting model for terrorist activity. A self-exciting point process is one where the realization of events increases the short-term probability of observing future events, much in the same manner that one contagious individual can infect other individuals (while he or she is still infectious), or that major earthquakes lead to aftershocks. These models have a stochastic mean equal to the sum of a baseline and a self-exciting term, where the self-exciting term is determined by the previous event history. There are several examples of using the Hawkes self-exciting model (Hawkes, 1971) to account for the clustering patterns present in terrorist event data. Holden (1986, 1987) assumes the daily number of US aircraft hijackings arises from a Poisson distribution with mean equal to a constant plus a self-exciting term. Lewis et al. (2011) model the number of civilian deaths in Iraq assuming a Poisson distribution, but allow the mean to vary according to a non-constant background process in addition to self-excitation. Kaplan et al. (2006) use a similar excitation term, or “discounted rate”, to model the rate of suicide bombings in Israel. Negative binomial distributions (White et al., 2012) and self-exciting hurdle models (Porter and White, 2012) have been employed to account for days with a comparatively large number of terrorist incidents. All of these examples estimate parameters via maximum likelihood. Very recently, Rasmussen (in press) illustrated a Bayesian approach to estimating a Hawkes process using a Metropolis-within-Gibbs algorithm, applied to a very small set of simulated data.
Inspired by the clustering representation of the Hawkes process (Hawkes and Oakes, 1974), a new class of self-exciting models for terrorist activity is proposed by assuming that the daily number of terrorist incidents results from the convolution of two processes. In this flexible model, each process has its own distribution, allowing a non-constant baseline for the resulting model. Parameter estimation is performed in a Bayesian framework using MCMC methods. While the self-exciting convolution models add distributional flexibility, the nature of the convolution process and of Bayesian inference introduces computational challenges. An example using data on terrorist attacks in Colombia illustrates how these challenges can be overcome using GPU calculations.
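The paper's convolution model is not reproduced in these snippets. Still, the general idea, a daily count arising as the sum (convolution) of a baseline count and a self-excited count driven by earlier days' events, can be sketched. The distributions and decay below are illustrative placeholders, not the paper's specification:

```python
import numpy as np

def simulate_counts(T, mu=0.2, alpha=0.5, omega=0.9, seed=0):
    """Illustrative sketch only: daily count N_t = B_t + S_t, where
    B_t is a baseline Poisson count and S_t is a Poisson count whose
    mean is excited by earlier days' events with geometric decay.
    The actual distributions in the paper differ."""
    rng = np.random.default_rng(seed)
    N = np.zeros(T, dtype=int)
    for t in range(T):
        # influence of day s on day t decays geometrically in (t - s)
        excitation = sum(N[s] * alpha * omega * (1.0 - omega) ** (t - s - 1)
                         for s in range(t))
        B = rng.poisson(mu)          # baseline process
        S = rng.poisson(excitation)  # self-exciting process
        N[t] = B + S                 # observed count is the convolution
    return N
```

With alpha < 1 the total expected offspring per event is below one, so the sketched process stays subcritical (non-explosive), mirroring the magnitude restriction on the Hawkes model.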
Section 2 contains a brief discussion of the data, and presents the model; Section 3 describes the GPU implementation of the model, discussing factors influencing the reduction in computational time, and some guidelines for programming; Section 4 presents results and a comparison between CPU performance and the GPU, and the model’s results; Section 5 presents some general suggestions for computations on the GPU, and a discussion of further areas for research.
Data from the Global Terrorism Database
A standard Hawkes model (Hawkes, 1971) is a continuous-time Poisson point process with stochastic intensity λ(t) = μ + α Σ_{t_i < t} g(t − t_i), where μ is a constant baseline parameter, α is the magnitude parameter, g is the decay function, and the t_i are the event times. The magnitude parameter is usually restricted to be less than 1 so the process is not explosive (Daley and Vere-Jones, 2003). The decay function is restricted to be a probability density function on the positive real line (i.e., ∫₀^∞ g(u) du = 1).
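The intensity above is straightforward to evaluate given the event history. A minimal sketch, assuming an exponential decay density g(u) = ω e^{−ωu} (a common choice, not necessarily the one used in the paper):

```python
import numpy as np

def hawkes_intensity(t, event_times, mu, alpha, omega):
    """Conditional intensity lambda(t) = mu + alpha * sum_{t_i < t} g(t - t_i)
    with exponential decay density g(u) = omega * exp(-omega * u).
    mu: baseline rate; alpha: magnitude (alpha < 1 for non-explosiveness)."""
    past = np.asarray([ti for ti in event_times if ti < t])
    if past.size == 0:
        return mu  # no history: intensity is just the baseline
    return mu + alpha * np.sum(omega * np.exp(-omega * (t - past)))
```

Note that each past event contributes an independent term to the sum, which is the same map-then-reduce structure the GPU exploits for the likelihood.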
The model for terrorist activity
Despite the existence of alternatives (OpenCL, OpenGL), NVIDIA’s CUDA is typically the preferred language for the non-expert user writing code for execution on a GPU. CUDA is a proprietary language for NVIDIA GPUs derived from the C language. There is an active online community, and numerous examples and training materials are available for the novice user. What follows is a brief description of GPU parallel processing, followed by a presentation of the algorithms in this paper.
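The division of labour described here, with the likelihood offloaded to the GPU while the sampler logic stays on the CPU, can be sketched schematically (in Python rather than CUDA, purely as an illustration; the function names are hypothetical):

```python
import numpy as np

def metropolis_step(theta, log_lik, log_prior, rng, scale=0.1):
    """One random-walk Metropolis update for a scalar parameter.
    In a GPU implementation, only log_lik (the O(n) sum over
    observations) is offloaded to the device; the prior evaluation
    and the accept/reject logic remain on the CPU."""
    proposal = theta + scale * rng.standard_normal()
    log_ratio = (log_lik(proposal) + log_prior(proposal)
                 - log_lik(theta) - log_prior(theta))
    if np.log(rng.uniform()) < log_ratio:
        return proposal  # accept
    return theta         # reject: keep the current state
```

Because the accept/reject decision needs only the scalar log-likelihood value, the per-iteration data traffic between GPU and CPU is tiny, which is what makes this scheme profitable despite the serial outer loop.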
Results
The model is implemented in R version 2.15.1 (R Development Core Team, 2012) under Ubuntu 10.10, running on an Intel i7 985X CPU and an NVIDIA Tesla C2050 GPU. Benchmark results are used to extrapolate approximate MCMC run times for a serial version of the code from the measured execution time of the GPU-parallel MCMC code.
Conclusion
The results in Section 4 illuminate both the utility of using the GPU to calculate the likelihood or log-likelihood as part of an MCMC scheme, and the insight into terrorist activity provided by the convolution model. This approach can help provide answers in a more timely fashion and allow the implementation of models previously considered impractical. The model, algorithms, and CUDA code in this paper may not be optimal; there are likely more efficient ways of implementing this model.
References (35)
Efficient parallelisation of Metropolis Hastings algorithms using a prefetching approach. Computational Statistics and Data Analysis (2010).
Parallel Markov chain Monte Carlo simulation by pre-fetching. Journal of Computational and Graphical Statistics (2006).
An Introduction to the Theory of Point Processes, Vol. I (2003).
“Surely You’re Joking, Mr. Feynman”: The Adventures of a Curious Character (1985).
Adaptive rejection Metropolis sampling within Gibbs sampling. Journal of the Royal Statistical Society, Series C (Applied Statistics) (1995).
Adaptive rejection sampling for Gibbs sampling. Applied Statistics (1992).
The Elements of Statistical Learning: Data Mining, Inference and Prediction (2001).
Spectra of some self-exciting and mutually exciting point processes. Biometrika (1971).
A cluster process representation of a self-exciting process. Journal of Applied Probability (1974).
The contagiousness of aircraft hijacking. American Journal of Sociology (1986).
Time series analysis of a contagious process. Journal of the American Statistical Association (1987).
Tactical prevention of suicide bombings in Israel. Interfaces (2006).
Introducing the global terrorism database. Terrorism and Political Violence.
On the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. Journal of Computational and Graphical Statistics.
Self-exciting point process models of civilian deaths in Iraq. Security Journal (2011).
GPU accelerated Smith–Waterman (2008).