Accelerating the Gillespie Exact Stochastic Simulation Algorithm Using Hybrid Parallel Execution on Graphics Processing Units

doi:10.1371/journal.pone.0046693

Figure 1

CUDA Computing Model.

The basic execution unit is a thread. Threads are grouped into thread blocks, and each thread block is executed on a single multi-processor. Threads within a thread block can communicate through shared memory, which is essentially a user-controlled cache; on the latest GPUs, part of this on-chip memory can be configured to act as an L1 cache. Registers store data local to each thread and offer the fastest access. Data that does not fit in registers spills over to local memory, which is physically stored in main memory and is therefore very slow. Main memory has three components. Global memory is accessible to all threads and, on the latest generation of GPUs, is cached through an L2 cache shared among all multi-processors. Constant memory typically stores data used by all threads (for example, simulation constants) and is automatically cached. Texture memory is read-only memory that is also automatically cached.

doi: https://doi.org/10.1371/journal.pone.0046693.g001
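As an illustration of how the memory spaces in Figure 1 appear in CUDA source, the following is a minimal sketch, not code from the paper. It assumes a hypothetical kernel `scaled_block_sum`, a hypothetical host helper `scaled_sum`, a hypothetical constant `c_scale`, and an assumed block size of 256 threads. Each thread block reads a tile from global memory, scales it by a value held in constant memory, reduces the tile in shared memory, and writes one partial sum back to global memory. Thread-local indices normally live in registers. Texture memory is not shown, and CUDA error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Assumed block size for this sketch; must be a power of two for the tree reduction.
constexpr int kThreadsPerBlock = 256;

// Constant memory: one read-only value seen by all threads, cached automatically.
__constant__ float c_scale;

// Each thread block sums a tile of `in` (scaled by c_scale) into one partial sum.
__global__ void scaled_block_sum(const float* in, float* block_sums, int n) {
    // Shared memory: visible only to the threads of this block.
    __shared__ float tile[kThreadsPerBlock];

    // Thread-local values like these are normally kept in registers.
    int tid = threadIdx.x;
    int gid = blockIdx.x * blockDim.x + threadIdx.x;

    // Read from global memory; out-of-range threads contribute zero.
    tile[tid] = (gid < n) ? in[gid] * c_scale : 0.0f;
    __syncthreads();

    // Tree reduction inside the block, communicating through shared memory.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) tile[tid] += tile[tid + stride];
        __syncthreads();
    }

    // One thread per block writes the block's result to global memory.
    if (tid == 0) block_sums[blockIdx.x] = tile[0];
}

// Host helper (hypothetical): returns scale * sum(data). No error checking.
float scaled_sum(const std::vector<float>& data, float scale) {
    int n = static_cast<int>(data.size());
    if (n == 0) return 0.0f;  // a launch with zero blocks would be invalid
    int blocks = (n + kThreadsPerBlock - 1) / kThreadsPerBlock;

    float* d_in = nullptr;
    float* d_sums = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_sums, blocks * sizeof(float));
    cudaMemcpy(d_in, data.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(c_scale, &scale, sizeof(float));  // fill constant memory

    scaled_block_sum<<<blocks, kThreadsPerBlock>>>(d_in, d_sums, n);

    // Copy the per-block partial sums back and finish the reduction on the host.
    std::vector<float> partial(blocks);
    cudaMemcpy(partial.data(), d_sums, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_sums);

    float total = 0.0f;
    for (float p : partial) total += p;
    return total;
}
```

Keeping the reduction inside a block mirrors the caption's point that only threads in the same thread block share memory. Combining results across blocks has to go through global memory, which is done here by copying the partial sums back to the host.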