Technical Note
Low cost, high performance GPU computing solution for atomic resolution cryoEM single-particle reconstruction
Introduction
Cryo-electron microscopy (cryoEM) is a fast-emerging tool in structural biology for three-dimensional (3D) structure determination of macromolecular complexes. Several recent hardware improvements in cryoEM instrumentation have together made it possible to obtain cryoEM images containing atomic resolution information. From such images, it is possible to determine 3D structures at near atomic resolution (∼4 Å) (Jiang et al., 2008, Ludtke et al., 2008, Yu et al., 2008a, Zhang et al., 2008a, Zhang et al., 2010a, Cong et al., 2010), and more recently at atomic resolution (Zhang et al., 2010a, Zhang et al., 2010b).
However, because of the intrinsically poor signal-to-noise ratio of cryoEM images, the number of images required for high-resolution studies increases exponentially as the targeted resolution improves; hence the computation time for 3D reconstruction also increases exponentially, creating a bottleneck for routine applications of cryoEM. Computer-controlled cryoEM instruments and automated data collection have enabled the acquisition of this large number of particle images in a few days (Stagg et al., 2006). However, processing this huge amount of image data, especially reconstructing a large density map from the particle images, takes a long time (days, weeks or even months, depending on the targeted resolution and particle size) and has become the de facto time-limiting step in high-resolution cryoEM reconstruction. In addition, the clock speed of computer processors has remained roughly unchanged for the past several years and may remain so in the near future because of an upper limit on transistor switching time. Taken together, there is an urgent need for a high-performance computing solution to the 3D reconstruction problem.
The process of obtaining a 3D structure from 2D cryoEM images consists of two main tasks: orientation determination/refinement and 3D reconstruction (DeRosier and Klug, 1968, Crowther et al., 1970c, Crowther, 1971b). The orientation determination/refinement task can be carried out on individual particle images and is thus ‘embarrassingly’ parallel in nature. Various applications and software kits have been developed to handle this task efficiently through distributed computing (e.g., Smith et al., 1991, Johnson et al., 1994, Martino et al., 1994, Baker and Cheng, 1996, Crowther et al., 1996, Frank et al., 1996a, van Heel et al., 1996, Zhou et al., 1998, Ludtke et al., 1999a, Liang et al., 2002, Sorzano et al., 2004, Grigorieff, 2007, Heymann and Belnap, 2007, Tang et al., 2007, Yan et al., 2007, Yang et al., 2007). The latter task, 3D reconstruction, requires combining many particle images to form a single 3D volume (supplementary Fig. 1). This task, however, has data dependency (see below) and has not been optimized for multi-core and many-core computing. In fact, it has become the computational bottleneck of cryoEM data processing in atomic resolution structure determination of large complexes. For example, a single iteration of 3D reconstruction of a medium-sized virus particle (∼700 Å in diameter) to 3–4 Å resolution takes 1–2 weeks of computation. Because several iterations are required for each structure, such an approach quickly becomes unrealistic when pushing the reconstruction resolution further, to the 2–3 Å range.
The addition of advanced capabilities, such as random memory access and in-order execution, to the commodity graphics processing unit (GPU) has led to the development of the general purpose GPU (GPGPU) in recent years. This development makes it possible to use stream processing on non-graphical data, thus providing a very cost-effective solution to computation-intensive problems. Inheriting the strengths of the GPU, GPGPU offers high floating point calculation power by devoting more transistors to data processing. Moreover, the dedicated memory of GPGPU alleviates the limitation due to the so-called von Neumann bottleneck (i.e., competition for memory access by processing units sharing the same system bus) (Backus, 1978). These characteristics make GPGPU an attractive choice for large-scale, data-parallel processing tasks with high arithmetic intensity and high spatial locality. However, GPGPU has a limited cache and implements only simple flow control (NVIDIA, 2009b), in contrast to the CPU, which has a large cache and implements sophisticated flow control in order to minimize latency for arbitrary system memory (sMEM) access by serial processes. In addition, severe race conditions (Netzer and Miller, 1992) exist in the operation of 3D Fourier interpolation in the data merging step. Other important factors, such as thread mapping, graphics memory (gMEM) management and coalesced access, should also be carefully considered to fully exploit the massive computation power of GPGPU.
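To make the race condition in the data merging step concrete, the following is a minimal Python sketch, not code from eLite3D. It simulates, deterministically, several "threads" that each add one contribution to a shared grid point: if they all read the old value before any of them writes, all but the last update are lost. The gather variant, in which each grid point is owned by exactly one worker, is shown only as one common way to avoid such conflicts; the function names, grid size and data are hypothetical.

```python
# Hypothetical illustration: names and data are assumptions, not taken from eLite3D.

def scatter_lockstep(grid_size, contributions):
    """Simulate threads that all read first and then all write (lost updates)."""
    grid = [0.0] * grid_size
    # Phase 1: every simulated thread reads the current value of its target point.
    read_values = [grid[idx] for idx, _ in contributions]
    # Phase 2: every thread writes back (old value + its own contribution).
    for (idx, value), old in zip(contributions, read_values):
        grid[idx] = old + value  # a later write overwrites an earlier one
    return grid


def gather_by_owner(grid_size, contributions):
    """Each grid point is owned by one worker, which sums only its own inputs."""
    grid = [0.0] * grid_size
    for owner in range(grid_size):
        grid[owner] = sum(v for idx, v in contributions if idx == owner)
    return grid


# (target grid index, value) pairs; three of them collide on grid point 0.
contributions = [(0, 1.0), (0, 2.0), (1, 4.0), (0, 8.0)]
racy = scatter_lockstep(2, contributions)  # grid point 0 keeps only the last write
safe = gather_by_owner(2, contributions)   # grid point 0 holds the full sum
```

In the racy version grid point 0 ends up holding only the last contribution, whereas the gather version accumulates all three.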
In this paper, we present a practical solution to the 3D reconstruction problem using GPGPU and its implementation as an integrated program, eLite3D. This solution drastically reduces the computation time needed for an atomic resolution reconstruction of large complexes to only a small fraction (1–5%) of that needed by other commonly used reconstruction programs, permitting completion of weeklong reconstruction tasks within 1–2 h on a personal computer (PC). Our solution represents a practical and cost-effective approach to atomic resolution cryoEM reconstruction and offers general guidelines for GPGPU implementation of other computation- and data-intensive problems.
Single particle 3D reconstruction
The Fourier space reconstruction method (Crowther et al., 1970a, Crowther et al., 1970b, Crowther, 1971a) is currently the standard algorithm for single particle 3D reconstruction. The most notable advantage of this method is its speed at the algorithmic level compared with other methods, such as the weighted back-projection method (Radermacher, 1988) and its variants. When merging 2D Fourier transforms of images in 3D Fourier space, it is necessary to properly interpolate and weight the 2D Fourier data
Data dependency and solutions
One property of the Fourier transform data used in 3D reconstruction is that they have data dependency. Data dependency is a situation in which a program statement (instruction) refers to the data of a preceding statement. For every grid point in the 3D Fourier space, we sum many Fourier data points from 2D Fourier transforms of particle images (see above). A simple sequential summation of these data points will lead to a data dependency situation: a later summation operation depends on the
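The dependency chain in a sequential summation can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the scheme used in eLite3D, whose details are not given in this excerpt. `sequential_sum` shows the chain, where each addition needs the result of the previous one. `strided_partial_sums` is a generic alternative in which the inputs are split across hypothetical independent lanes whose partial sums could be computed in parallel and combined afterwards. All names and values are illustrative.

```python
# Hypothetical illustration of a summation dependency chain and one generic way
# to break it; this is not the method described in the paper.

def sequential_sum(values):
    """Each step reads the running total written by the previous step."""
    total = 0.0
    for v in values:
        total = total + v  # depends on `total` from the previous iteration
    return total


def strided_partial_sums(values, lanes):
    """Lane k sums values[k], values[k + lanes], ...; lanes share no state."""
    partial = [sum(values[k::lanes]) for k in range(lanes)]
    # Only this final combination step depends on all the lanes.
    return sum(partial), partial


values = [float(i) for i in range(1, 11)]  # 1.0 .. 10.0
chain_total = sequential_sum(values)
lane_total, lane_partials = strided_partial_sums(values, lanes=4)
```

Both functions give the same total for these inputs. With general floating-point data the two summation orders can differ by rounding error.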
Accuracy and performance evaluations of eLite3D
To improve CPU and GPGPU parallelism, we introduce a processing pipeline strategy to optimally balance I/O and computation operations (see Supplementary result and supplementary Fig. 2).
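The general idea of overlapping I/O with computation can be sketched with a bounded producer/consumer queue. The pipeline actually used is described in the Supplementary material and is not reproduced here. In this minimal Python sketch, a loader thread stands in for reading particle images from disk and the `process` callback stands in for the GPU computation. The function names, the `depth` parameter and the toy data are assumptions made for illustration.

```python
import queue
import threading

# Hypothetical illustration of an I/O-compute pipeline; not the eLite3D code.

def load_batches(batches, out_queue):
    """Producer: stands in for reading batches of particle images from disk."""
    for batch in batches:
        out_queue.put(batch)  # blocks when the queue is full
    out_queue.put(None)       # sentinel: no more data


def run_pipeline(batches, process, depth=2):
    """Overlap loading and processing through a bounded queue of `depth` slots."""
    q = queue.Queue(maxsize=depth)
    loader = threading.Thread(target=load_batches, args=(batches, q))
    loader.start()
    results = []
    while True:
        batch = q.get()  # waits until the loader has a batch ready
        if batch is None:
            break
        results.append(process(batch))  # the loader keeps filling the queue meanwhile
    loader.join()
    return results


batches = [[1, 2], [3, 4], [5]]
results = run_pipeline(batches, process=sum)
```

The bounded queue limits how far the loader can run ahead, which keeps memory use fixed while the consumer is busy.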
To run the GPU-based programs, we assembled a desktop and a workstation, as listed in Table 1. We used the desktop to represent a low-end computer and the workstation to represent a high-end computer.
We first test our program using projection images computed from known atomic structures and then
Conclusion
We have solved the major time-limiting computational problem in atomic resolution cryoEM reconstruction by developing eLite3D. Our solution makes PCs equipped with GPGPU competitive with expensive computer clusters for high-resolution 3D reconstructions of large complexes. The interleaved schemes for eliminating data dependency described in this study are generally applicable to developing high-performance GPGPU solutions for other computation-intensive, data-rich problems.
Software availability
The software package is freely available from our website at http://www.eicn.ucla.edu/imirs.
Acknowledgments
This research is supported in part by grants from the National Institutes of Health (GM071940 and AI069015 to Z.H.Z.). We thank Jiansen Jiang, Peng Ge, Hongrong Liu, Xuekui Yu, Wong H. Hui, and Lei Jin for suggestions.
References (38)
- et al. A model-based approach for determining orientations of biological macromolecules imaged by cryoelectron microscopy. J. Struct. Biol. (1996)
- et al. SPIDER and WEB: processing and visualization of images in 3D electron microscopy and related fields. J. Struct. Biol. (1996)
- FREALIGN: high-resolution refinement of single particle structures. J. Struct. Biol. (2007)
- et al. Bsoft: image processing and molecular modeling for electron microscopy. J. Struct. Biol. (2007)
- et al. IMIRS: a high-resolution 3D reconstruction package integrated with a relational image database. J. Struct. Biol. (2002)
- et al. Symmetry-adapted spherical harmonics method for high-resolution 3D single-particle reconstructions. J. Struct. Biol. (2008)
- et al. De novo backbone trace of GroEL from single particle electron cryomicroscopy. Structure (2008)
- et al. EMAN: semi-automated software for high resolution single particle reconstructions. J. Struct. Biol. (1999)
- et al. EMAN: semiautomated software for high-resolution single-particle reconstructions. J. Struct. Biol. (1999)
- et al. XMIPP: a new generation of an open-source image processing package for electron microscopy. J. Struct. Biol. (2004)