Technical Note
Low cost, high performance GPU computing solution for atomic resolution cryoEM single-particle reconstruction

https://doi.org/10.1016/j.jsb.2010.05.006

Abstract

Recent advances in cryo-electron microscopy (cryoEM) have made it technically possible to determine the three-dimensional (3D) structures of macromolecular complexes at atomic resolution. However, processing the large amount of data needed for atomic resolution reconstructions requires either access to very expensive computer clusters or weeks of continuous computation on a personal computer (PC). In this paper, we present a practical computational solution to this 3D reconstruction problem through optimal utilization of the processing capabilities of both the central processing unit (CPU) and commodity graphics hardware (i.e., the general-purpose graphics processing unit, GPGPU). Our solution, implemented in a new program called eLite3D, has a number of advanced features of general interest. First, we construct interleaved schemes to prevent the data race condition intrinsic to the merging of 2D data into a 3D volume. Second, we introduce a processing-pipeline strategy to optimally balance I/O and computation operations, thus improving CPU and GPGPU parallelism. eLite3D achieves a speedup of up to 100 times over other commonly used 3D reconstruction programs with the same accuracy, allowing completion of atomic resolution 3D reconstructions of large complexes on a PC in 1–2 h rather than days or weeks. Our result provides a practical solution to atomic resolution cryoEM (asymmetric or symmetric) reconstruction and offers useful guidelines for developing GPGPU applications in general.

Introduction

Cryo-electron microscopy (cryoEM) is a rapidly emerging tool in structural biology for three-dimensional (3D) structure determination of macromolecular complexes. Several recent improvements in cryoEM instrumentation have together made it possible to obtain cryoEM images containing atomic resolution information. From such images, it is possible to determine 3D structures at near-atomic resolution (∼4 Å) (Jiang et al., 2008, Ludtke et al., 2008, Yu et al., 2008a, Zhang et al., 2008a, Zhang et al., 2010a, Cong et al., 2010), and more recently at atomic resolution (Zhang et al., 2010a, Zhang et al., 2010b).

However, due to the intrinsically poor signal-to-noise ratio of cryoEM images, the number of images required for high-resolution studies increases exponentially as the targeted resolution improves; hence the computation time for 3D reconstruction also increases exponentially, creating a bottleneck for routine applications of cryoEM. Computer-controlled cryoEM instruments and automated data collection have enabled the acquisition of this large number of particle images in a few days (Stagg et al., 2006). However, processing this huge amount of image data, especially reconstructing a large density map from the particle images, takes a long time (days, weeks, or even months, depending on the targeted resolution and particle size) and has become the de facto time-limiting step in high-resolution cryoEM reconstruction. In addition, the clock speed of computer processors has remained roughly unchanged for the past several years and may remain so for the near future due to an upper limit on transistor switching time. Taken together, there is an urgent need for a high-performance computing solution to the 3D reconstruction problem.

The process of obtaining a 3D structure from 2D cryoEM images consists of two main tasks: orientation determination/refinement and 3D reconstruction (DeRosier and Klug, 1968, Crowther et al., 1970c, Crowther, 1971b). The orientation determination/refinement task can be accomplished on individual particle images and is thus ‘embarrassingly’ parallel in nature. Various applications and software kits have been developed to handle this task efficiently through distributed computing (e.g., Smith et al., 1991, Johnson et al., 1994, Martino et al., 1994, Baker and Cheng, 1996, Crowther et al., 1996, Frank et al., 1996a, van Heel et al., 1996, Zhou et al., 1998, Ludtke et al., 1999a, Liang et al., 2002, Sorzano et al., 2004, Grigorieff, 2007, Heymann and Belnap, 2007, Tang et al., 2007, Yan et al., 2007, Yang et al., 2007). The latter task, 3D reconstruction, requires combining many particle images to form a single 3D volume (supplementary Fig. 1). This task, however, has data dependency (see below) and has not been optimized for multi-core and many-core computing. In fact, it has become the computational bottleneck of cryoEM data processing in atomic resolution structure determination of large complexes. For example, a single iteration of 3D reconstruction of a medium-sized virus particle (∼700 Å in diameter) to 3–4 Å resolution takes 1–2 weeks of computation to complete. Because several iterations are required for each structure, such an approach quickly becomes unrealistic when pushing the reconstruction resolution further to the 2–3 Å range.

The addition of advanced capabilities, such as random memory access and in-order execution, to the commodity graphics processing unit (GPU) has led to the development of the general-purpose GPU (GPGPU) in recent years. This development makes it possible to use stream processing on non-graphical data, thus providing a very cost-effective solution to computation-intensive problems. Inheriting the superior characteristics of the GPU, the GPGPU offers high floating-point calculation power by devoting more transistors to data processing. Moreover, the dedicated memory of the GPGPU alleviates the limitation due to the so-called von Neumann bottleneck (i.e., competition for memory access by processing units sharing the same system bus) (Backus, 1978). These characteristics make the GPGPU an attractive choice for large-scale, data-parallel processing tasks with high arithmetic intensity and high spatial locality. However, the GPGPU has a limited cache and implements only simple flow control (NVIDIA, 2009b), in contrast to the CPU, which has a large cache and implements sophisticated flow control in order to minimize the latency of arbitrary system memory (sMEM) access for serial processing. In addition, severe race conditions (Netzer and Miller, 1992) exist in the 3D Fourier interpolation operation of the data merging step. Other important factors, such as thread mapping, graphics memory (gMEM) management, and coalesced memory access, must also be carefully considered to fully exploit the massive computation power of the GPGPU.
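The race condition mentioned above arises because a scatter-style merge is a read-modify-write sequence on shared voxels. The following deterministic Python sketch (not the paper's code; the schedule and function names are illustrative) replays the interleaving that loses an update when two threads accumulate into the same Fourier-grid voxel without synchronization:

```python
def unsynchronized_merge(schedule):
    """Deterministically replay a read-modify-write race.

    Two 'threads' each try to add 1 to the same shared voxel.
    schedule is a list of (thread_id, op) steps, op in {'read', 'write'}.
    If the steps interleave, both threads read the old value and one
    increment is lost -- the data race of the scatter-style merge.
    """
    voxel = 0          # shared accumulator (one 3D Fourier grid point)
    local = {}         # each thread's private copy of the voxel value
    for tid, op in schedule:
        if op == "read":
            local[tid] = voxel          # load shared value
        else:
            voxel = local[tid] + 1      # write back incremented copy
    return voxel
```

With a serial schedule both increments survive; with an interleaved schedule one is silently lost, which is why the merge step cannot simply be parallelized over input data points.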

In this paper, we present a practical solution to the 3D reconstruction problem using GPGPU and its implementation as an integrated program, eLite3D. This solution drastically reduces the computation time needed to compute an atomic resolution reconstruction of large complexes to only a small fraction (1–5%) of that needed by other commonly used reconstruction programs, permitting completion of weeklong reconstruction tasks within 1–2 h on a personal computer (PC). Our solution represents a practical and cost-effective approach to atomic resolution cryoEM reconstruction and offers general guidelines for GPGPU implementation of other computation- and data-intensive problems.

Section snippets

Single particle 3D reconstruction

The Fourier-space reconstruction method (Crowther et al., 1970a, Crowther et al., 1970b, Crowther, 1971a) is currently the standard algorithm for single-particle 3D reconstruction. The most notable advantage of this method is its speed at the algorithmic level compared with other methods, such as the weighted back-projection method (Radermacher, 1988) and its variants. When merging 2D Fourier transforms of images in 3D Fourier space, it is necessary to properly interpolate and weight the 2D Fourier data
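The merging step described above rests on the central-section theorem: the 2D Fourier transform of a projection image is a central plane through the 3D Fourier transform of the object. The following Python/NumPy sketch illustrates the idea with nearest-grid-point interpolation; the array names are placeholders, and real programs (including eLite3D) use proper interpolation and weighting kernels not specified in this excerpt:

```python
import numpy as np

def insert_slice(F3d, W3d, F2d, rot):
    """Accumulate one 2D Fourier slice into a 3D Fourier volume.

    F3d: complex accumulator grid; W3d: matching weight grid;
    F2d: centered 2D FFT of one particle image (n x n);
    rot: 3x3 rotation matrix giving the particle orientation.
    Nearest-grid-point interpolation, for illustration only.
    """
    n = F2d.shape[0]
    c = n // 2
    for ky in range(n):
        for kx in range(n):
            # Central-section theorem: the slice lies on the plane
            # through the Fourier-space origin defined by rot.
            p = rot @ np.array([kx - c, ky - c, 0.0])
            i, j, k = np.round(p).astype(int) + c
            if 0 <= i < n and 0 <= j < n and 0 <= k < n:
                F3d[k, j, i] += F2d[ky, kx]   # sum Fourier data
                W3d[k, j, i] += 1.0           # track weights
    return F3d, W3d
```

After all slices are inserted, each grid point is divided by its accumulated weight (where nonzero) and an inverse 3D FFT yields the density map. Note that the `+=` scatter into `F3d` is exactly where the race condition arises when many slices are merged concurrently.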

Data dependency and solutions

One property of the Fourier transform data used in 3D reconstruction is that they have data dependency. Data dependency is a situation in which a program statement (instruction) refers to the data of a preceding statement. For every grid point in the 3D Fourier space, we sum many Fourier data points from 2D Fourier transforms of particle images (see above). A simple sequential summation of these data points leads to a data dependency situation – a latter summation operation depends on the result of the preceding one.
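The details of the paper's interleaved schemes are not given in this snippet, but the underlying principle can be sketched: partition the output grid so that each thread is the exclusive writer of its own interleave class of voxels, which removes the race without locks or atomics. The modulo partition and names below are illustrative assumptions:

```python
import threading

def merge_interleaved(contribs, n_vox, n_threads=2):
    """Sum per-voxel contributions without a data race.

    contribs: list of (voxel_index, value) pairs gathered from all
    2D Fourier slices. Thread t owns exactly the voxels whose index
    is congruent to t modulo n_threads, so no two threads ever write
    the same accumulator slot (an illustrative interleaved scheme).
    """
    acc = [0.0] * n_vox
    def worker(tid):
        for idx, val in contribs:
            if idx % n_threads == tid:   # this thread's interleave class
                acc[idx] += val          # exclusive writer: race-free
    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(n_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return acc
```

The same idea maps naturally onto GPGPU thread blocks: because ownership is decided by the voxel index alone, threads need no synchronization on the accumulator, only a barrier at the end of the merge.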

Accuracy and performance evaluations of eLite3D

To improve CPU and GPGPU parallelism, we introduce a processing pipeline strategy to optimally balance I/O and computation operations (see Supplementary result and supplementary Fig. 2).
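The pipeline idea can be sketched as a bounded producer-consumer queue: while the consumer computes on one chunk of image data, the producer reads the next from disk, keeping I/O and computation overlapped. This Python sketch is an analogy to the CPU/GPGPU pipeline described in the text; `read_chunk` and `compute_chunk` are placeholder names, not the program's API:

```python
import queue
import threading

def pipelined(read_chunk, compute_chunk, n_chunks, depth=2):
    """Overlap I/O and computation with a bounded queue.

    read_chunk(i) loads chunk i; compute_chunk(chunk) processes it.
    The bounded queue (double buffering when depth=2) lets the reader
    run at most `depth` chunks ahead of the compute stage.
    """
    q = queue.Queue(maxsize=depth)
    results = []

    def reader():
        for i in range(n_chunks):
            q.put(read_chunk(i))   # blocks when the buffer is full
        q.put(None)                # sentinel: no more data

    t = threading.Thread(target=reader)
    t.start()
    while (chunk := q.get()) is not None:
        results.append(compute_chunk(chunk))
    t.join()
    return results
```

With depth 2, the steady state is exactly the classic double-buffer pattern: one buffer being filled by I/O while the other is consumed by computation.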

For execution of the GPU-based programs, we assembled a desktop and a workstation as listed in Table 1. The desktop represents a low-end computer and the workstation a high-end computer.

We first test our program using projection images computed from known atomic structures and then

Conclusion

We have solved the major time-limiting computational problem in atomic resolution cryoEM reconstruction by developing eLite3D. Our solution makes PCs equipped with GPGPUs competitive with expensive computer clusters for high-resolution 3D reconstructions of large complexes. The interleaved schemes for eliminating data dependency described in this study are generally applicable to developing high-performance GPGPU solutions for other computation-intensive, data-rich problems.

Software availability

The software package is freely available from our website at http://www.eicn.ucla.edu/imirs.

Acknowledgments

This research is supported in part by grants from the National Institutes of Health (GM071940 and AI069015 to Z.H.Z.). We thank Jiansen Jiang, Peng Ge, Hongrong Liu, Xuekui Yu, Wong H. Hui, and Lei Jin for suggestions.

References (38)

  • S.M. Stagg et al., Automated cryoEM data acquisition and analysis of 284742 particles of GroEL, J. Struct. Biol. (2006)
  • G. Tang et al., EMAN2: an extensible image processing suite for electron microscopy, J. Struct. Biol. (2007)
  • M. van Heel et al., A new generation of the IMAGIC image processing system, J. Struct. Biol. (1996)
  • X. Yan et al., AUTO3DEM – an automated and high throughput program for image reconstruction of icosahedral particles, J. Struct. Biol. (2007)
  • C. Yang et al., The parallelization of SPIDER on distributed-memory computers using MPI, J. Struct. Biol. (2007)
  • X. Zhang et al., 3.3 Å cryoEM structure of a nonenveloped virus reveals a priming mechanism for cell entry, Cell (2010)
  • Z.H. Zhou et al., Refinement of herpesvirus B-capsid structure on parallel supercomputers, Biophys. J. (1998)
  • J. Backus, Can programming be liberated from the von Neumann style? A functional style and its algebra of programs, Commun. ACM (1978)
  • L. Cheng et al., Backbone model of an aquareovirus... (2010)