Chapter 2 - Quantum Chemistry on Graphics Processing Units

https://doi.org/10.1016/S1574-1400(10)06002-0

Abstract

We report on the current status of algorithm development and software implementations for acceleration of quantum chemistry and computational condensed matter physics simulations on graphics processing units (GPUs) as documented in the peer-reviewed literature. We give a general overview of programming techniques and concepts that should be considered when porting scientific software to GPUs. This is followed by a discussion of Hartree-Fock and density functional theory, wave function-based electron correlation methods and quantum Monte Carlo in which we outline the underlying problems and present the approaches which aim at exploiting the performance of the massively parallel GPU hardware. We conclude with a critical assessment of the present state of the field and discuss future directions that are likely to be taken.


INTRODUCTION

Commodity graphics processing units (GPUs) are increasingly popular for accelerating molecular and condensed matter simulations because of their low cost and their potential for high performance compared with central processing units (CPUs). In many instances, classical approximations are very successful for such simulations. However, a large number of problems in contemporary nano-, bio-, or materials science require a quantum mechanical description of the electronic structure [1, 2, 3].

SOFTWARE DEVELOPMENT FOR GRAPHICS PROCESSING UNITS

An excellent introduction to software development for GPUs including a discussion of the hardware and its historic development can be found in the book of Kirk and Hwu [5]. In order to be able to write software which runs efficiently on GPUs, it is necessary to have an understanding of the characteristics of the GPU hardware architecture.

A GPU is an example of a massively parallel stream-processing architecture which uses the single-instruction, multiple-data (SIMD) vector processing model.
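The SIMD idea can be illustrated with a minimal sketch in Python. It is only a sequential emulation under stated assumptions: the names `saxpy_kernel` and `launch` are hypothetical and do not belong to any GPU API, and the loop in `launch` stands in for what the hardware would execute in parallel. The point is that every "thread" runs the same instruction on its own data element.

```python
def saxpy_kernel(thread_id, a, x, y, out):
    """Kernel body: the same instruction for every thread, on a different element."""
    out[thread_id] = a * x[thread_id] + y[thread_id]


def launch(kernel, n_threads, *args):
    """Hypothetical launcher: runs the kernel once per thread index.

    On a GPU these invocations would run concurrently; here they run
    one after another, which gives the same result because no thread
    reads data written by another thread.
    """
    for thread_id in range(n_threads):
        kernel(thread_id, *args)


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    y = [10.0, 20.0, 30.0, 40.0]
    out = [0.0] * len(x)
    launch(saxpy_kernel, len(x), 2.0, x, y, out)
    print(out)
```

Because the threads are independent, the order in which they execute does not matter, which is the property that makes such kernels suitable for massively parallel hardware.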

KOHN–SHAM DENSITY FUNCTIONAL AND HARTREE–FOCK THEORY

Due to its excellent balance between accuracy and computational cost, Kohn–Sham density functional theory (KS-DFT) [13, 14] is usually the method of choice to investigate electronic ground states and their properties in chemistry and solid-state physics [15, 16]. Hartree–Fock (HF) wavefunctions, on the other hand, are the starting point for ab initio electron correlation methods [4, 15] which are discussed in Section 4.

There are two major computational bottlenecks in KS-DFT and HF

AB INITIO ELECTRON CORRELATION METHODS

The quantum chemist’s traditional way to approximate solutions of the electronic Schrödinger equation is to use so-called ab initio, wave function-based electron correlation methods. These methods improve upon the HF mean-field approximation by adding many-body corrections in a systematic way [15]. At the time of this writing, efforts to accelerate ab initio calculations with GPUs are scarce. However, it is expected that this will change in the near future because these methods are of critical

QUANTUM MONTE CARLO

Quantum Monte Carlo (QMC) [41] is one of the most accurate methods for solving the time-independent Schrödinger equation. As opposed to variational ab initio approaches, QMC is based on a stochastic evaluation of the underlying integrals. The method is easily parallelizable and scales as O(N³), albeit with a very large prefactor.
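To make "stochastic evaluation of the underlying integrals" concrete, the following is a minimal sketch of one simple QMC flavor, variational Monte Carlo, applied to a toy problem. The choices here are assumptions made for illustration and are not taken from the chapter or from [42]: a one-dimensional harmonic oscillator, a Gaussian trial function exp(-alpha x²), and the hypothetical function names `local_energy` and `vmc_energy`. The energy integral is estimated by averaging the local energy over positions sampled with the Metropolis algorithm.

```python
import math
import random


def local_energy(x, alpha):
    """Local energy (H psi)/psi for psi(x) = exp(-alpha * x**2).

    Assumes H = -1/2 d^2/dx^2 + 1/2 x^2 (harmonic oscillator, atomic units).
    """
    return alpha + x * x * (0.5 - 2.0 * alpha * alpha)


def vmc_energy(alpha, n_steps=20000, step=1.0, seed=0):
    """Estimate <H> by Metropolis sampling of |psi|^2 and averaging the local energy."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    x = 0.0
    total = 0.0
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step, step)
        # Acceptance ratio |psi(x_new)|^2 / |psi(x)|^2
        ratio = math.exp(-2.0 * alpha * (x_new * x_new - x * x))
        if rng.random() < ratio:
            x = x_new
        total += local_energy(x, alpha)
    return total / n_steps


if __name__ == "__main__":
    for alpha in (0.3, 0.4, 0.5):
        print(alpha, vmc_energy(alpha))
```

For alpha = 0.5 the trial function is the exact ground state, so the local energy is constant and the estimate carries no statistical noise. For other values the estimate fluctuates around the analytic expectation value alpha/2 + 1/(8 alpha). Independent random walkers could be run concurrently, which is one reason the method parallelizes easily.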

Anderson et al. have shown [42] how to accelerate QMC calculations by executing CUDA kernels that are explicitly optimized for cache usage and instruction-level

CONCLUDING REMARKS

Quantum chemistry software that exploits the capabilities of modern GPUs has only recently started to emerge. A significant part of these initial efforts has been devoted to minimizing errors caused by the lack of double-precision (DP) support on older GPUs. The advent of next-generation GPUs that support DP arithmetic at a peak performance only a factor of 2 lower than that of single precision (SP) will make these special approaches obsolete. At the same time, future developments will be greatly facilitated.
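The kind of error that SP-only hardware introduces can be sketched in a few lines of Python. This is an illustration under stated assumptions, not one of the schemes from the literature reviewed here: SP arithmetic is emulated on the CPU by rounding through the standard `struct` module, and the function names `to_sp`, `sum_sp`, and `sum_mixed` are hypothetical. The sketch compares a pure SP accumulation with a mixed-precision variant that keeps SP inputs but accumulates in DP.

```python
import struct


def to_sp(x):
    """Round a Python float (DP) to the nearest SP value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_sp(values):
    """Accumulate entirely in emulated SP: every partial sum is rounded to SP."""
    acc = 0.0
    for v in values:
        acc = to_sp(acc + to_sp(v))
    return acc


def sum_mixed(values):
    """Mixed precision: SP inputs, but the accumulator is kept in DP."""
    acc = 0.0
    for v in values:
        acc += to_sp(v)
    return acc


if __name__ == "__main__":
    values = [1.0e-4] * 100000  # exact sum is 10
    print("SP accumulation error:   ", abs(sum_sp(values) - 10.0))
    print("mixed accumulation error:", abs(sum_mixed(values) - 10.0))
```

In this toy case the rounding error of the SP accumulator grows with the number of terms, while the mixed variant is limited mainly by the SP representation of the inputs. Keeping only the accumulation in higher precision is one plausible way to reduce such errors when DP throughput is scarce.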

From the literature,

ACKNOWLEDGMENTS

This work was supported in part by grant 09-LR-06-117792-WALR from the University of California Lab Fees program and grant XFT-8-88509-01/DE-AC36-99GO10337 from the Department of Energy to RCW.

REFERENCES (43)

  • T. Helgaker et al., Molecular Electronic-Structure Theory (2000)
  • D.B. Kirk et al., Programming Massively Parallel Processors (2010)
  • M. Frigo et al., The design and implementation of FFTW3, Proc. IEEE (2005)
  • NVIDIA, Santa Clara, CA, CUDA Programming Guide...
  • AMD, Sunnyvale, CA, ATI www.amd.com/stream (Accessed March 14,...
  • NVIDIA, Santa Clara, CA, CUDA http://www.nvidia.com/object/cuda_home.html (Accessed March 6,...
  • NVIDIA, Santa Clara, CA, CUFFT...
  • NVIDIA, Santa Clara, CA, CUBLAS Library...
  • Innovative Computing Laboratory, University of Tennessee, Matrix Algebra on GPU and Multicore...
  • W. Kohn et al., Self-consistent equations including exchange and correlation effects, Phys. Rev. (1965)
  • R.G. Parr et al., Density-Functional Theory of Atoms and Molecules (1989)