Quantum Chemistry on Graphics Processing Units

doi:10.1016/S1574-1400(10)06002-0

Annual Reports in Computational Chemistry

Volume 6, 2010, Pages 21-35

Abstract

We report on the current status of algorithm development and software implementations for acceleration of quantum chemistry and computational condensed matter physics simulations on graphics processing units (GPUs) as documented in the peer-reviewed literature. We give a general overview of programming techniques and concepts that should be considered when porting scientific software to GPUs. This is followed by a discussion of Hartree-Fock and density functional theory, wave function-based electron correlation methods and quantum Monte Carlo in which we outline the underlying problems and present the approaches which aim at exploiting the performance of the massively parallel GPU hardware. We conclude with a critical assessment of the present state of the field and discuss future directions that are likely to be taken.

Section snippets

INTRODUCTION

Commodity graphics processing units (GPUs) are becoming increasingly popular for accelerating molecular and condensed matter simulations because of their low cost and potential for high performance compared with central processing units (CPUs). In many instances, classical approximations are very successful for such simulations. However, a large number of problems in contemporary nano-, bio-, and materials science require a quantum mechanical description of the electronic structure [1, 2, 3].

SOFTWARE DEVELOPMENT FOR GRAPHICS PROCESSING UNITS

An excellent introduction to software development for GPUs, including a discussion of the hardware and its historic development, can be found in the book by Kirk and Hwu [5]. Writing software that runs efficiently on GPUs requires an understanding of the characteristics of the GPU hardware architecture.

A GPU is an example of a massively parallel stream-processing architecture that uses the single-instruction, multiple-data (SIMD) vector processing model.
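The SIMD idea can be illustrated with a small sketch in which every "thread" executes the same instruction stream on its own data element. This is only a sequential emulation in plain Python; the names `saxpy_kernel`, `launch`, and `saxpy` are hypothetical and do not come from the chapter or from any GPU programming interface.

```python
# Toy emulation of the SIMD execution model: every "thread" runs the same
# instructions, each on its own data element. All names are hypothetical;
# a real GPU executes the threads concurrently in hardware.

def saxpy_kernel(thread_id, a, x, y, out):
    """Body executed by each thread: out[i] = a * x[i] + y[i]."""
    out[thread_id] = a * x[thread_id] + y[thread_id]


def launch(kernel, n_threads, *args):
    """Emulate a kernel launch by applying the same kernel to every index.

    The loop is sequential here; on a GPU the iterations would be
    distributed over many processing elements.
    """
    for thread_id in range(n_threads):
        kernel(thread_id, *args)


def saxpy(a, x, y):
    """Compute a * x + y element-wise through the emulated launch."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    out = [0.0] * len(x)
    launch(saxpy_kernel, len(x), a, x, y, out)
    return out


if __name__ == "__main__":
    print(saxpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))
```

The kernel body contains no loop over the data: the parallelism is expressed entirely through the thread index, which is the pattern that maps naturally onto SIMD hardware.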

KOHN–SHAM DENSITY FUNCTIONAL AND HARTREE–FOCK THEORY

Due to its excellent balance between accuracy and computational cost, Kohn–Sham density functional theory (KS-DFT) [13, 14] is usually the method of choice to investigate electronic ground states and their properties in chemistry and solid-state physics [15, 16]. Hartree–Fock (HF) wavefunctions, on the other hand, are the starting point for ab initio electron correlation methods [4, 15] which are discussed in Section 4.

There are two major computational bottlenecks in KS-DFT and HF

AB INITIO ELECTRON CORRELATION METHODS

The quantum chemist’s traditional way to approximate solutions of the electronic Schrödinger equation is to use so-called ab initio, wave function-based electron correlation methods. These methods improve upon the HF mean-field approximation by adding many-body corrections in a systematic way [15]. At the time of this writing, efforts to accelerate ab initio calculations with GPUs are scarce. However, it is expected that this will change in the near future because these methods are of critical

QUANTUM MONTE CARLO

Quantum Monte Carlo (QMC) [41] is one of the most accurate methods for solving the time-independent Schrödinger equation. As opposed to variational ab initio approaches, QMC is based on a stochastic evaluation of the underlying integrals. The method is easily parallelizable and scales as O(N³), albeit with a very large prefactor.
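To make the phrase "stochastic evaluation of the underlying integrals" concrete, the following sketch estimates an energy expectation value by Metropolis sampling. It is a toy variational Monte Carlo calculation for a one-dimensional harmonic oscillator (in units where ħ = m = ω = 1) with an assumed trial wave function ψ(x) = exp(−αx²). The model system, the function names, and all parameters are illustrative assumptions and are not taken from the chapter or from the work it cites.

```python
import math
import random


def local_energy(x, alpha):
    """Local energy (H psi)/psi for psi = exp(-alpha * x^2), H = -1/2 d2/dx2 + x^2/2."""
    return alpha + x * x * (0.5 - 2.0 * alpha * alpha)


def vmc_energy(alpha, n_steps=20000, n_equil=1000, step=1.0, seed=0):
    """Estimate <H> by averaging the local energy over Metropolis samples of |psi|^2."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    x = 0.0
    total = 0.0
    for i in range(n_equil + n_steps):
        # Propose a symmetric random displacement of the walker.
        x_new = x + rng.uniform(-step, step)
        # Acceptance ratio |psi(x_new)|^2 / |psi(x)|^2.
        ratio = math.exp(-2.0 * alpha * (x_new * x_new - x * x))
        if rng.random() < ratio:
            x = x_new
        # Accumulate only after the equilibration phase.
        if i >= n_equil:
            total += local_energy(x, alpha)
    return total / n_steps


if __name__ == "__main__":
    for alpha in (0.3, 0.4, 0.5, 0.6):
        print(alpha, vmc_energy(alpha))
```

For this model the exact expectation value is α/2 + 1/(8α), so the estimate can be checked analytically; at α = 0.5 the trial function is the exact ground state and the local energy is constant. Each walker is independent of the others, which is why methods of this kind parallelize easily.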

Anderson et al. have shown [42] how to accelerate QMC calculations by executing CUDA kernels that are explicitly optimized for cache usage and instruction-level

CONCLUDING REMARKS

Quantum chemistry software that exploits the capabilities of modern GPUs has only recently started to emerge. A significant part of these initial efforts has been devoted to minimizing errors caused by the lack of double-precision (DP) support on older GPUs. The advent of next-generation GPUs that support DP arithmetic at a peak performance only a factor of 2 lower than that of single precision (SP) will make these special approaches obsolete. At the same time, future developments will be greatly facilitated.
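The kind of error at stake can be seen in a small numerical experiment that compares accumulating many terms purely in single precision (SP) with accumulating SP terms in a double-precision (DP) accumulator. Mixed-precision accumulation is shown here only as one commonly used mitigation; whether it matches the special approaches referred to above is an assumption. The helper names are hypothetical, and single precision is emulated by rounding through the standard `struct` module.

```python
import math
import struct


def to_single(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_single(values):
    """Accumulate in emulated single precision: round after every addition."""
    acc = 0.0
    for v in values:
        acc = to_single(acc + to_single(v))
    return acc


def sum_mixed(values):
    """Accumulate single-precision terms in a double-precision accumulator."""
    acc = 0.0
    for v in values:
        acc += to_single(v)
    return acc


def accumulation_errors(n=100000, value=0.1):
    """Return (SP error, mixed-precision error) relative to an exactly rounded sum."""
    values = [value] * n
    # math.fsum gives a correctly rounded sum of the single-precision terms.
    reference = math.fsum(to_single(v) for v in values)
    return abs(sum_single(values) - reference), abs(sum_mixed(values) - reference)


if __name__ == "__main__":
    err_sp, err_mixed = accumulation_errors()
    print("single-precision accumulation error:", err_sp)
    print("mixed-precision accumulation error: ", err_mixed)
```

The individual terms are identical in both sums, so the difference in error comes only from the precision of the accumulator. Quantities built from many small contributions are therefore the natural place to spend DP operations when they are expensive.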

From the literature,

ACKNOWLEDGMENTS

This work was supported in part by grant 09-LR-06-117792-WALR from the University of California Lab Fees program and grant XFT-8-88509-01/DE-AC36-99GO10337 from the Department of Energy to RCW.

REFERENCES (43)

F. Jensen
L.E. McMurchie et al., One- and two-electron integrals over Cartesian Gaussian functions, J. Comput. Phys. (1978)
B.I. Dunlap et al., On some approximations in applications of Xα theory, J. Chem. Phys. (1979)
L. Genovese et al., Density functional theory calculation on many-cores hybrid CPU-GPU architectures, J. Chem. Phys. (2009)
F. Weigend et al., RI-MP2: Optimized auxiliary basis sets and demonstration of efficiency, Chem. Phys. Lett. (1998)
L. Vogt et al., Accelerating resolution-of-the-identity second-order Møller-Plesset quantum chemistry calculations with graphical processing units, J. Phys. Chem. A (2008)
J.S. Meredith et al., Accuracy and performance of graphics processors: A quantum Monte Carlo application case study, Parallel Comput. (2009)
D.C. Clary, Quantum chemistry of complex systems, Science (2006)
E.A. Carter, Challenges in modeling materials properties without experimental input, Science (2008)
T. Helgaker et al., Molecular Electronic-Structure Theory (2000)
D.B. Kirk et al., Programming Massively Parallel Processors (2010)
M. Frigo et al., The design and implementation of FFTW3, Proc. IEEE (2005)
NVIDIA, Santa Clara, CA, CUDA Programming Guide...
AMD, Sunnyvale, CA, ATI www.amd.com/stream (Accessed March 14,...
NVIDIA, Santa Clara, CA, CUDA http://www.nvidia.com/object/cuda_home.html (Accessed March 6,...
NVIDIA, Santa Clara, CA, CUFFT...
NVIDIA, Santa Clara, CA, CUBLAS Library...
Innovative Computing Laboratory, University of Tennessee, Matrix Algebra on GPU and Multicore...
W. Kohn et al., Self-consistent equations including exchange and correlation effects, Phys. Rev. (1965)
R.G. Parr et al., Density-Functional Theory of Atoms and Molecules (1989)


Chapter 2 - Quantum Chemistry on Graphics Processing Units
