doi:10.1016/j.cpc.2005.01.005
Copyright © 2005 Elsevier B.V. All rights reserved.
Embedded divide-and-conquer algorithm on hierarchical real-space grids: parallel molecular dynamics simulation based on linear-scaling density functional theory
Fuyuki Shimojoa, b, Rajiv K. Kaliaa, Aiichiro Nakanoa,
,
and Priya Vashishtaa
aCollaboratory for Advanced Computing and Simulations, Department of Computer Science, Department of Physics & Astronomy, Department of Materials Science & Engineering, University of Southern California, Los Angeles, CA 90089-0242, USA
bDepartment of Physics, Kumamoto University, Kumamoto 860-8555, Japan
Received 5 October 2004;
revised 24 January 2005;
accepted 26 January 2005.
Available online 14 March 2005.
References and further reading may be available for this article. To view references and further reading you must
purchase this article.
Abstract
A linear-scaling algorithm has been developed to perform large-scale molecular-dynamics (MD) simulations, in which interatomic forces are computed quantum mechanically in the framework of the density functional theory. A divide-and-conquer algorithm is used to compute the electronic structure, where non-additive contribution to the kinetic energy is included with an embedded cluster scheme. Electronic wave functions are represented on a real-space grid, which is augmented with coarse multigrids to accelerate the convergence of iterative solutions and adaptive fine grids around atoms to accurately calculate ionic pseudopotentials. Spatial decomposition is employed to implement the hierarchical-grid algorithm on massively parallel computers. A converged solution to the electronic-structure problem is obtained for a 32,768-atom amorphous CdSe system on 512 IBM POWER4 processors. The total energy is well conserved during MD simulations of liquid Rb, showing the applicability of this algorithm to first principles MD simulations. The parallel efficiency is 0.985 on 128 Intel Xeon processors for a 65,536-atom CdSe system.
Keywords: Parallel computing; Molecular dynamics; Density functional theory; Linear scaling algorithm
PACS: 02.70.-c; 02.70.Ns; 71.15.-m
Fig. 1. Schematic of the divide-and-conquer algorithm in 2D. The physical space Ω is a union of overlapping domains, Ω=
αΩα. Each domain is further decomposed into the non-overlapping core, Ω0α, the primary buffer layer, Γ1α (see the shaded area), and the secondary buffer layer, Γ2α (see the hatched area). The depth of the primary and total (= primary + secondary) buffer layers are d1 and d, respectively.
Fig. 2. Schematic of hierarchical real-space grids in 2D. Coarse multigrids (gray) are used to accelerate iterative solutions of the DFT problem on the original real-space grid (corresponding to the grid refinement level, l=3). The bottom panel shows that fine grids are adaptively generated near the atoms (spheres) to accurately operate the ionic pseudopotentials on the wave functions.
Fig. 3. Schematic of the parallel divide-and-conquer algorithm in 2D. The physical system is divided into subsystems, P0,…,P3, of equal volume, and each subsystem is assigned to a processor in a parallel computer. Each subsystem, in turn, consists of multiple non-overlapping core domains, Ω00,…,Ω08. To perform electronic-structure calculations on overlapping domains, Ω0,…,Ω8, on processor P0, the contributions to the total electron density,
, within the primary buffer-layer depth d1 (see the hatched area), need to be cached from the nearest-neighbor processors. In addition, the ionic positions within depth d+rc (enclosed by dashed lines) need to be cached from the nearest-neighbor processors to compute ionic pseudopotentials. Here, d and rc are the total buffer-layer depth and the range of nonlocal pseudopotentials, respectively.
Fig. 4. Potential energy per atom as a function of the domain size for an amorphous CdSe system (512 atoms in a cubic cell of side length, 45.664 a.u.). The buffer size is fixed as 2.854 a.u. The atomic units are used for both energy and length. Numerals in the figure indicate the number of self-consistent iterations required for the convergence of the electron density within
, where
is the electron density at ith iteration, ρ0 is the average electron density, 0.0215 a.u., and the brackets denote the average over the grid in the entire system.
Fig. 5. Potential energy as a function of the buffer length, d, for an amorphous CdSe system (512 atoms in a cubic cell of side length, 45.664 a.u.). The domain size is fixed as 11.416 a.u. The atomic units are used for both energy and length. Numerals in the figure indicate the number of self-consistent iterations required for the convergence of the electron density within
, where
is the electron density at ith iteration, ρ0 is the average electron density, 0.0215 a.u., and the brackets denote the average over the grid for the entire system.
Fig. 6. The potential energy as a function of the number of self-consistent iterations for an amorphous CdSe system (32,768 atoms in a cubic cell of side length, 182.656 a.u.), where Emin denotes the converged energy (or the potential energy obtained at the 50th iteration). The domain and buffer sizes are 11.416 and 5.708 a.u., respectively. The atomic units are used for both energy and length. The calculation was performed on 512 IBM POWER4 processors.
Fig. 7. (Left) Convergence of the electron density in the self-consistent iterations for an amorphous CdSe system (32,768 atoms in a cubic cell of side length, 182.656 a.u.). ρi(r) is the electron density at ith iteration, ρ0 is the average electron density, 0.0215 a.u., and the brackets denote the average over the grid in the entire system. The domain and buffer sizes are 11.416 and 5.708 a.u., respectively. (Right) Convergence of the electron density in the self-consistent iterations for an amorphous CdSe system (512 atoms in a cubic cell of side length, 45.664 a.u.). The domain and buffer sizes are the same as in the left panel.
Fig. 8. Total (solid curve) and potential (dashed curve) energies in the atomic unit as a function of time in a molecular-dynamics simulation of liquid Rb at 1400 K (432 atoms in a cubic cell of side length, 65.696 a.u.). The domain and buffer sizes are 16.424 and 8.212 a.u., respectively.
Fig. 9. Wall-clock (circles) and communication (squares) times per self-consistent iteration of the parallel EDC-DFT algorithm, with scaled workloads—512 P atom CdSe system on P processors (P=1,…,128) on an Intel Xeon-based Linux cluster.