Parallel Computing, Volume 27, Issue 7, June 2001, Pages 963-976

Parallel preconditioning of a sparse eigensolver

https://doi.org/10.1016/S0167-8191(01)00077-1

Abstract

We exploit an optimization method, called deflation-accelerated conjugate gradient (DACG), which sequentially computes the smallest eigenpairs of a symmetric, positive definite, generalized eigenproblem by conjugate gradient (CG) minimizations of the Rayleigh quotient over deflated subspaces. We analyze the effectiveness of the AINV and FSAI approximate inverse preconditioners in accelerating DACG for the solution of finite element and finite difference eigenproblems. Deflation is accomplished via CGS and MGS orthogonalization strategies, whose accuracy and efficiency are tested. Numerical tests performed on a Cray T3E Supercomputer show the high degree of parallelism attainable by the code. We found that both AINV and FSAI are effective preconditioners for our DACG algorithm, and that they are more efficient than Block–Jacobi.

Introduction

An important task in many scientific applications is the computation of a small number of the leftmost eigenpairs (the smallest eigenvalues and corresponding eigenvectors) of the problem Ax = λBx, where A and B are large, sparse, symmetric positive definite matrices. Several techniques for solving this problem have been proposed: subspace iteration [1], [15], the Lanczos method [7], [11], [14], and, more recently, the restarted Arnoldi–Lanczos algorithm [12], the Jacobi–Davidson method [17], and optimization methods based on conjugate gradient (CG) schemes [3], [9], [16].

In this paper we analyze the performance of two preconditioning techniques when applied to an optimization method called deflation-accelerated conjugate gradient (DACG) [8]. DACG sequentially computes a number of eigenpairs by CG minimizations of the Rayleigh quotient over subspaces of decreasing size. When effectively preconditioned, we found [4] that the efficiency of DACG compares well with that of established packages such as ARPACK [13]. In a recent work [5], the performance of DACG has also been compared numerically with that of the Jacobi–Davidson method, showing that their efficiencies are comparable when a small number of eigenpairs is to be computed.

We exploit three preconditioners: Block–Jacobi, FSAI [10], and AINV [2], the latter two falling into the class of approximate inverse preconditioners. FSAI and AINV explicitly compute an approximation M to A^{-1}, based on a sparse factorization of A^{-1}. Preconditioning by a product of triangular factors performs better than other techniques, mainly because the fill-in of the preconditioner is reduced. Unlike many other approximate inverse techniques, AINV and FSAI preserve the positive definiteness of the problem, which is essential in our application. The FSAI algorithm requires an a priori sparsity pattern for the approximate factor, which is not easy to provide for unstructured sparse problems; we generated the FSAI preconditioner using the same pattern as the matrix A. AINV, on the other hand, is based upon a drop tolerance, ε, which in principle is more convenient for our unstructured problems; the influence of the drop tolerance has been tested. Deflation is accomplished via B-orthogonalization of the search directions. We analyzed both classical (CGS) and modified (MGS) Gram–Schmidt orthogonalization, and tested the accuracy and efficiency of both strategies (a sketch of the two variants is given below).
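As an illustration only (this is our own minimal sketch, not the paper's code), the two deflation variants can be written as follows in Python/NumPy, assuming the columns of U hold the previously computed eigenvectors, B-orthonormalized so that U^t B U = I:

```python
import numpy as np

def b_orthogonalize_cgs(v, U, B):
    """Classical Gram-Schmidt (CGS): B-orthogonalize v against columns of U.

    All projection coefficients come from the original v, so they can be
    obtained with one matrix-vector product and one block of inner
    products -- cheap and parallel-friendly, but less robust in floating
    point than MGS.
    """
    return v - U @ (U.T @ (B @ v))

def b_orthogonalize_mgs(v, U, B):
    """Modified Gram-Schmidt (MGS): project against one column at a time.

    Each coefficient uses the updated v, which improves the achieved
    orthogonality but serializes the inner products.
    """
    v = v.copy()
    for j in range(U.shape[1]):
        u = U[:, j]
        v -= (u @ (B @ v)) * u   # recomputing B @ v keeps the sketch short
    return v
```

In exact arithmetic the two functions return the same vector; the trade-off tested in the paper is between the better floating-point accuracy of MGS and the lower cost and synchronization count of CGS.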

We have exploited the parallel Block–Jacobi-DACG, AINV–DACG and FSAI–DACG algorithms in the solution of finite element, mixed finite element, and finite difference eigenproblems, both in two and three dimensions. A parallel implementation of the DACG algorithm has been coded via a data-parallel approach, allowing preconditioning by any given approximate inverse. Ad hoc data-distribution techniques reduce the amount of communication among the processors, which could otherwise spoil the parallel performance of the ensuing code. An efficient routine for performing matrix–vector products was designed and implemented (a simplified sketch is given below). Numerical tests on a Cray T3E Supercomputer show the high degree of parallelism attainable by the code, and its good scalability.
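The paper does not reproduce this routine; purely as an illustration of a row-block layout (the simplest data-parallel distribution, not the communication-minimizing one described above), a matrix–vector product could be sketched with mpi4py and SciPy as follows, where all sizes and names are hypothetical:

```python
import numpy as np
from mpi4py import MPI
from scipy import sparse

comm = MPI.COMM_WORLD
rank, nproc = comm.Get_rank(), comm.Get_size()

N = 1200                                             # hypothetical global size
counts = [N // nproc + (r < N % nproc) for r in range(nproc)]
displs = [sum(counts[:r]) for r in range(nproc)]

# Each process stores only its block of rows of A and its slice of x.
A_loc = sparse.random(counts[rank], N, density=0.01,
                      format="csr", random_state=rank)
x_loc = np.random.default_rng(rank).random(counts[rank])

# Assemble the full x on every process, then multiply the local row block;
# y_loc is this process's slice of y = A x.
x_full = np.empty(N)
comm.Allgatherv(x_loc, [x_full, counts, displs, MPI.DOUBLE])
y_loc = A_loc @ x_full
```

Run under, e.g., `mpirun -np 4`. The all-gather replicates the whole vector on every process; the ad hoc distributions mentioned above aim to avoid exactly this, limiting exchanges to the x-entries each process actually needs so that communication does not dominate at larger processor counts.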

Section snippets

AINV and FSAI preconditioners

Let A be a symmetric positive definite N×N matrix.

The approximate inverse preconditioner AINV, which was developed in [2] for linear systems, relies upon the following idea: one can evaluate A^{-1} by a biconjugation process applied to an arbitrary set of linearly independent vectors, a convenient choice being the canonical basis (e_1, …, e_N). This process produces a unit upper triangular matrix Z̃ and a diagonal matrix D such that A^{-1} = Z̃ D^{-1} Z̃^t. Actually, even with sparse A, the factor Z̃ is usually dense; sparsity is preserved by dropping, during the process, entries smaller than the tolerance ε.
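Concretely (a dense, minimal sketch of ours, assuming A symmetric positive definite; not the implementation of [2]), the biconjugation amounts to A-orthogonalizing the canonical basis by Gram–Schmidt, with a drop tolerance eps applied to each column:

```python
import numpy as np

def ainv_factors(A, eps=0.0):
    """Biconjugation (A-orthogonalization of e_1, ..., e_N) for SPD A.

    Returns a unit upper triangular Z and a vector d such that, for eps = 0,
    Z.T @ A @ Z = diag(d) exactly, hence inv(A) = Z @ diag(1/d) @ Z.T.
    For eps > 0, entries below eps are dropped, giving a sparse approximate
    inverse in factored form, in the spirit of AINV [2].
    """
    N = A.shape[0]
    Z = np.eye(N)
    d = np.empty(N)
    for i in range(N):
        z = Z[:, i].copy()                    # z starts as e_i
        for j in range(i):                    # A-orthogonalize vs. z_1..z_{i-1}
            z -= ((Z[:, j] @ (A @ z)) / d[j]) * Z[:, j]
        z[np.abs(z) < eps] = 0.0              # drop tolerance keeps Z sparse
        z[i] = 1.0                            # preserve the unit diagonal
        Z[:, i] = z
        d[i] = z @ (A @ z)
    return Z, d
```

For eps = 0 the identity inv(A) = Z @ diag(1/d) @ Z.T holds exactly; with eps > 0 (the paper reports ε = 0.05) Z stays sparse, and Z D^{-1} Z^t serves as the preconditioner M.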

Parallel DACG algorithm

Our DACG algorithm sequentially computes the eigenpairs, starting from the leftmost one (λ_1, u_1). To evaluate the jth eigenpair, j > 1, DACG minimizes the Rayleigh quotient in a subspace orthogonal to the j−1 eigenvectors previously computed. More precisely, DACG minimizes

q(z) = (z^t A z)/(z^t B z),  where  z = x − U_j U_j^t B x,  U_j = [u_1, …, u_{j−1}],  x ∈ R^N.    (1)

The first eigenpair (λ_1, u_1) is obtained by minimization of (1) with z = x (U_1 = ∅). Let M be a preconditioning matrix. The s leftmost eigenpairs are computed by the following preconditioned conjugate gradient procedure, sketched below.
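What follows is only a minimal sketch of such a scheme (ours; it omits the restarts, tolerances and safeguards of the actual DACG of [8]), written in Python/NumPy with the notation of Eq. (1); M is any SPD approximation of A^{-1}:

```python
import numpy as np

def dacg(A, B, M, s, itmax=2000, tol=1e-10):
    """Minimal sketch of deflation-accelerated CG (simplified from [8]).

    Minimizes q(x) = (x' A x)/(x' B x) by preconditioned nonlinear CG;
    each converged eigenvector is B-normalized, appended to U, and later
    iterates are kept B-orthogonal to U (deflation, CGS variant).
    """
    N = A.shape[0]
    U = np.zeros((N, 0))
    lams = []
    rng = np.random.default_rng(0)
    for j in range(s):
        x = rng.standard_normal(N)
        x -= U @ (U.T @ (B @ x))                  # z = x - U U' B x
        p = np.zeros(N)
        gMg_old = 1.0
        for it in range(itmax):
            Ax, Bx = A @ x, B @ x
            q = (x @ Ax) / (x @ Bx)               # Rayleigh quotient
            g = (Ax - q * Bx) * (2.0 / (x @ Bx))  # gradient of q at x
            if np.linalg.norm(g) <= tol * abs(q):
                break
            Mg = M @ g                            # preconditioning
            Mg -= U @ (U.T @ (B @ Mg))            # deflate the search space
            gMg = g @ Mg
            beta = 0.0 if it == 0 else gMg / gMg_old
            gMg_old = gMg
            p = -Mg + beta * p
            # Exact line search: stationary points of q(x + alpha p) solve
            # a quadratic in alpha; pick the root with the smaller q.
            a, b, c = x @ Ax, x @ (A @ p), p @ (A @ p)
            am, bm, cm = x @ Bx, x @ (B @ p), p @ (B @ p)
            roots = np.roots([c * bm - b * cm,
                              c * am - a * cm,
                              b * am - a * bm])
            alpha = min((r.real for r in roots),
                        key=lambda t: (a + 2 * b * t + c * t * t)
                                      / (am + 2 * bm * t + cm * t * t))
            x = x + alpha * p
        x /= np.sqrt(x @ (B @ x))                 # B-normalize u_j
        U = np.column_stack([U, x])
        lams.append(q)
    return np.array(lams), U
```

The projection of the preconditioned gradient reproduces the z = x − U_j U_j^t B x deflation of Eq. (1) in its CGS form, and the step length α is the exact minimizer of q along the search direction, obtained as a root of a quadratic.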

Numerical tests

We now report numerical results obtained by applying the DACG procedure to a number of finite element, mixed finite element, and finite difference problems. The computations were performed on the T3E 1200 machine of the CINECA computing center, located in Bologna, Italy. The machine is a stand-alone system made up of DEC-Alpha 21164 processing elements (PEs), performing at a peak rate of 1200 Mflop/s. The PEs are interconnected by a 3D toroidal network with a 480 MByte/s payload bandwidth.

Conclusions

The following points are worth emphasizing.

  • Choosing the nonzero pattern of A when computing FSAI, and setting ε = 0.05 when evaluating the AINV factor, yielded equally satisfactory preconditioners for our DACG procedure.

  • AINV–DACG and FSAI–DACG displayed comparable speedups. For p = 32 processors, FSAI–DACG usually showed a slightly better parallelization level, in some cases better than Block–Jacobi-DACG, which nonetheless confirmed its good parallel performance.

  • The AINV and FSAI techniques …

Acknowledgements

This work has been supported in part by the Italian MURST Project “Analisi Numerica: Metodi e Software Matematico” and by CNR contract 98.01022.CT01. Free accounting units on the T3E Supercomputer were granted by CINECA under a framework research grant. We thank Rich Lehoucq for providing useful suggestions.

References (17)

  • G. Gambolati et al., An orthogonal accelerated deflation technique for large symmetric eigenproblems, Comput. Methods Appl. Mech. Eng. (1992)
  • F. Sartoretto et al., Accelerated simultaneous iterations for large finite element eigenproblems, J. Comput. Phys. (1989)
  • K.J. Bathe et al., Solution methods for eigenvalue problems in structural dynamics, Int. J. Numer. Methods Eng. (1973)
  • M. Benzi et al., A sparse approximate inverse preconditioner for the conjugate gradient method, SIAM J. Sci. Comput. (1996)
  • L. Bergamaschi et al., Asymptotic convergence of conjugate gradient methods for the partial symmetric eigenproblem, Numer. Lin. Alg. Appl. (1997)
  • L. Bergamaschi et al., Approximate inverse preconditioning in the parallel solution of sparse eigenproblems, Numer. Lin. Alg. Appl. (2000)
  • L. Bergamaschi, M. Putti, Numerical comparison of iterative methods for the eigensolution of large sparse symmetric…
  • L. Bergamaschi, M. Putti, Efficient parallelization of preconditioned conjugate gradient schemes for matrices arising…
There are more references available in the full text version of this article.
