Published by De Gruyter, February 16, 2016

Preconditioners for hierarchical matrices based on their extended sparse form

  • Darya A. Sushnikova and Ivan V. Oseledets

Abstract

In this paper we consider linear systems with dense matrices which arise from the numerical solution of boundary integral equations. Such matrices can be well approximated with ℋ2-matrices. We propose several new preconditioners for such matrices that are based on the equivalent sparse extended form of ℋ2-matrices. In the numerical experiments we show that the most efficient approach is based on the so-called reverse-Schur preconditioning technique.

MSC 2010: 65R20; 65F08; 78M16

Many physical problems in acoustics, electrostatics [21] and other areas [2] lead to boundary and volume integral equations with non-local kernels. The numerical solution of integral equations is challenging, since even the computation of all the matrix elements is often impossible for problems of practical interest. Fortunately, the matrices arising from the discretization of integral equations can be approximated well with data-sparse matrices. Among the most important approaches are hierarchical matrices (ℋ-matrices) [5], the mosaic-skeleton method [22], and hierarchical semiseparable matrices (HSS-matrices) [6, 15, 20]. All of these classes of matrices are based on the idea of block low-rank approximation and have their roots in the classical fast multipole method (FMM) [10, 11, 18]. In this paper we consider linear systems with ℋ2-matrices [4, 12]. An ℋ2-matrix can be multiplied by a vector fast, thus iterative methods can be used to solve linear systems; however, this alone is not enough for an efficient solution. If the matrix is ill-conditioned, we have to use preconditioners. A very efficient approach is based on the approximate factorization of hierarchical matrices [3]. Algorithms with almost linear complexity have been proposed and successfully applied to many problems. The disadvantage of these methods is that the prefactor can be quite large, and the memory required to store the L and U factors can also be large. In recent years several approaches have been proposed for fast direct methods with HSS matrices. An HSS matrix is equivalent to an ℋ2-matrix corresponding to a one-dimensional geometry; thus this structure is not fully suited for solving surface and volume integral equations, i.e., it cannot give optimal complexity. Nevertheless, the actual computing times can be impressive [9, 14, 23].

In this paper we use the observation that the classical three-step matrix-by-vector product procedure can be rewritten as a big linear system (which we call the sparse extended form, or simply the SE-form, of the ℋ2-matrix). Very similar ideas were presented in [2, 7, 8, 14]. In [1] the SE-form is used to build a direct solver for systems with an ℋ2-matrix. We propose a number of new methods for preconditioning systems with an ℋ2-matrix based on the idea of the SE-form. For small problem sizes (say, N ∼ 20 000) this gives an easy-to-implement approach for the solution of a given linear system with an ℋ2-matrix. We have found, however, that the memory requirements and the computational cost of this approach grow very fast. Therefore we propose several alternatives, which use the SE-form as an auxiliary step for the creation of efficient preconditioners for the initial matrix. We numerically confirm the effectiveness of the proposed preconditioners on two surface integral equations. The code is available online as a part of the open-source Python package h2tools [17].

1 Notations and basic facts

In this section we recall basic definitions and notations for working with ℋ2-matrices. This material is not new and can be found, for example in [4]. Let us consider a matrix A ∈ ℝm×n. First, we need several definitions.

Definition 1.1.

A block cluster tree 𝒯r (𝒯c) of rows (columns) of the matrix A is a tree in which:

  1. each node 𝒯rᵗ (𝒯cˢ) is associated with some group of rows (columns);

  2. the root node 𝒯r⁰ (𝒯c⁰) contains all rows (columns);

  3. if some group of rows (columns) is divided into subgroups, then the corresponding node has child nodes associated with those subgroups.

Definition 1.2.

Let 𝒯c and 𝒯r be block cluster trees of columns and rows of the matrix A. Each pair p = (v, w), where v ∈ 𝒯r, w ∈ 𝒯c, represents some block in the matrix A. We assume that there is some rule that divides the set of blocks into two classes, namely close blocks and far blocks. The nodes i ∈ 𝒯r and j ∈ 𝒯c are called close (far) if the pair p = (i, j) is close (far). If some node j of one tree is close to all children of some node i of the other tree, then the node j is close to the node i. Denote by 𝒫close (𝒫far) the set of all close (far) pairs p = (i, j), i ∈ 𝒯r, j ∈ 𝒯c.

Definition 1.3

(cluster basis [4, p. 54]). Let K = (Ki), i ∈ 𝒯r, be a family of finite index sets. Let R = (Ri), i ∈ 𝒯r, be a family of matrices satisfying Ri ∈ ℝi×Ki for all i ∈ 𝒯r (here i in the exponent denotes the set of rows associated with the node i). Then R is called a cluster basis with rank distribution K and the matrices Ri are called cluster basis matrices.

The ℋ2-matrix structure is related to the nestedness property.

Definition 1.4

(nested cluster basis [4, p. 54]). Let R be a cluster basis with rank distribution K. R is called nested if there is a family (Ei)i∈𝒯r of matrices satisfying the following conditions:

  1. for all i ∈ 𝒯r and all i′ ∈ sons(i), we have Ei′ ∈ ℝKi′×Ki;

  2. for all i ∈ 𝒯r with sons(i) ≠ ∅, the following equation holds:

    (1.1) $R_i = \sum_{i' \in \mathrm{sons}(i)} R_{i'} E_{i'}.$

The matrices E_{i′} are called transfer matrices or expansion matrices.
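To make the nestedness relation concrete, here is a small illustration (not taken from the paper; all sizes are hypothetical): when the children partition the rows of the parent, the sum in (1.1) amounts to stacking the products R_{i′}E_{i′}, so the parent basis never has to be stored explicitly.

```python
import numpy as np

# Hypothetical example: a parent whose 6 rows are split between two children
# with rank 3; the parent itself has rank 2.
rng = np.random.default_rng(0)
R_child1 = rng.standard_normal((4, 3))   # cluster basis matrix of the first child
R_child2 = rng.standard_normal((2, 3))   # cluster basis matrix of the second child
E_child1 = rng.standard_normal((3, 2))   # transfer matrix of the first child
E_child2 = rng.standard_normal((3, 2))   # transfer matrix of the second child

# Relation (1.1): each term R_{i'} E_{i'} lives on the rows of the child i',
# so the sum over the children is a vertical stacking of these products.
R_parent = np.vstack([R_child1 @ E_child1, R_child2 @ E_child2])
print(R_parent.shape)                    # (6, 2): parent rows x parent rank
```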

We can consider matrix A as a sum of two matrices: A = C + F, where C is constructed from close blocks p ∈ 𝒫close and F is constructed from far blocks p ∈ 𝒫far. Now we can define an ℋ2-matrix:

Definition 1.5

(ℋ2-representation [4, p. 56]). Let A ∈ ℝI×J, let 𝒯c and 𝒯r be block cluster trees of columns and rows of the matrix A, let R be a nested cluster basis of rows and L a nested cluster basis of columns. If there is a family S = (Sp), p ∈ 𝒫far, of matrices such that the block of A corresponding to each pair p = (i, j) ∈ 𝒫far equals RiSpLj*, then A is an ℋ2-matrix and it admits the representation (1.2). Figure 1 shows close blocks (black), far blocks (white) and an example of a block row (shaded). The block row consists of the columns of the blocks that are separated from the node i.

(1.2) $A = F + C = \sum_{(i,j) \in \mathcal{P}_{\mathrm{far}}} R_i S_p L_j^* + \sum_{(i,j) \in \mathcal{P}_{\mathrm{close}}} C_p.$
Figure 1. Illustration of a matrix with ℋ2-structure.

The construction of the cluster trees and of the nested block cluster trees can be done using standard procedures (see, for example, [19]). The crucial task here is to compute the cluster bases. In our numerical experiments we have used the method proposed in [16]; however, other techniques may be used as well. Summarizing, an ℋ2-matrix A is defined by the cluster trees 𝒯c and 𝒯r and the following sets of matrices: R = (Ri), i ∈ 𝒯r; C = (Cp), p = (i, j) ∈ 𝒫close, i ∈ 𝒯r, j ∈ 𝒯c; S = (Sp), p = (i, j) ∈ 𝒫far, i ∈ 𝒯r, j ∈ 𝒯c; L = (Lj), j ∈ 𝒯c.

1.1 Matrix-vector multiplication

The matrix-by-vector multiplication algorithm for an ℋ2-matrix [4, pp. 59–63] plays a key role in this paper. Its formal description is given in Algorithm 3. It is convenient to distribute the vector x over the nodes of the row cluster tree: x = (xi), i ∈ leaves(𝒯r). The resulting vector y = Ax is also computed in the form y = (yj), j ∈ leaves(𝒯c). We note that the algorithm we use contains an additional operation that transfers x to the leaves of the tree 𝒯r with the help of the matrices D = (Di), i ∈ leaves(𝒯r), and E = (Ej), j ∈ leaves(𝒯c).

Algorithm 1. Forward transformation.

Algorithm 2. Backward transformation.

Algorithm 3. Matrix–vector multiplication.

We give a graphical illustration of the matrix–vector product procedure in Fig. 2. Both the computational complexity and the storage complexity are 𝒪(N).

Figure 2. Illustration of the matrix–vector product algorithm.

2 Sparse extended form: the main idea

Our main observation is that Algorithm 3 can be rewritten as a multiplication of a sparse matrix by a vector. The first step of Algorithm 3 can be rewritten as the matrix–vector product Dx = x_l, where

(2.1) $D = \begin{bmatrix} D_1 & 0 & \cdots & 0 \\ 0 & D_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & D_{N_{lr}} \end{bmatrix}$

where Nlr is the number of leaf nodes in the tree 𝒯r and the zeros represent zero matrices of appropriate sizes. We can rewrite Step 2 as Rx̄ = x_n, where

(2.2) $R = \begin{bmatrix} R_1 & 0 & \cdots & 0 \\ 0 & R_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & R_{N_{nr}} \end{bmatrix},$

where Nnr is the number of non-leaf nodes in the tree 𝒯r and the zeros are zero matrices of appropriate sizes. Step 3 of Algorithm 3 corresponds to the equation Ly_n + Sx̄ = ȳ, where

(2.3) $L = \begin{bmatrix} L_1 & 0 & \cdots & 0 \\ 0 & L_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & L_{N_{lc}} \end{bmatrix}, \qquad S = \begin{bmatrix} S_{11} & S_{12} & \cdots & S_{1N_c} \\ S_{21} & S_{22} & \cdots & S_{2N_c} \\ \vdots & \vdots & \ddots & \vdots \\ S_{N_c 1} & S_{N_c 2} & \cdots & S_{N_c N_c} \end{bmatrix}$

where Nc is the number of nodes in the tree 𝒯c, Nlc is the number of leaf nodes in the tree 𝒯c, and Sij = 0 if the pair (i, j) ∉ 𝒫far. We rewrite the final step as y = Ey_l + Cx, where

(2.4) $E = \begin{bmatrix} E_1 & 0 & \cdots & 0 \\ 0 & E_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & E_{N_{lc}} \end{bmatrix}.$

Putting this all together we get

(2.5) $\begin{cases} Dx = x_l, \\ R\bar{x} = x_n, \\ Ly_n + S\bar{x} = \bar{y}, \\ Ey_l + Cx = y, \end{cases}$

or in the block form:

(2.6) $\begin{bmatrix} C & 0 & 0 & E \\ 0 & S & L & 0 \\ 0 & R & 0 & 0 \\ D & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x \\ \bar{x} \\ y_n \\ y_l \end{bmatrix} = \begin{bmatrix} y \\ \bar{y} \\ x_n \\ x_l \end{bmatrix}.$

We denote the obtained matrix by H0. Here the block vectors x̄ and ȳ used above are

$\bar{x} = \begin{bmatrix} x_n \\ x_l \end{bmatrix}, \qquad \bar{y} = \begin{bmatrix} y_n \\ y_l \end{bmatrix}.$

Finally, we get

(2.7) $H_0 \begin{bmatrix} x \\ \bar{x} \\ \bar{y} \end{bmatrix} = \begin{bmatrix} y \\ \bar{y} \\ \bar{x} \end{bmatrix}.$

We recall that our goal is, given y, to compute x, therefore

$\left( H_0 + \begin{bmatrix} 0_{N \times N} & 0 & 0 \\ 0 & 0 & -I_{N_{\bar{y}} \times N_{\bar{y}}} \\ 0 & -I_{N_{\bar{x}} \times N_{\bar{x}}} & 0 \end{bmatrix} \right) \begin{bmatrix} x \\ \bar{x} \\ \bar{y} \end{bmatrix} = \begin{bmatrix} y \\ 0 \\ 0 \end{bmatrix}$

where Nx̄ = len(x̄), Nȳ = len(ȳ), and

$H = H_0 + \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & -I \\ 0 & -I & 0 \end{bmatrix}.$

The final system of equations has the form

(2.8) $H \begin{bmatrix} x \\ \bar{x} \\ \bar{y} \end{bmatrix} = \begin{bmatrix} y \\ 0 \\ 0 \end{bmatrix}.$

Now the right-hand side of system (2.8) contains only known values, and we can solve it to find x, where x is the solution of Ax = y. We call the matrix H the ‘sparse extended form’, or SE-form, of the ℋ2-matrix A.
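As an illustration of how (2.8) can be put together in practice, the sketch below assembles H with scipy.sparse once the blocks are available; it is not the h2tools implementation. For simplicity it assumes that the blocks have already been padded to match the partition [x, x̄, ȳ]: E_bar stands for [0 E] acting on the y_l part of ȳ, L_bar for the action of L on the y_n part of ȳ, D_bar for the rows of (2.6) that fill x_l from x, and R_bar for the rows that fill x_n from x̄. All names are hypothetical.

```python
import scipy.sparse as sp

def assemble_se_form(C, S, E_bar, L_bar, D_bar, R_bar):
    """Assemble the SE-form H of (2.8) from padded sparse blocks.

    Assumed shapes, with n = len(x) = len(y), nx = len(xbar), ny = len(ybar):
      C     : (n,  n)    close blocks
      E_bar : (n,  ny)   [0 E], acts on the y_l part of ybar
      S     : (ny, nx)   far-field coupling blocks
      L_bar : (ny, ny)   backward transfer, acts on the y_n part of ybar
      D_bar : (nx, n)    fills the x_l part of xbar from x
      R_bar : (nx, nx)   fills the x_n part of xbar from xbar
    """
    nx, ny = S.shape[1], S.shape[0]
    I_nx = sp.identity(nx, format="csr")
    I_ny = sp.identity(ny, format="csr")
    # Move the definitions ybar = S xbar + L_bar ybar and xbar = R_bar xbar + D_bar x
    # to the left-hand side; this realizes the identity coupling added to H0 above.
    H = sp.bmat([[C,     None,         E_bar       ],
                 [None,  S,            L_bar - I_ny],
                 [D_bar, R_bar - I_nx, None        ]], format="csr")
    return H
```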

3 Properties of SE-form

An important property of the SE-form is that if A is non-singular, the SE-form of A is non-singular as well.

Theorem 3.1.

If a matrix A ∈ ℝN×N is a nonsingular ℋ2-matrix, then H = SE(A) ∈ ℝNH×NH is nonsingular and NH < (2k + 1)N, where k is the maximum of the numbers of levels of the cluster trees 𝒯c and 𝒯r.

Proof. First let us prove that the matrix H is square. The matrix A is square, therefore len(x) = len(y) = N. Let Nr be the number of rows in H and Nc be the number of columns. Then

$N_r = \mathrm{len}(x) + \mathrm{len}(\bar{x}) + \mathrm{len}(\bar{y}) = \mathrm{len}(y) + \mathrm{len}(\bar{y}) + \mathrm{len}(\bar{x}) = N_c.$

Thus H is a square matrix. Note that len(x̄) ≤ k1N and len(ȳ) ≤ k2N, where k1 and k2 are the numbers of levels of the trees 𝒯r and 𝒯c. Therefore

$N_H = \mathrm{len}(x) + \mathrm{len}(\bar{x}) + \mathrm{len}(\bar{y}) \le N + k_1 N + k_2 N < (2k + 1)N.$

Now we prove that H is nonsingular. Suppose that Hz = 0; let us prove that this implies z = 0. Due to the construction of the SE-form, the first block component x of the vector z satisfies Ax = 0, thus x = 0. According to Algorithm 3 and (2.5), x = 0 leads to x̄ = 0 and ȳ = 0, therefore z = 0 and the kernel of H is trivial. ◻

Note that the condition number of the SE-form is typically much larger than that of the original matrix, so special preconditioning is needed.

4 Solvers based on SE-form

This section is devoted to the question of how to use the SE-form of the matrix for the solution of linear systems. We propose several methods, which are listed below.

  1. (direct solver) apply a sparse direct solver to SE(A) and, given y, compute x.

  2. (preconditioning (2.8) with matrix SE(A)) construct preconditioner to SE(A) based on the block structure.

  3. (iterative solvers for systems with A using SE(A) as a preconditioner) the SE-form can be used as a preconditioner for the original system. To solve the correction equation we apply several steps of some solver for SE(A).

Now we describe them in more detail.

4.1 Method 1

Applying a fast sparse direct solver to SE(A) is a natural idea. It is feasible for small N, but the amount of memory required by such a solver grows very quickly. The advantage of this method is that it is very simple to implement: once a sparse representation of SE(A) is given, we only need to call the solver. Another approach is to compute some preconditioner for SE(A). In our numerical experiments we have tested the ILUT preconditioner; we call this approach the SE-ILUT preconditioner.
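A minimal sketch of Method 1, under the assumption that H has been assembled as above (e.g. by the hypothetical assemble_se_form) and that y has length n; the first n entries of the solution of (2.8) give x.

```python
import numpy as np
import scipy.sparse.linalg as spla

def solve_se_direct(H, y, n):
    """Method 1: a sparse direct solver applied to the SE-form (2.8)."""
    rhs = np.zeros(H.shape[0])
    rhs[:n] = y                                 # right-hand side [y; 0; 0]
    lu = spla.splu(H.tocsc())                   # sparse LU; memory grows quickly
    return lu.solve(rhs)[:n]                    # x such that A x = y

def solve_se_ilut(H, y, n):
    """SE-ILUT: an incomplete LU of H as a preconditioner for GMRES on (2.8)."""
    rhs = np.zeros(H.shape[0])
    rhs[:n] = y
    ilu = spla.spilu(H.tocsc(), drop_tol=1e-4, fill_factor=10)
    M = spla.LinearOperator(H.shape, matvec=ilu.solve)
    z, info = spla.gmres(H, rhs, M=M)
    return z[:n]
```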

4.2 Method 2

Now we consider system (2.8) with the sparse extended matrix SE(A) and solve this system iteratively. In the numerical experiments we found that SE(A) always has a very large condition number; thus, a preconditioner is needed. A natural way is to use the block structure of SE(A) to construct the preconditioner. We propose an SE-block preconditioner: we compute an approximate inverse of the ‘far’ block of SE(A), and all other blocks are replaced by identity matrices:

(4.1) $B = \begin{bmatrix} I & 0 & 0 \\ 0 & P(S) & 0 \\ 0 & 0 & I \end{bmatrix}$

where P(S) is some preconditioner for the block S. We note that the block S can be rectangular; in this case we construct the preconditioner for the smallest square block that contains S. In the experiments we have seen that the ill-conditioning of S is the reason why H is ill-conditioned.
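A minimal sketch of the SE-block preconditioner (4.1), assuming H is partitioned as [x, x̄, ȳ] with x of length n and that S_sq is the smallest square sparse block of H containing S, starting at offset n; P(S) is taken here to be an incomplete LU factorization. Names and ILU parameters are hypothetical.

```python
import scipy.sparse.linalg as spla

def se_block_preconditioner(H, S_sq, n):
    """Block preconditioner (4.1): identity outside S, an ILU of S_sq inside."""
    ilu_S = spla.spilu(S_sq.tocsc(), drop_tol=1e-4, fill_factor=10)
    m = S_sq.shape[0]

    def apply(z):
        out = z.copy()
        out[n:n + m] = ilu_S.solve(z[n:n + m])   # P(S)^{-1} on the S block
        return out                               # identity on the remaining blocks

    return spla.LinearOperator(H.shape, matvec=apply)

# Usage: M = se_block_preconditioner(H, S_sq, n)
#        z, info = spla.gmres(H, rhs, M=M)
```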

4.3 Method 3

The matrix A can be considered as the Schur complement of H = SE(A) with the components related to x̄ and ȳ eliminated. Typically, a Schur complement is used as a preconditioner; here we use reverse Schur complement preconditioning (similar ideas were used in [2]). In this method, the Schur complement is used in the opposite fashion: we solve the system with the matrix A, and to solve the correction equation we go to the large system (extend the residual), apply several steps of some preconditioned iterative solver for SE(A), and extract the required vector as the corresponding component of the result. This approach appears to be the most effective one for the problems we have considered. Moreover, an additional speedup can be obtained by using recompression of the ℋ2-matrix, i.e.

$A \approx B,$

where B has lower ranks in the rank distribution. The description of the robust SVD-based algorithm can be found in [4], and it is implemented in the h2tools package as well. Then we use SE(B) instead of SE(A) in the algorithm, so this method is not a ‘reverse Schur complement’ but an ‘approximate reverse Schur complement’ method. We denote this approach by SVD-SE. The final algorithm is summarized in Algorithm 4.

Algorithm 4. SVD-SE method.
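A minimal sketch of the reverse-Schur idea: the preconditioner for the original system Ax = y extends the residual to [r, 0, 0], performs a few preconditioned GMRES iterations on the SE-form (of A or of the recompressed B), and returns the x-component. The fixed number of inner iterations and all names are hypothetical; since the inner solve is itself iterative, a flexible outer method (e.g. FGMRES) would be preferable in a careful implementation.

```python
import numpy as np
import scipy.sparse.linalg as spla

def reverse_schur_preconditioner(H, n, M_H=None, inner_iters=5):
    """Preconditioner for A x = y built from the SE-form H (Method 3 sketch)."""
    def apply(r):
        rhs = np.zeros(H.shape[0])
        rhs[:n] = r                               # extend the residual: [r; 0; 0]
        z, _ = spla.gmres(H, rhs, M=M_H,
                          restart=inner_iters, maxiter=1)
        return z[:n]                              # extract the x-component
    return spla.LinearOperator((n, n), matvec=apply)

# Outer iteration on the original system, with A available as a fast H^2 matvec:
#   A_op = spla.LinearOperator((n, n), matvec=h2_matvec_of_A)   # hypothetical matvec
#   M = reverse_schur_preconditioner(H, n, M_H=se_block_preconditioner(H, S_sq, n))
#   x, info = spla.gmres(A_op, y, M=M)
```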

5 Numerical experiments

5.1 Electrostatic problem

As a model problem we consider a boundary integral equation of the form

(5.1) $\int_{\Omega} \frac{q(y)}{|x - y|}\, dy = f(x), \quad x \in \Omega$

where Ω is the unit square [0, 1]². The function f(x) is given and q(y) is to be found. Equation (5.1) is discretized using the collocation method with piecewise-constant basis functions on a triangular mesh ΩN with N triangles (see Fig. 4). The matrix elements can be evaluated analytically [13]. The matrix A is then approximated in the ℋ2-format using the h2tools package [17]. The SE-form of A is presented in Fig. 3 for N = 1196.

Figure 3. SE-form of matrix A.

Figure 4. Triangular mesh ΩN.
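For orientation, here is a toy assembly of the dense collocation matrix for (5.1) using a crude one-point (centroid) quadrature per triangle; the paper instead evaluates the entries analytically following [13] and then compresses the matrix with h2tools, so this is only a hypothetical illustration of the discretization. Mesh array names are assumptions.

```python
import numpy as np

def collocation_matrix(vertices, triangles):
    """Dense collocation matrix A[i, j] ~ integral over T_j of 1/|c_i - y| dy.

    vertices:  (nv, 3) array of mesh vertex coordinates
    triangles: (N, 3) integer array of vertex indices per triangle
    """
    pts = vertices[triangles]                         # (N, 3, 3) triangle corners
    centers = pts.mean(axis=1)                        # collocation points c_i
    areas = 0.5 * np.linalg.norm(
        np.cross(pts[:, 1] - pts[:, 0], pts[:, 2] - pts[:, 0]), axis=1)
    diff = centers[:, None, :] - centers[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    np.fill_diagonal(dist, 1.0)                       # avoid division by zero
    A = areas[None, :] / dist                         # one-point quadrature per panel
    # The singular diagonal entries require the analytical formulas of [13];
    # they are left as a placeholder here.
    np.fill_diagonal(A, 0.0)
    return A
```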

5.1.1 Method 1

In Table 1 the results of Method 1 (a sparse direct solver applied to SE(A)) are presented. As is readily seen, memory quickly becomes a bottleneck: the last two rows are missing because the solver ran out of memory.

Table 1.

Timings and memory for Method 1.

N        time (s)   Mem (MB)
3928     2.585      376.85
13640    37.76      2527.7
28080    234        5012.1
59428    —          —
98936    —          —

5.1.2 Method 2

The convergence of the GMRES method with the SE-ILUT and block preconditioners is presented in Fig. 5. The number of iterations with the SE-ILUT preconditioner is significantly smaller than with the block preconditioner; however, the computation of the block preconditioner is much less expensive. This is illustrated in Table 2.

Figure 5. Method 2: GMRES convergence with different preconditioners.

Table 2.

Timing for building the preconditioner and solving the system using the GMRES method.

N        Block prec + GMRES (s)   SE-ILUT prec + GMRES (s)
3928     0.085 + 1.28             1.75 + 0.17
13640    0.23 + 5.6               9.17 + 0.52
28080    0.53 + 11.8              27.17 + 0.91
59428    1.34 + 34.8              75.02 + 3.13
98936    3.28 + 59.13             134.11 + 10.2

5.1.3 Method 3

The convergence of the GMRES method with reverse-Schur preconditioning is presented in Fig. 6 for N = 28080. The total computational cost for different reverse-Schur preconditioners is given in Table 3. It should be noted that the SE-ILUT preconditioner in the second column is the same as in the previous section; however, it is more effective to use it as a reverse-Schur preconditioner for the original system than for the full SE-form. The recompression of the ℋ2-form of the matrix yields the best preconditioner, faster by roughly a factor of four.

Figure 6. Method 3: Convergence of GMRES for N = 28080.

Table 3.

Timings for Method 3 with different reverse-Schur preconditioners.

N        SE-ILUT prec + GMRES (s)   SVD-SE prec + GMRES (s)
3928     2.13 + 0.03                0.21 + 0.11
13640    8.84 + 0.11                1.34 + 1.32
28080    24.8 + 0.35                8.35 + 2.94
59428    69.1 + 1.33                19.7 + 6.13
98936    150.2 + 4.38               40.01 + 15.03

5.1.4 Final comparison

In Table 4 we present the final comparison of all methods for different N. For Methods 2 and 3 the solution time is the total computational time (building the preconditioner plus solving the system with GMRES).

Table 4.

Timings for all methods for different N.

         Method 1        Method 2                    Method 3
N        Dir. sol. (s)   Block (s)    SE-ILUT (s)    SE-ILUT (s)   SVD-SE (s)
3928     2.585           3.215        2.11           2.16          0.87
13640    37.76           10.83        9.69           8.95          4.65
28080    234             22.33        28.08          25.47         16.92
59428    —               53.14        78.15          70.37         42.01
98936    —               122.41       144.31         127.95        89.94

Memory is an important constraint, and in Table 5 we present the memory required by each of the methods for different N. Method 3 with SVD recompression is the fastest method and requires much less memory. The comparison of the different methods for problem (5.2) below is shown in Table 6.

Table 5.

Memory requirements for all the methods for different N, missing entries mean ‘out of memory’.

         Method 1          Method 2                      Method 3
N        Dir. sol. (MB)    Block (MB)    SE-ILUT (MB)    SE-ILUT (MB)    SVD-SE (MB)
3928     424               24.2          31.4            15.7            < 10
13640    2606              251.2         279.8           31.4            < 10
28080    14702             675.3         2464.9          219.8           47.1
59428    —                 1301          9720.6          810             128.3
98936    —                 6340.1        —               2028.8          221.5
Table 6.

Timing and memory for all methods for problem (5.2).

             Method 1     Method 2                  Method 3
             Dir. sol.    Block       SE-ILUT       SE-ILUT      SVD-SE
memory (MB)  —            28.3        31.7          31.7         17.3
time (s)     —            —           —             8.28         13.4

5.2 Hypersingular integral equation

The second problem is the hypersingular integral equation

(5.2) $\int_{\Omega} \frac{q(y)}{|x - y|^{3}}\, dy = f(x)$

where Ω is shown in Fig. 8. The equation is discretized using the collocation method with piecewise-constant basis functions (also known as the method of discrete vortices), and the approximation in the ℋ2-format is done using the h2tools package as well. The SE-form of A is given in Fig. 7. The mesh has N = 28340 triangles.

Figure 7. SE-form of matrix A.

Figure 8. Surface ΩN.

6 Conclusions

The new SE-form of the ℋ2-matrix allows for different ways of constructing effective solvers for linear systems with such matrices. Numerical experiments show that the most effective way is to use the reverse-Schur preconditioner with SVD-recompression of the ℋ2-form of A. The implementation of the methods is available as a part of the open-source package h2tools [17], and the numerical experiments are available as IPython notebooks.

Funding: This work was supported by the Russian Science Foundation grant 14-11-00659.

References

[1] S. Ambikasaran and E. Darve, The inverse fast multipole method. arXiv preprint (2014).

[2] J. Bardhan, M. Altman, B. Tidor, and J. White, ‘Reverse-Schur’ approach to optimization with linear PDE constraints: application to biomolecule analysis and design. J. Chem. Theory Comput. 5 (2009), No. 12, 3260–3278. doi:10.1021/ct9001174

[3] M. Bebendorf, Hierarchical LU decomposition-based preconditioners for BEM. Computing 74 (2005), No. 3, 225–247. doi:10.1007/s00607-004-0099-6

[4] S. Börm, Efficient Numerical Methods for Non-Local Operators: ℋ2-Matrix Compression, Algorithms and Analysis. European Mathematical Society, 2010. doi:10.4171/091

[5] S. Börm, L. Grasedyck, and W. Hackbusch, Introduction to hierarchical matrices with applications. Eng. Anal. Bound. Elem. 27 (2003), No. 5, 405–422. doi:10.1016/S0955-7997(02)00152-2

[6] S. Chandrasekaran, M. Gu, and W. Lyons, A fast adaptive solver for hierarchically semiseparable representations. Calcolo 42 (2005), No. 3–4, 171–185. doi:10.1007/s10092-005-0103-3

[7] S. Chandrasekaran, P. Dewilde, M. Gu, W. Lyons, and T. Pals, A fast solver for HSS representations via sparse matrices. SIAM J. Matrix Anal. Appl. 29 (2006), No. 1, 67–81. doi:10.1137/050639028

[8] S. Chandrasekaran, M. Gu, and T. Pals, A fast ULV decomposition solver for hierarchically semiseparable representations. SIAM J. Matrix Anal. Appl. 28 (2006), No. 3, 603–622. doi:10.1137/S0895479803436652

[9] E. Corona, P.-G. Martinsson, and D. Zorin, An O(N) direct solver for integral equations on the plane. Appl. Comput. Harmon. Anal. 38 (2015), No. 2, 284–317. doi:10.1016/j.acha.2014.04.002

[10] L. Greengard and V. Rokhlin, A fast algorithm for particle simulations. J. Comput. Phys. 73 (1987), No. 2, 325–348. doi:10.1016/0021-9991(87)90140-9

[11] L. Greengard and V. Rokhlin, The Rapid Evaluation of Potential Fields in Three Dimensions. Springer, Berlin, 1988. doi:10.1007/BFb0089775

[12] W. Hackbusch and S. Börm, ℋ2-matrix approximation of integral operators by interpolation. Appl. Numer. Math. 43 (2002), No. 1, 129–143. doi:10.1016/S0168-9274(02)00121-6

[13] J. L. Hess and A. M. Smith, Calculation of non-lifting potential flow about arbitrary three-dimensional bodies. DTIC Document, 1962.

[14] K. Ho and L. Greengard, A fast direct solver for structured linear systems by recursive skeletonization. SIAM J. Sci. Comput. 34 (2012), No. 5, A2507–A2532. doi:10.1137/120866683

[15] P. G. Martinsson, A fast randomized algorithm for computing a hierarchically semiseparable representation of a matrix. SIAM J. Matrix Anal. Appl. 32 (2011), No. 4, 1251–1274. doi:10.1137/100786617

[16] A. Mikhalev and I. V. Oseledets, Adaptive nested cross approximation of non-local operators. arXiv preprint No. 1407.1572 (2013).

[17] A. Yu. Mikhalev, h2tools, 2013. https://bitbucket.org/muxas/h2tools

[18] V. Rokhlin, Rapid solution of integral equations of classical potential theory. J. Comput. Phys. 60 (1985), No. 2, 187–207. doi:10.1016/0021-9991(85)90002-6

[19] H. Samet, The quadtree and related hierarchical data structures. ACM Comput. Surv. 16 (1984), No. 2, 187–260. doi:10.1145/356924.356930

[20] Z. Sheng, P. Dewilde, and S. Chandrasekaran, Algorithms to solve hierarchically semi-separable systems. In: System Theory, the Schur Algorithm and Multidimensional Analysis. Birkhäuser, Basel, 2007.

[21] D. Stephen, R. M. Harris, B. P. Johnson, S. Kim, K. Nabors, M. A. Shulman, and J. K. White, A computer-aided design system for micro-electromechanical systems (MEMCAD). IEEE, Mater. Res. Soc. Symp. P. 1 (1992), No. 1, 3–13.

[22] E. E. Tyrtyshnikov, Mosaic-skeleton approximations. Calcolo 33 (1996), No. 1, 47–57. doi:10.1007/BF02575706

[23] J. Xia, S. Chandrasekaran, M. Gu, and X. S. Li, Superfast multifrontal method for large structured linear systems of equations. SIAM J. Matrix Anal. Appl. 31 (2009), No. 3, 1382–1411. doi:10.1137/09074543X

Received: 2015-9-28
Accepted: 2015-11-24
Published Online: 2016-2-16
Published in Print: 2016-2-1

© 2016 Walter de Gruyter GmbH, Berlin/Boston
