Skip to main content
Log in

Implementation and Scalability Analysis of Balancing Domain Decomposition Methods

  • Published:
Archives of Computational Methods in Engineering Aims and scope Submit manuscript

Abstract

In this paper we present a detailed description of a high-performance distributed-memory implementation of balancing domain decomposition preconditioning techniques. This coverage provides a pool of implementation hints and considerations that can be very useful for scientists that are willing to tackle large-scale distributed-memory machines using these methods. On the other hand, the paper includes a comprehensive performance and scalability study of the resulting codes when they are applied for the solution of the Poisson problem on a large-scale multicore-based distributed-memory machine with up to 4096 cores. Well-known theoretical results guarantee the optimality (algorithmic scalability) of these preconditioning techniques for weak scaling scenarios, as they are able to keep the condition number of the preconditioned operator bounded by a constant with fixed load per core and increasing number of cores. The experimental study presented in the paper complements this mathematical analysis and answers how far can these methods go in the number of cores and the scale of the problem to still be within reasonable ranges of efficiency on current distributed-memory machines. Besides, for those scenarios where poor scalability is expected, the study precisely identifies, quantifies and justifies which are the main sources of inefficiency.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Algorithm 1
Algorithm 2
Algorithm 3
Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10

Similar content being viewed by others

Notes

  1. The spaces \(\widehat{V}\) and V i can also be understood in a functional setting as the global and local spaces of discrete harmonic functions (see [6]).

  2. In a functional setting, functions in \(\widehat{V}\) are uni-valued on Γ. On the contrary, since every node p in Γ h is replicated n(p) times in \(\varPi_{i=1}^{n_{\mathrm{sbd}}}\varGamma_{i}\), functions in V can take different values at different subdomains. As in [42], \(\widehat{\left . \cdot \right . }\) is used to denote uni-valued functions on Γ.

  3. It is based on the observation that SI 0 v 0 for v 0H 0 can readily be obtained as a linear combination of i quantities, that have already been computed when assembling S 0 at the preconditioner set-up. This observation, combined with a slight modification of the PCG recurrence described in [2], spares one Schur complement-vector product per PCG iteration.

  4. This definition of {G a :a=1,…,n cts } generates a unique partition of Γ h .

  5. BDD methods can certainly be reformulated as preconditioners for the global linear system (1). This enables approximate solvers (e.g., AMG [35, 40, 43]) to be used in conjunction with BDD methods. Although this approach relaxes the arithmetic/memory demands of sparse direct solvers (particularly in 3D), it would result in a new source of difficulties (e.g., robustness/complexity trade-off evaluation, parameter tuning, load unbalancing issues, poor flop rates) that deserve further research.

  6. For the BNN method, we assume that the initial value x 0 is such that r 0 is balanced (see [28]).

  7. An explicit assembly of S i is required for some DD preconditioners (see e.g. [17]). Besides, this approach allows one to exploit the dense level 2 BLAS for the Schur complement-vector product, which can only compensate the expensive set-up of S i for non-scalable preconditioners with high iteration counts.

  8. For example, PARDISO is based on the sparse Cholesky factorization without pivoting for symmetric positive definite problems, while for symmetric indefinite problems, it uses a more expensive sparse LDL T factorization which, for numerical stability purposes, combines static (prior-to-factorization) pivoting via symmetric weighted matchings and classical Bunch-Kaufman dynamic (during factorization) pivoting only applied inside the supernodes [36, 37].

References

  1. Badia S, Codina R (2008) Algebraic pressure segregation methods for the incompressible Navier-Stokes equations. Arch Comput Methods Eng 15:343–369

    Article  MathSciNet  MATH  Google Scholar 

  2. Badia S, Martín AF, Príncipe J (2012) Enhanced balancing Neumann-Neumann preconditioning in computational fluid and solid mechanics. Int J Numer Methods Eng (in press)

  3. Baker AH, Gamblin T, Schulz M, Yang UM (2011) Challenges of scaling algebraic multigrid across modern multicore architectures In: Parallel distributed processing symposium (IPDPS). 2011 IEEE international, pp 275–286

    Google Scholar 

  4. Balay S, Brown J, Buschelman K, Eijkhout V, Gropp WD, Kaushik D, Knepley MG, McInnes LC, Smith BF, Zhang H (2012) PETSc users manual. Technical report ANL-95/11, Argonne National Laboratory

  5. Balay S, Brown J, Buschelman K, Gropp WD, Kaushik D, Kne- pley MG, McInnes LC, Smith BF, Zhang H (2012). PETSc web page. http://www.mcs.anl.gov/petsc

  6. Brenner SC, Scott R (2010) The mathematical theory of finite element methods, 3rd edn. Springer, Berlin

    Google Scholar 

  7. Conceição D, Goldfeld P, Sarkis M (2007) Robust two-level lower-order preconditioners for a higher-order stokes discretization with highly discontinuous viscosities. In: High performance computing for computational science—VECPAR 2006, pp 319–333

    Chapter  Google Scholar 

  8. Davis TA (2006) Direct methods for sparse linear systems, vol 2. SIAM, Philadelphia

    Book  MATH  Google Scholar 

  9. De Roeck YH, Le Tallec P (1991) Analysis and test of a local domain decomposition preconditioner. In: Fourth international symposium on domain decomposition methods for partial differential equations, 112 pp

    Google Scholar 

  10. Dohrmann CR (2003) A preconditioner for substructuring based on constrained energy minimization. SIAM J Sci Comput 25(1):246–258

    Article  MathSciNet  MATH  Google Scholar 

  11. Farhat C, Roux F-X (1991) A method of finite element tearing and interconnecting and its parallel solution algorithm. Int J Numer Methods Eng 32(6):1205–1227

    Article  MATH  Google Scholar 

  12. Farhat C, Pierson K, Lesoinne M (2000) The second generation FETI methods and their application to the parallel solution of large-scale linear and geometrically non-linear structural analysis problems. Comput Methods Appl Mech Eng 184(2–4):333–374

    Article  MATH  Google Scholar 

  13. Ferencz RM, Hughes TJR (1998) Iterative finite element solutions in nonlinear solid mechanics. In: Numerical methods for solids (part 3). Handbook of numerical analysis, vol VI

    Google Scholar 

  14. Filippone S, Buttari A (2012) Object-oriented techniques for sparse matrix computations in Fortran 2003. ACM Trans Math Softw 38(4):23:1–23:20

    Article  Google Scholar 

  15. Filippone S, Colajanni M (2000) PSBLAS: a library for parallel linear algebra computation on sparse matrices. ACM Trans Math Softw 26(4):527–550

    Article  Google Scholar 

  16. George A (1973) Nested dissection of a regular finite element mesh. SIAM J Numer Anal 10(2):345–363

    Article  MathSciNet  MATH  Google Scholar 

  17. Giraud L, Haidar A, Watson LT (2008) Parallel scalability study of hybrid preconditioners in three dimensions. Parallel Comput 34(6–8):363–379

    Article  MathSciNet  Google Scholar 

  18. Glowinski R, Wheeler MF (1988) Domain decomposition and mixed finite element methods for elliptic problems. In: First international symposium on domain decomposition methods for partial differential equations, pp 144–172

    Google Scholar 

  19. Goldfeld P (2003) Balancing Neumann-Neumann preconditioners for the mixed formulation of almost-incompressible linear elasticity. PhD thesis

  20. Goldfeld P, Pavarino LF, Widlund OB (2003) Balancing Neumann-Neumann preconditioners for mixed approximations of heterogeneous problems in linear elasticity. Numer Math 95:283–324

    Article  MathSciNet  MATH  Google Scholar 

  21. Gropp W, Lusk E, Skjellum A (1999) Using MPI: portable parallel programming with the message passing interface, vol 1. MIT Press, Cambridge

    Google Scholar 

  22. Gropp W, Lusk E, Thakur R (1999) Using MPI-2: advanced features of the message passing interface. MIT Press, Cambridge

    Google Scholar 

  23. Heroux MA, Willenbring JM (2003) Trilinos users guide. Technical report SAND2003-2952, Sandia National Laboratories

  24. Heroux MA, Bartlett RA, Howle VE, Hoekstra RJ, Hu JJ, Kolda TG, Lehoucq RB, Long KR, Pawlowski RP, Phipps ET, Salinger AG, Thornquist HK, Tuminaro RS, Willenbring JM, Williams A, Stanley KS (2005) An overview of the Trilinos project. ACM Trans Math Softw 31(3):397–423

    Article  MathSciNet  MATH  Google Scholar 

  25. Hoefler T, Traff JL (2009) Sparse collective operations for MPI. In: Proceedings of the 2009 IEEE international symposium on parallel & distributed processing, pp 1–8

    Chapter  Google Scholar 

  26. Karypis G, Kumar V (1998) A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J Sci Comput 20(1):359–392

    Article  MathSciNet  Google Scholar 

  27. Lin PT, Shadid JN, Sala M, Tuminaro RS, Hennigan GL, Hoekstra RJ (2009) Performance of a parallel algebraic multilevel preconditioner for stabilized finite element semiconductor device modeling. J Comput Phys 228(17):6250–6267

    Article  MathSciNet  MATH  Google Scholar 

  28. Mandel J (1993) Balancing domain decomposition. Commun Numer Methods Eng 9(3):233–241

    Article  MathSciNet  MATH  Google Scholar 

  29. Mandel J, Dohrmann CR (2003) Convergence of a balancing domain decomposition by constraints and energy minimization. Numer Linear Algebra Appl 10(7):639–659

    Article  MathSciNet  MATH  Google Scholar 

  30. Mandel J, Sousedík B, Dohrmann C (2008) Multispace and multilevel BDDC. Computing 83:55–85

    Article  MathSciNet  MATH  Google Scholar 

  31. Saad Y (1995) Data structures and algorithms for domain decomposition and distributed sparse matrix computations. Technical report 95-014, Department of Computer Science, University of Minnesota, Minneapolis, MN

  32. Saad Y (2003) Iterative methods for sparse linear systems, 2nd edn. SIAM, Philadelphia

    Book  MATH  Google Scholar 

  33. Saad Y, Sosonkina M (1999) Distributed Schur complement techniques for general sparse linear systems. SIAM J Sci Comput 21(4):1337–1356

    Article  MathSciNet  Google Scholar 

  34. Sahni O, Carothers CD, Shephard MS, Jansen KE (2009) Strong scaling analysis of a parallel, unstructured, implicit solver and the influence of the operating system interference. Sci Program 17(3):261–274

    Google Scholar 

  35. Sala M, Tuminaro R (2008) A new Petrov-Galerkin smoothed aggregation preconditioner for nonsymmetric linear systems. SIAM J Sci Comput 31(1):143–166

    Article  MathSciNet  MATH  Google Scholar 

  36. Schenk O, Gärtner K (2004) Solving unsymmetric sparse systems of linear equations with PARDISO. Future Gener Comput Syst 20(3):475–487

    Article  Google Scholar 

  37. Schenk O, Gärtner K (2006) On fast factorization pivoting methods for sparse symmetric indefinite systems. Electron Trans Numer Anal 23:158–179

    MathSciNet  MATH  Google Scholar 

  38. Šístek J, Sousedík B, Burda P, Mandel J, Novotný J (2011) Application of the parallel BDDC preconditioner to the Stokes flow. Comput Fluids 46(1):429–435

    Article  MathSciNet  MATH  Google Scholar 

  39. Strang G (2006) Linear algebra and its applications. Thomson Brooks/Cole, Belmont

    Google Scholar 

  40. Stüben K (2001) A review of algebraic multigrid. J Comput Appl Math 128(1–2):281–309

    Article  MathSciNet  MATH  Google Scholar 

  41. Tang J, Nabben R, Vuik C, Erlangga Y (2009) Comparison of two-level preconditioners derived from deflation, domain decomposition and multigrid methods. J Sci Comput 39(3):340–370

    Article  MathSciNet  MATH  Google Scholar 

  42. Toselli A, Widlund O (2005) Domain decomposition methods—algorithms and theory. Springer, Berlin

    MATH  Google Scholar 

  43. Vaněk P, Mandel J, Brezina M (1996) Algebraic multigrid by smoothed aggregation for second and fourth order elliptic problems. Computing 56:179–196

    Article  MathSciNet  MATH  Google Scholar 

Download references

Acknowledgements

This work has been funded by the European Research Council under the FP7 Programme Ideas through the Starting Grant No. 258443—COMFUS: Computational Methods for Fusion Technology and the FUSSIM project (Ref. ENE2011-285556), financed by the Spanish Ministry of Science and Innovation (MICINN), under the programme “Proyectos de Investigación fundamental no orientada”. A.F. Martín was also partially funded by the UPC postdoctoral grants under the programme “BKC-5-Atracció i Fidelització de talent al BKC”. The authors thankfully acknowledge the computer resources, technical expertise and assistance provided by the Red Española de Supercomputación and the Juelich Supercomputing Centre in the exploitation of the HPC for Fusion (HPC-FF) under the EFDA HPC Implementing Agreement (EFDA (08) 39/4.1).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Santiago Badia.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Badia, S., Martín, A.F. & Principe, J. Implementation and Scalability Analysis of Balancing Domain Decomposition Methods. Arch Computat Methods Eng 20, 239–262 (2013). https://doi.org/10.1007/s11831-013-9086-4

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11831-013-9086-4

Keywords

Navigation