A Decomposition of the Tikhonov Regularization Functional Oriented to Exploit Hybrid Multilevel Parallelism

International Journal of Parallel Programming

Abstract

We introduce a decomposition of the Tikhonov Regularization (TR) functional that splits this operator into several TR functionals, suitably modified to enforce the matching of their solutions. As a consequence, instead of solving one problem we can solve several problems that reproduce the initial one at smaller dimensions. This approach reduces the time complexity of the resulting algorithm. Since the subproblems are solved in parallel, the decomposition also reduces the overall execution time. The main outcome of the decomposition is that the parallel algorithm is oriented to exploit the highest performance of parallel architectures in which concurrency is implemented at both the coarsest and the finest levels of granularity. The performance analysis is discussed in terms of algorithm and software scalability. Validation is performed on a reference parallel architecture made of a distributed memory multiprocessor and a Graphics Processing Unit. Results are presented for the Data Assimilation problem in oceanographic models.
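
As a rough illustration of why splitting can pay off, the sketch below solves the standard Tikhonov problem, min ||Ax - b||^2 + lam ||x||^2, once as a whole and once as two smaller independent problems. It assumes a block-diagonal matrix A, in which case the functional separates exactly and no matching condition is needed. This is a much simpler situation than the coupled decomposition the abstract describes, and it is not the authors' algorithm. All function names, the matrices, and the value of `lam` are hypothetical choices made for this example.

```python
# Standard Tikhonov regularization: minimize ||A x - b||^2 + lam * ||x||^2.
# Pure-Python sketch; every name and value below is illustrative only.

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def solve(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(aug[i][k]))
        aug[k], aug[p] = aug[p], aug[k]
        for i in range(k + 1, n):
            f = aug[i][k] / aug[k][k]
            for j in range(k, n + 1):
                aug[i][j] -= f * aug[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def tikhonov(A, b, lam):
    """Minimizer of the functional via (A^T A + lam I) x = A^T b."""
    At = transpose(A)
    M = matmul(At, A)
    for i in range(len(M)):
        M[i][i] += lam
    return solve(M, matvec(At, b))

def functional(A, b, lam, x):
    """Value of ||A x - b||^2 + lam * ||x||^2 at x."""
    r = [ri - bi for ri, bi in zip(matvec(A, x), b)]
    return sum(v * v for v in r) + lam * sum(v * v for v in x)

# Assumed block-diagonal test problem: two uncoupled 2x2 blocks.
A1 = [[1.0, 2.0], [3.0, 4.0]]
A2 = [[5.0, 6.0], [7.0, 8.0]]
A = [r + [0.0, 0.0] for r in A1] + [[0.0, 0.0] + r for r in A2]
b = [1.0, 2.0, 3.0, 4.0]
lam = 0.1

x_full = tikhonov(A, b, lam)                                # one 4x4 solve
x_split = tikhonov(A1, b[:2], lam) + tikhonov(A2, b[2:], lam)  # two 2x2 solves

print("max difference:", max(abs(u - v) for u, v in zip(x_full, x_split)))
```

The two 2x2 solves are independent, so they could run concurrently. With a dense direct solver the cost grows roughly cubically with the problem size, so halving the dimension cuts the work per subproblem by about a factor of eight. When the blocks of A are coupled, the subproblems no longer separate this cleanly, which is the case the modified functionals in the abstract are meant to address.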

Notes

  1. A partitioned algorithm is a scalar (or point) algorithm in which the operations have been grouped and reordered into matrix operations. A block algorithm is a generalization of a scalar algorithm in which the basic scalar operations become matrix operations, and a matrix property based on the nonzero structure becomes the corresponding property blockwise. LAPACK contains only partitioned algorithms; that is, its main computations are block oriented and implemented using BLAS-3 [15].

References

  1. Antonelli, L., Carracciuolo, L., Ceccarelli, M., D’Amore, L., Murli, A.: Total variation regularization for edge preserving 3D SPECT imaging in high performance computing environments. Lecture Notes in Computer Science, vol. 2330, part 2, pp. 171–180 (2002)

  2. Arcucci, R., D’Amore, L., Carracciuolo, L.: On the problem-decomposition of scalable 4D-Var Data Assimilation models, International Conference on High Performance Computing and Simulation (HPCS), pp. 589–594. ISBN 978-1-4673-7812-3, (2015)

  3. Campagna, R., D’Amore, L., Murli, A.: An efficient algorithm for regularization of Laplace transform inversion in real case. J. Comput. Appl. Math. 210(1–2), 84–98 (2007)

  4. Carracciuolo, L., D’Amore, L., Murli, A.: Towards a parallel component for imaging in PETSc programming environment: a case study in 3-D echocardiography. Parallel Comput. 32(1), 67–83 (2006)

  5. Chung, J., Nagy, J.G.: An efficient iterative approach for large scale separable nonlinear inverse problems. SIAM J. Sci. Comput. 31(6), 4654–4674 (2010)

  6. D’Amore, L., Mele, V., Laccetti, G., Murli, A.: Mathematical Approach to the Performance Evaluation of Matrix Multiply Algorithm. Lecture Notes in Computer Science, Vol. 9574, pp. 25–34 (2016)

  7. D’Amore, L., Arcucci, R., Carracciuolo, L., Murli, A.: A scalable approach to variational data assimilation. J. Sci. Comput. 2, 239–257 (2014)

  8. D’Amore, L., Arcucci, R., Carracciuolo, L., Murli, A.: DD-oceanvar: a domain decomposition fully parallel data assimilation software in Mediterranean Sea. Procedia Comput. Sci. 18, 1235–1244 (2013)

  9. D’Amore, L., Arcucci, R., Marcellino, L., Murli, A.: HPC computation issues of the incremental 3D variational data assimilation scheme in OceanVar software. J. Numer. Anal. Ind. Appl. Math. 7(3–4), 91–105 (2012)

  10. D’Amore, L., Casaburi, D., Galletti, A., Marcellino, L., Murli, A.: Integration of emerging computer technologies for an efficient image sequences analysis. Integr. Comput. Aided Eng. 18(4), 365–378 (2011)

  11. D’Amore, L., Laccetti, G., Romano, D., Scotti, G.: Towards a parallel component in a GPU-CUDA environment: a case study with the L-BFGS Harwell routine. J. Comput. Math. 93(1), 59–76 (2015)

  12. D’Amore, L., Campagna, R., Galletti, A., Marcellino, L., Murli, A.: A smoothing spline that approximates Laplace transform functions only known on measurements on the real axis. Inverse Probl. 28(2), 37 (2012)

  13. D’Amore, L., Murli, A.: Regularization of a Fourier series method for the Laplace transform inversion with real data. Inverse Probl. 18(4), 1185–1205 (2002)

  14. D’Amore, L., Marcellino, L., Murli, A.: Image sequence inpainting: towards numerical software for detection and removal of local missing data via motion estimation. J. Comput. Appl. Math. 198(2), 396–413 (2007)

  15. Demmel, J.W., Higham, N.J., Schreiber, R.: Block LU Factorization. RIACS Technical Report no. 92–03, (1992)

  16. ETP4HPC Agenda, European Technology Platform for High Performance Computing. Strategic research agenda achieving HPC leadership in Europe, (2013)

  17. Flatt, H.P., Kennedy, K.: Performance of parallel processors. Parallel Comput. 12, 1–20 (1989)

  18. Freitag, M.A., Nichols, N.K., Budd, C.J.: L1-regularisation for ill-posed problems in variational data assimilation. PAMM Proc. Appl. Math. Mech. 10(1), 665–668 (2010)

  19. Gallopoulos, E., Simoncini, V.: Iterative solution of multiple linear systems: theory, practice, parallelism, and applications. In: Topping, B.H.V., Papadrakakis, M. (eds.) Advances in Parallel and Vector Processing for Structural Mechanics, Proc. Second Intl. Conf. Computational Structures Technology, pp. 47–51. Civil-Comp Press, Edinburgh (1994)

  20. Hansen, P.C.: Rank Deficient and Discrete Ill-posed Problems. SIAM, Philadelphia (1998)

  21. Kalnay, E.: Atmospheric Modeling, Data Assimilation and Predictability. Cambridge University Press, Cambridge (2003)

  22. Laccetti, G., Lapegna, M., Mele, V., Romano, D., Murli, A.: A double adaptive algorithm for multidimensional integration on multicore based HPC systems. Int. J. Parallel Program. 40(4), 397–409 (2012)

  23. Liu, D.C., Nocedal, J.: On the limited memory BFGS method for large scale optimization. Math. Program. 45, 503–528 (1989)

  24. MAGMA Software www.icl.cs.utk.edu/plasma

  25. Murli, A., D’Amore, L., Laccetti, G., Gregoretti, F., Oliva, G.: A multi-grained distributed implementation of the parallel block conjugate gradient algorithm. Concurrency Comput. Pract. Exp. 22(15), 2053–2072 (2010)

  26. Murli, A., Boccia, V., Carracciuolo, L., D’Amore, L., Laccetti, G., Lapegna, M.: Monitoring and migration of a PETSc-based parallel application for medical imaging in a grid computing PSE. IFIP Int. Federation Inf. Process. 239, 421–432 (2007)

  27. Nichols, N.K.: Mathematical concepts of data assimilation. In: Lahoz, W., Khattatov, B., Menard, R. (eds.) Data assimilation: Making Sense of Observations, pp. 13–40. Springer, Berlin (2010)

  28. Nvidia, TESLA K20 GPU Active Accelerator (2012). Board spec. Available: http://www.nvidia.in/content/PDF/kepler/Tesla-K20-Active-BD-06499-001-v02

  29. O’Leary, D.P., Simmons, J.A.: A bidiagonalization-regularization procedure for large scale discretizations of ill-posed problems. SIAM J. Sci. Stat. Comput. 2, 474–489 (1981)

  30. Paige, C.C., Saunders, M.A.: LSQR: an algorithm for sparse linear equations and sparse least squares. ACM Trans. Math. Softw. 8, 43–71 (1982)

  31. pARMS Software www-users.cs.umn.edu/saad/software/parms

  32. PCI-SIG, technology specifications at http://pcisig.com/specifications/pciexpress/

  33. PETSC Software www.mcs.anl.gov/petsc

  34. PLASMA Software www.icl.cs.utk.edu/plasma

  35. Reichel, L., Baglamana, J.: Decomposition methods for large linear discrete ill-posed problems. J. Comput. Appl. Math 198(2), 333–343 (2007)

  36. Tikhonov, A.N.: Solution of incorrectly formulated problems and the regularization method. Dokl. Akad. Nauk SSSR 151, 501–504 (1963); Soviet Math. Dokl. 4, 1035–1038 (1963)

Acknowledgments

This work was developed within the research activity of the H2020-MSCA-RISE-2016 NASDAC Project N. 691184. This work has been realized thanks to the use of the S.Co.P.E. computing infrastructure at the University of Naples.

Author information

Corresponding author

Correspondence to Luisa D’Amore.

About this article

Cite this article

Arcucci, R., D’Amore, L., Carracciuolo, L. et al. A Decomposition of the Tikhonov Regularization Functional Oriented to Exploit Hybrid Multilevel Parallelism. Int J Parallel Prog 45, 1214–1235 (2017). https://doi.org/10.1007/s10766-016-0460-3

  • DOI: https://doi.org/10.1007/s10766-016-0460-3
