A Parallel Skeleton for Divide-and-conquer Unbalanced and Deep Problems

Martínez, Millán A.; Fraguela, Basilio B.; Cabaleiro, José C.

doi:10.1007/s10766-021-00709-y

Published: 14 May 2021

International Journal of Parallel Programming, Volume 49, pages 820–845 (2021)

Abstract

The divide-and-conquer (D&C) pattern appears in a large number of problems and is well suited to exploiting parallelism. This has led to much research on its easy and efficient application in both shared and distributed memory parallel systems. One of the most successful approaches explored in this area consists of expressing this pattern by means of parallel skeletons, which automate and hide the complexity of the parallelization from the user while trying to provide good performance. In this paper, we tackle the development of a skeleton oriented to the efficient parallel resolution of D&C problems with a high degree of imbalance among the subproblems generated and/or a deep level of recurrence. In our experiments, the skeleton achieves average speedups between 11% and 18% higher than those of other solutions, reaching a maximum of 78% in some tests. At the same time, the new proposal requires on average between 13% and 29% less programming effort than the usual alternatives.
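
To make the idea of a D&C skeleton concrete, the following is a minimal illustrative sketch in Python. It does not reproduce the skeleton or API proposed in the article; the names `dac`, `is_base`, `base`, `divide`, `combine` and `spawn_depth` are hypothetical, chosen only for this example. The user supplies the four problem-specific functions and the skeleton hides the recursion and the thread management. Fibonacci is used as the test problem because its two subproblems have different sizes, which gives a simple unbalanced recursion tree.

```python
import threading


def dac(problem, is_base, base, divide, combine, spawn_depth=2):
    """Generic divide-and-conquer skeleton (illustrative only).

    is_base(p) -> bool    : True if p can be solved directly
    base(p)    -> result  : solves a base case
    divide(p)  -> list    : splits p into subproblems
    combine(rs) -> result : merges the list of subresults
    spawn_depth           : levels of the tree (from the root) that
                            spawn threads; deeper levels run sequentially
    """
    if is_base(problem):
        return base(problem)

    subproblems = divide(problem)
    results = [None] * len(subproblems)

    def solve(i, depth):
        # Each call writes to its own slot, so no lock is needed.
        results[i] = dac(subproblems[i], is_base, base, divide, combine, depth)

    last = len(subproblems) - 1
    if spawn_depth > 0 and last > 0:
        # One thread per subproblem except the last one, which is
        # solved by the current thread to avoid an idle parent.
        threads = [
            threading.Thread(target=solve, args=(i, spawn_depth - 1))
            for i in range(last)
        ]
        for t in threads:
            t.start()
        solve(last, spawn_depth - 1)
        for t in threads:
            t.join()
    else:
        # Below the spawn depth: plain sequential recursion.
        for i in range(len(subproblems)):
            solve(i, 0)

    return combine(results)


def fib(n, spawn_depth=2):
    """Fibonacci expressed through the skeleton (unbalanced: n-1 vs n-2)."""
    return dac(
        n,
        is_base=lambda k: k < 2,
        base=lambda k: k,
        divide=lambda k: [k - 1, k - 2],
        combine=sum,
        spawn_depth=spawn_depth,
    )


if __name__ == "__main__":
    print(fib(15))
```

This sketch only shows the structure of the pattern. A fixed spawn depth is a simple way to bound the number of threads, but it does nothing to rebalance work when one branch is much larger than another, which is the situation the article targets. In CPython the global interpreter lock also generally prevents speedups for CPU-bound work of this kind, and exceptions raised inside the helper threads are not propagated to the caller here.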

Acknowledgements

This research was supported by the Ministry of Science and Innovation of Spain (TIN2016-75845-P, PID2019-104184RB-I00 and PID2019-104834GB-I00, AEI/FEDER/EU, 10.13039/501100011033, and the predoctoral grant of Millán Álvarez, Ref. BES-2017-081320), and by the Xunta de Galicia, co-funded by the European Regional Development Fund (ERDF), under the Consolidation Programme of Competitive Reference Groups (ED431C 2017/04 and ED431C 2018/19). We also acknowledge the support from the Centro Singular de Investigación de Galicia “CITIC” and the Centro Singular de Investigación en Tecnoloxías Intelixentes “CiTIUS”, funded by Xunta de Galicia and the European Union (European Regional Development Fund, Galicia 2014-2020 Program), by grants ED431G 2019/01 and ED431G 2019/04. We also acknowledge the Centro de Supercomputación de Galicia (CESGA) for the use of their computers.

Author information

Authors and Affiliations

Computer Architecture Group, CITIC, Universidade da Coruña, 15071, A Coruña, Spain
Millán A. Martínez & Basilio B. Fraguela
Centro Singular de Investigación en Tecnoloxías Intelixentes (CiTIUS), Universidade de Santiago de Compostela, 15782, Santiago de Compostela, Spain
José C. Cabaleiro

Corresponding author

Correspondence to Millán A. Martínez.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Martínez, M.A., Fraguela, B.B. & Cabaleiro, J.C. A Parallel Skeleton for Divide-and-conquer Unbalanced and Deep Problems. Int J Parallel Prog 49, 820–845 (2021). https://doi.org/10.1007/s10766-021-00709-y

Received: 13 November 2020
Accepted: 16 March 2021
Published: 14 May 2021
Issue Date: December 2021
DOI: https://doi.org/10.1007/s10766-021-00709-y
