Skip to main content
Log in

A Parallel Skeleton for Divide-and-conquer Unbalanced and Deep Problems

  • Published:
International Journal of Parallel Programming Aims and scope Submit manuscript

Abstract

The Divide-and-conquer (D&C) pattern appears in a large number of problems and is highly suitable to exploit parallelism. This has led to much research on its easy and efficient application both in shared and distributed memory parallel systems. One of the most successful approaches explored in this area consists of expressing this pattern by means of parallel skeletons which automate and hide the complexity of the parallelization from the user while trying to provide good performance. In this paper, we tackle the development of a skeleton oriented to the efficient parallel resolution of D&C problems with a high degree of imbalance among the subproblems generated and/or a deep level of recurrence. The skeleton achieves in our experiments average speedups between 11 and 18% higher than those of other solutions, reaching a maximum speedup of 78% in some tests. Nevertheless, the new proposal requires an average of between 13 and 29% less programming effort than the usual alternatives.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8

Similar content being viewed by others

References

  1. Aho, A.V., Hopcroft, J.E., Ullman, J.D.: The Design and Analysis of Computer Algorithms. Addison-Wesley, Boston (1974)

    MATH  Google Scholar 

  2. Aldinucci, M., Danelutto, M., Kilpatrick, P., Torquati, M.: FastFlow: High-Level and Efficient Streaming on Multicore, chap. 13, pp. 261–280. Wiley (2017)

  3. Aldinucci, M., Danelutto, M., Teti, P.: An advanced environment supporting structured parallel programming in Java. Future Gener. Comput. Syst. 19(5), 611–626 (2003)

    Article  Google Scholar 

  4. Ansel, J., Chan, C., Wong, Y.L., Olszewski, M., Zhao, Q., Edelman, A., Amarasinghe, S.: PetaBricks: A language and compiler for algorithmic choice. In: Proceedings of 30th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’09, pp. 38–49. ACM (2009)

  5. Avis, D., Fukuda, K.: Reverse search for enumeration. Discrete Appl. Math. 65(1), 21–46 (1996)

    Article  MathSciNet  Google Scholar 

  6. Ciechanowicz, P., Kuchen, H.: Enhancing Muesli’s data parallel skeletons for multi-core computer architectures. In: 12th IEEE Intl. Conf. on High Performance Computing and Communications, (HPCC 2010), pp. 108–113. IEEE (2010)

  7. Cole, M.: Algorithmic Skeletons: Structured Management of Parallel Computation. MIT Press, Cambridge (1989)

    MATH  Google Scholar 

  8. Cole, M.: Bringing skeletons out of the closet: a pragmatic manifesto for skeletal parallel programming. Parallel Comput. 30(3), 389–406 (2004)

    Article  Google Scholar 

  9. Danelutto, M., De Matteis, T., Mencagli, G., Torquati, M.: A divide-and-conquer parallel pattern implementation for multicores. In: Proc. 3rd Intl. Workshop on Software Engineering for Parallel Systems, SEPS 2016, pp. 10–19. ACM (2016)

  10. Dinan, J., Larkins, D.B., Sadayappan, P., Krishnamoorthy, S., Nieplocha, J.: Scalable work stealing. In: Proc. Conf. on High Performance Computing Networking, Storage and Analysis, SC ’09. ACM, New York, NY, USA (2009)

  11. Duran, A., Teruel, X., Ferrer, R., Martorell, X., Ayguade, E.: Barcelona openmp tasks suite: A set of benchmarks targeting the exploitation of task parallelism in openmp. In: 2009 International Conference on Parallel Processing, pp. 124–131 (2009)

  12. Falcou, J., Sérot, J., Chateau, T., Lapresté, J.T.: Quaff: efficient C++ design for parallel skeletons. Parallel Comput. 32(7–8), 604–615 (2006)

    Article  Google Scholar 

  13. González, C.H., Fraguela, B.B.: A generic algorithm template for divide-and-conquer in multicore systems. In: Proc. 12th IEEE Intl. Conf. on High Performance Computing and Communications, (HPCC 2010), pp. 79–88. IEEE (2010)

  14. González, C.H., Fraguela, B.B.: A general and efficient divide-and-conquer algorithm framework for multi-core clusters. Cluster Comput. 20(3), 2605–2626 (2017)

    Article  Google Scholar 

  15. Gorlatch, S., Cole, M.: Parallel skeletons. In: Encyclopedia of Parallel Computing, pp. 1417–1422. Springer (2011)

  16. Halstead, M.H.: Elements of Software Science. Elsevier, Amsterdam (1977)

    MATH  Google Scholar 

  17. Hosseini Rad, M., Patooghy, A., Fazeli, M.: An efficient programming skeleton for clusters of multi-core processors. Int. J. Parallel Program. 46, 1094–1109 (2018)

    Article  Google Scholar 

  18. von Koch, T.J.K.E., Manilov, S., Vasiladiotis, C., Cole, M., Franke, B.: Towards a compiler analysis for parallel algorithmic skeletons. In: Proc. 27th Intl. Conf. on Compiler Construction, CC 2018, pp. 174–184 (2018)

  19. Kozsik, T., Tóth, M., Bozó, I.D.: Free the conqueror! refactoring divide-and-conquer functions. Future Gener. Comput. Syst. 79(P2), 687–699 (2018)

    Article  Google Scholar 

  20. Leyton, M., Piquer, J.M.: Skandium: multi-core programming with algorithmic skeletons. In: Proc. 18th Euromicro Conf. on Parallel, Distributed and Network-based Processing (PDP 2010), pp. 289–296. IEEE (2010)

  21. Linderman, M.D., Collins, J.D., Wang, H., Meng, T.H.: Merge: a programming model for heterogeneous multi-core systems. In: Proc. 13th Intl. Conf. on Architectural Support for Programming Languages and Operating Systems, ASPLOS XIII, pp. 287-296. Association for Computing Machinery, New York, NY, USA (2008)

  22. Mattson, T., Sanders, B., Massingill, B.: Patterns for Parallel Programming. Addison-Wesley Professional, Boston (2004)

    MATH  Google Scholar 

  23. McCabe, T.J.: A complexity measure. IEEE Trans. Softw. Eng. 2, 308–320 (1976)

    Article  MathSciNet  Google Scholar 

  24. Olivier, S., Huan, J., Liu, J., Prins, J., Dinan, J., Sadayappan, P., Tseng, C.W.: UTS: An unbalanced tree search benchmark. In: Languages and Compilers for Parallel Computing (LCPC 2006), pp. 235–250. Springer Berlin Heidelberg (2006)

  25. OpenMP Architecture Review Board: OpenMP application program interface version 5.0 (2018)

  26. Reinders, J.: Intel Threading Building Blocks: Outfitting C++ for Multi-core Processor Parallelism. O’Reilly, Sebastopol (2007)

    Google Scholar 

  27. Rudolph, L., Slivkin-Allalouf, M., Upfal, E.: A simple load balancing scheme for task allocation in parallel machines. In: Proc. 3rd ACM Symp. on Parallel Algorithms and Architectures, (SPAA’91), pp. 237–245. ACM (1991)

  28. Teijeiro, C., Taboada, G.L., Touriño, J., Fraguela, B.B., Doallo, R., Mallón, D.A., Gómez, A., Mouriño, J.C., Wibecan, B.: Evaluation of UPC programmability using classroom studies. In: Proc. Third Conf. on Partitioned Global Address Space Programing Models, PGAS ’09, pp. 10:1–10:7. ACM (2009)

  29. Thoman, P., Jordan, H., Fahringer, T.: Adaptive granularity control in task parallel programs using multiversioning. In: Euro-Par 2013 Parallel Processing, pp. 164–177. Springer, Berlin, Heidelberg (2013)

  30. van Dijk, T., van de Pol, J.C.: Lace: Non-blocking split deque for work-stealing. In: Euro-Par 2014: Parallel Processing Workshops, pp. 206–217. Springer (2014)

  31. White, J.L.: Reverse search for enumeration—applications. http://cgm.cs.mcgill.ca/~avis/doc/rs/applications/index.html. Accessed 6 Mar 2021 (2008)

  32. Yang, J., He, Q.: Scheduling parallel computations by work stealing: a survey. Int. J. Parallel Program. 46(2), 173–197 (2018)

    Article  Google Scholar 

Download references

Acknowledgements

This research was supported by the Ministry of Science and Innovation of Spain (TIN2016-75845-P, PID2019-104184RB-I00 and PID2019-104834GB-I00, AEI/FEDER/EU, 10.13039/501100011033) and the predoctoral Grant of Millán Álvarez Ref. BES-2017-081320), and by the Xunta de Galicia co-founded by the European Regional Development Fund (ERDF) under the Consolidation Programme of Competitive Reference Groups (ED431C 2017/04 and ED431C 2018/19). We acknowledge also the support from the Centro Singular de Investigación de Galicia “CITIC” and the Centro Singular de Investigación en Tecnoloxías Intelixentes “CiTIUS”, funded by Xunta de Galicia and the European Union (European Regional Development Fund- Galicia 2014-2020 Program), by grants ED431G 2019/01 and ED431G 2019/04. We also acknowledge the Centro de Supercomputación de Galicia (CESGA) for the use of their computers.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Millán A. Martínez.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Martínez, M.A., Fraguela, B.B. & Cabaleiro, J.C. A Parallel Skeleton for Divide-and-conquer Unbalanced and Deep Problems. Int J Parallel Prog 49, 820–845 (2021). https://doi.org/10.1007/s10766-021-00709-y

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10766-021-00709-y

Keywords

Navigation