Towards Unifying OpenMP Under the Task-Parallel Paradigm

Podobas, Artur; Karlsson, Sven

doi:10.1007/978-3-319-45550-1_9

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 9903)

Included in the following conference series:

International Workshop on OpenMP

Abstract

OpenMP 4.5 introduced a task-parallel version of the classical thread-parallel for-loop construct: the taskloop construct. With this new construct, programmers are given the opportunity to choose between the two parallel paradigms to parallelize their for loops. However, it is unclear where and when the two approaches should be used when writing efficient parallel applications.

In this paper, we explore the taskloop construct. We study performance differences between traditional thread-parallel for loops and the new taskloop directive. We introduce an efficient implementation and compare our implementation to other taskloop implementations using micro- and kernel-benchmarks, as well as an application. We show that our taskloop implementation on average results in a 3.2% increase in peak performance when compared against corresponding parallel-for loops.
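To make the two paradigms concrete, the following minimal sketch writes the same loop once with the thread-parallel for construct and once with the OpenMP 4.5 taskloop construct. It is a generic illustration, not the Błysk implementation described in the paper; the function names `scale_parallel_for` and `scale_taskloop` and the `grainsize(64)` value are assumptions chosen for the example.

```c
#include <stddef.h>

/* Thread-parallel version: the iterations are divided among the
 * threads of the team created by the parallel region. */
void scale_parallel_for(double *a, int n, double factor)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        a[i] *= factor;
}

/* Task-parallel version: a single thread encounters the taskloop and
 * generates tasks, each covering a chunk of iterations; any thread in
 * the team may then execute those tasks.
 * grainsize(64) is an arbitrary, illustrative chunk-size hint. */
void scale_taskloop(double *a, int n, double factor)
{
    #pragma omp parallel
    {
        #pragma omp single
        {
            #pragma omp taskloop grainsize(64)
            for (int i = 0; i < n; i++)
                a[i] *= factor;
        }
    }
}
```

Both functions compute the same result. Built with an OpenMP 4.5 capable compiler (for example, GCC 6 or later with `-fopenmp`), the loops run in parallel; without OpenMP support the pragmas are ignored and the loops run serially.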


Notes

1. The Błysk prototype implementation can be obtained through https://github.com/podobas/BLYSK.git.

Acknowledgments

We acknowledge the reviewers for their suggestions in making this paper better. The research leading to these results has received funding from the ARTEMIS Joint Undertaking under grant agreement number 332913 for project COPCAMS.

Author information

Authors and Affiliations

Technical University of Denmark, Kongens Lyngby, Denmark
Artur Podobas & Sven Karlsson

Corresponding author

Correspondence to Artur Podobas.

Editor information

Editors and Affiliations

RIKEN AICS, Kobe, Japan
Naoya Maruyama
Lawrence Livermore National Laboratory, Livermore, California, USA
Bronis R. de Supinski
RIKEN AICS, Kobe, Japan
Mohamed Wahib

About this paper

Cite this paper

Podobas, A., Karlsson, S. (2016). Towards Unifying OpenMP Under the Task-Parallel Paradigm. In: Maruyama, N., de Supinski, B., Wahib, M. (eds) OpenMP: Memory, Devices, and Tasks. IWOMP 2016. Lecture Notes in Computer Science, vol 9903. Springer, Cham. https://doi.org/10.1007/978-3-319-45550-1_9

DOI: https://doi.org/10.1007/978-3-319-45550-1_9
Published: 21 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45549-5
Online ISBN: 978-3-319-45550-1
eBook Packages: Computer Science, Computer Science (R0)
