TurboBŁYSK: Scheduling for Improved Data-Driven Task Performance with Fast Dependency Resolution

Podobas, Artur; Brorsson, Mats; Vlassov, Vladimir

doi:10.1007/978-3-319-11454-5_4

Artur Podobas²⁰,
Mats Brorsson²⁰ &
Vladimir Vlassov²⁰

Part of the book series: Lecture Notes in Computer Science ((LNPSE,volume 8766))

Included in the following conference series:

International Workshop on OpenMP

935 Accesses
7 Citations

Abstract

Data-driven task-parallelism is attracting growing interest and has now been added to OpenMP (4.0). This paradigm simplifies the writing of parallel applications, extracting parallelism, and facilitates the use of distributed memory architectures. While the programming model itself is becoming mature, a problem with current run-time scheduler implementations is that they require a very large task granularity in order to scale. This limitation goes at odds with the idea of task-parallel programing where programmers should be able to concentrate on exposing parallelism with little regard to the task granularity. To mitigate this limitation, we have designed and implemented TurboBŁYSK, a highly efficient run-time scheduler of tasks with explicit data-dependence annotations. We propose a novel mechanism based on pattern-saving that allows the scheduler to re-use previously resolved dependency patterns, based on programmer annotations, enabling programs to use even the smallest of tasks and scale well. We experimentally show that our techniques in TurboBŁYSK enable achieving nearly twice the peak performance compared with other run-time schedulers. Our techniques are not OpenMP specific and can be implemented in other task-parallel frameworks.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Augonnet, C., Thibault, S., Namyst, R., Wacrenier, P.-A.: StarPU: A unified platform for task scheduling on heterogeneous multicore architectures. Concurrency and Computation: Practice and Experience 23(2), 187–198 (2011)
Article Google Scholar
Balart, J., Duran, A., Gonzàlez, M., Martorell, X., Ayguadé, E., Labarta, J.: Nanos Mercurium: A research compiler for OpenMP. In: Proceedings of the European Workshop on OpenMP, vol. 8 (2004)
Google Scholar
Broquedis, F., Gautier, T., Danjean, V.: LIBKOMP, an efficient openMP runtime system for both fork-join and data flow paradigms. In: Chapman, B.M., Massaioli, F., Müller, M.S., Rorro, M. (eds.) IWOMP 2012. LNCS, vol. 7312, pp. 102–115. Springer, Heidelberg (2012)
Chapter Google Scholar
Clint Whaley, R., Petitet, A., Dongarra, J.J.: Automated empirical optimizations of software and the ATLAS project. Parallel Computing 27(1), 3–35 (2001)
Article MATH Google Scholar
Duran, A., Teruel, X., Ferrer, R., Martorell, X., Ayguade, E.: Barcelona openmp tasks suite: A set of benchmarks targeting the exploitation of task parallelism in openmp. In: International Conference on Parallel Processing, ICPP 2009, pp. 124–131. IEEE (2009)
Google Scholar
Duran, A., Ayguadé, E., Badia, R.M., Labarta, J., Martinell, L., Martorell, X., Planas, J.: OmpSs: A proposal for programming heterogeneous multi-core architectures. Parallel Processing Letters 21(02), 173–193 (2011)
Article MathSciNet Google Scholar
Frigo, M., Leiserson, C.E., Randall, K.H.: The implementation of the Cilk-5 multithreaded language. ACM Sigplan Notices 33(5), 212–223 (1998)
Article Google Scholar
Gautier, T., Lementec, F., Faucher, V., Raffin, B.: X-Kaapi: A Multi Paradigm Runtime for Multicore Architectures. Rapport de recherche RR-8058, INRIA (February 2012)
Google Scholar
Ghosh, P., Yan, Y., Chapman, B.: Support for dependency driven executions among OpenMP tasks. In: Data-Flow Execution Models for Extreme Scale Computing, DFM 2012, pp. 48–54 (2012)
Google Scholar
Labarta, J.: StarSS: A programming model for the multicore era. In: PRACE Workshop New Languages & Future Technology Prototypes at the Leibniz Supercomputing Centre in Garching, Germany (2010)
Google Scholar
Muddukrishna, A., Jonsson, P.A., Vlassov, V., Brorsson, M.: Locality-Aware Task Scheduling and Data Distribution on NUMA Systems. In: Rendell, A.P., Chapman, B.M., Müller, M.S. (eds.) IWOMP 2013. LNCS, vol. 8122, pp. 156–170. Springer, Heidelberg (2013)
Chapter Google Scholar
Nakano, H., Ishizaka, K., Obata, M., Kimura, K., Kasahara, H.: Static coarse grain task scheduling with cache optimization using OpenMP. In: Zima, H.P., Joe, K., Sato, M., Seo, Y., Shimasaki, M. (eds.) ISHPC 2002. LNCS, vol. 2327, pp. 479–489. Springer, Heidelberg (2002)
Chapter Google Scholar
Planas, J., Badia, R.M., Ayguadé, E., Labarta, J.: Hierarchical task-based programming with StarSs. International Journal of High Performance Computing Applications 23(3), 284–299 (2009)
Article Google Scholar
Pop, A., Cohen, A.: OpenStream: Expressiveness and data-flow compilation of OpenMP streaming programs. ACM Transactions on Architecture and Code Optimization (TACO) 9(4), 53 (2013)
Google Scholar
Topcuoglu, H., Hariri, S., Wu, M.-Y.: Performance-effective and low-complexity task scheduling for heterogeneous computing. IEEE Transactions on Parallel and Distributed Systems 13(3), 260–274 (2002)
Article Google Scholar
Vandierendonck, H., Tzenakis, G., Nikolopoulos, D.S.: Analysis of dependence tracking algorithms for task dataflow execution. ACM Transactions on Architecture and Code Optimization (TACO) 10(4), 61 (2013)
Google Scholar

Download references

Author information

Authors and Affiliations

KTH, Royal Institute of Technology, Stockholm, Sweden
Artur Podobas, Mats Brorsson & Vladimir Vlassov

Authors

Artur Podobas
View author publications
You can also search for this author in PubMed Google Scholar
Mats Brorsson
View author publications
You can also search for this author in PubMed Google Scholar
Vladimir Vlassov
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Cray Inc., Cray Plaza, 380 Jackson St., Suite 210, 55101, St. Paul, MN, USA
Luiz DeRose
Lawrence Livermore National Laboratory, 94551-0808, Livermore, CA, USA
Bronis R. de Supinski
Sandia National Laboratories, Albuquerque, NM, USA
Stephen L. Olivier
University of Houston, Houston, TX, USA
Barbara M. Chapman
RWTH Aachen, Aachen, Germany
Matthias S. Müller

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Podobas, A., Brorsson, M., Vlassov, V. (2014). TurboBŁYSK: Scheduling for Improved Data-Driven Task Performance with Fast Dependency Resolution. In: DeRose, L., de Supinski, B.R., Olivier, S.L., Chapman, B.M., Müller, M.S. (eds) Using and Improving OpenMP for Devices, Tasks, and More. IWOMP 2014. Lecture Notes in Computer Science, vol 8766. Springer, Cham. https://doi.org/10.1007/978-3-319-11454-5_4

Download citation

DOI: https://doi.org/10.1007/978-3-319-11454-5_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11453-8
Online ISBN: 978-3-319-11454-5
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics