TurboBŁYSK: Scheduling for Improved Data-Driven Task Performance with Fast Dependency Resolution

  • Conference paper
Using and Improving OpenMP for Devices, Tasks, and More (IWOMP 2014)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 8766)

Abstract

Data-driven task-parallelism is attracting growing interest and has now been added to OpenMP (4.0). This paradigm simplifies writing parallel applications and extracting parallelism, and it facilitates the use of distributed-memory architectures. While the programming model itself is becoming mature, current run-time scheduler implementations require a very large task granularity in order to scale. This limitation is at odds with the idea of task-parallel programming, in which programmers should be able to concentrate on exposing parallelism with little regard to task granularity. To mitigate this limitation, we have designed and implemented TurboBŁYSK, a highly efficient run-time scheduler of tasks with explicit data-dependence annotations. We propose a novel mechanism based on pattern-saving that allows the scheduler to re-use previously resolved dependency patterns, based on programmer annotations, enabling programs to use even the smallest of tasks and still scale well. We show experimentally that the techniques in TurboBŁYSK achieve nearly twice the peak performance of other run-time schedulers. Our techniques are not OpenMP-specific and can be implemented in other task-parallel frameworks.
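As an illustration of the explicit data-dependence annotations the abstract refers to, the following minimal sketch uses the standard OpenMP 4.0 `depend` clause. It is not taken from the paper and does not show TurboBŁYSK's internals; the function name `pipeline_sum` and the three-stage workload are hypothetical. Each loop iteration creates the same small dependency shape (produce `a`, transform into `b`, accumulate into `sum`), which is the kind of repeated pattern a run-time scheduler has to resolve again and again when tasks are fine-grained.

```c
#include <assert.h>

/* Hypothetical example: a three-stage task pipeline repeated n times.
 * The depend clauses tell the run-time which tasks must wait for which.
 * Compiled without OpenMP support, the pragmas are ignored and the code
 * runs serially with the same result. */
int pipeline_sum(int n)
{
    int a = 0, b = 0, sum = 0;

    assert(n >= 0);

    #pragma omp parallel
    #pragma omp single
    {
        for (int i = 1; i <= n; i++) {
            /* Stage 1: produce a value (writes a). */
            #pragma omp task depend(out: a) firstprivate(i) shared(a)
            a = i;

            /* Stage 2: reads a, writes b; waits for stage 1. */
            #pragma omp task depend(in: a) depend(out: b) shared(a, b)
            b = 2 * a;

            /* Stage 3: reads b, updates sum; waits for stage 2. */
            #pragma omp task depend(in: b) depend(inout: sum) shared(b, sum)
            sum += b;
        }
        /* Wait for all generated tasks before leaving the region. */
        #pragma omp taskwait
    }
    return sum;
}
```

Calling `pipeline_sum(4)` sums 2, 4, 6 and 8. With a compiler such as GCC, the `-fopenmp` flag enables the pragmas. Each task here does only a single assignment, so the cost of dependency resolution dominates the useful work, which is the small-granularity regime the abstract says current schedulers handle poorly.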

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Podobas, A., Brorsson, M., Vlassov, V. (2014). TurboBŁYSK: Scheduling for Improved Data-Driven Task Performance with Fast Dependency Resolution. In: DeRose, L., de Supinski, B.R., Olivier, S.L., Chapman, B.M., Müller, M.S. (eds) Using and Improving OpenMP for Devices, Tasks, and More. IWOMP 2014. Lecture Notes in Computer Science, vol 8766. Springer, Cham. https://doi.org/10.1007/978-3-319-11454-5_4

  • DOI: https://doi.org/10.1007/978-3-319-11454-5_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11453-8

  • Online ISBN: 978-3-319-11454-5

  • eBook Packages: Computer Science, Computer Science (R0)
