ABSTRACT
Recursive parallel programming models such as Cilk strive to simplify parallel programming by supporting a simple divide-and-conquer style. This model is effective at recursively partitioning work into smaller parts and combining their results. However, recursive work partitioning can impose more constraints on concurrency than the true dependencies in a program imply. In this paper, we present a speculation-based approach to alleviate the concurrency constraints imposed by such recursive parallel programs. We design a runtime infrastructure that supports speculative execution and a predictor that accurately learns and identifies opportunities to relax extraneous concurrency constraints. Experimental evaluation demonstrates that speculative relaxation of concurrency constraints can deliver gains of up to 1.6x on 30 cores over baseline Cilk.
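To make the notion of an extraneous concurrency constraint concrete, the following is a minimal sketch and not the paper's runtime or predictor. It compares the critical-path length of a four-task computation under a fork-join encoding (spawn A, spawn B, sync, spawn C, spawn D, sync) with the critical path under its true data dependencies alone. The task names, costs, and the `span` helper are all hypothetical and chosen only for illustration.

```python
# Hypothetical task costs in arbitrary time units; not taken from the paper.
cost = {"A": 1, "B": 5, "C": 5, "D": 1}

# True data dependencies (assumed): C needs only A's result, D only B's.
true_deps = {"A": [], "B": [], "C": ["A"], "D": ["B"]}

# Fork-join encoding: the sync between the two phases makes every
# second-phase task wait for the whole first phase.
fork_join_deps = {"A": [], "B": [], "C": ["A", "B"], "D": ["A", "B"]}


def span(deps, cost):
    """Critical-path length of a task DAG, assuming unlimited workers."""
    finish = {}

    def finish_time(task):
        # A task starts once all of its predecessors have finished.
        if task not in finish:
            start = max((finish_time(d) for d in deps[task]), default=0)
            finish[task] = start + cost[task]
        return finish[task]

    return max(finish_time(t) for t in deps)


print(span(fork_join_deps, cost))  # 10: C is held back until slow B finishes
print(span(true_deps, cost))       # 6: C starts as soon as A finishes
```

In this toy case the sync adds a dependence from B to C that the data flow does not require. A speculative scheme of the kind the abstract describes would aim to start C early when it predicts that the dependence is extraneous.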
Index Terms
- CilkSpec: optimistic concurrency for Cilk