DOI: 10.1145/2464996.2465007
research-article

An automatic input-sensitive approach for heterogeneous task partitioning

Published: 10 June 2013

ABSTRACT

Unleashing the full potential of heterogeneous systems, consisting of multi-core CPUs and GPUs, is a challenging task due to the difference in processing capabilities, memory availability, and communication latencies of different computational resources.

In this paper we propose a novel approach that automatically optimizes task partitioning for different (input) problem sizes and different heterogeneous multi-core architectures. We use the Insieme source-to-source compiler to translate a single-device OpenCL program into a multi-device OpenCL program. The Insieme Runtime System then performs dynamic task partitioning based on an offline-generated prediction model. To derive the prediction model, we use a machine learning approach based on Artificial Neural Networks (ANN) that incorporates static program features as well as dynamic, input-sensitive features. Principal component analysis has been used to further improve the task partitioning. Evaluated over a suite of 23 programs, our approach achieves performance improvements of 22% and 25% over execution of the benchmarks on a single CPU and a single GPU, respectively, which corresponds to 87.5% of the optimal performance.
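To make the partitioning step concrete, the sketch below splits a one-dimensional OpenCL NDRange between a CPU and a GPU device according to a predicted CPU share. This is a hypothetical illustration, not the Insieme implementation: the function name `split_ndrange`, the fixed work-group size of 64, and the example CPU fraction are all assumptions; the paper's actual model is trained offline on static and dynamic program features.

```python
def split_ndrange(global_size, cpu_fraction, local_size=64):
    """Split a 1-D OpenCL NDRange between a CPU and a GPU device.

    cpu_fraction is the share predicted for the CPU (e.g. by an
    offline-trained model); the CPU partition is rounded down to a
    multiple of the work-group size so each device launches only
    whole work-groups.
    """
    # round the CPU share down to a multiple of the work-group size
    cpu_items = int(global_size * cpu_fraction) // local_size * local_size
    gpu_items = global_size - cpu_items
    return cpu_items, gpu_items

# e.g. a model predicting a 30% CPU share for 4096 work-items:
print(split_ndrange(4096, 0.30))  # -> (1216, 2880)
```

Keeping both partitions aligned to the work-group size matters in practice, since OpenCL requires the global size of each sub-launch to be divisible by its local size.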


Published in

ICS '13: Proceedings of the 27th International ACM Conference on Supercomputing
June 2013, 512 pages
ISBN: 9781450321303
DOI: 10.1145/2464996
Copyright © 2013 ACM

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

ICS '13 paper acceptance rate: 43 of 202 submissions, 21%. Overall acceptance rate: 629 of 2,180 submissions, 29%.
