ABSTRACT
Unleashing the full potential of heterogeneous systems, consisting of multi-core CPUs and GPUs, is a challenging task due to differences in processing capabilities, memory availability, and communication latencies among the computational resources.
In this paper we propose a novel approach that automatically optimizes task partitioning for different (input) problem sizes and different heterogeneous multi-core architectures. We use the Insieme source-to-source compiler to translate a single-device OpenCL program into a multi-device OpenCL program. The Insieme Runtime System then performs dynamic task partitioning based on an offline-generated prediction model. To derive the prediction model, we use a machine learning approach based on Artificial Neural Networks (ANNs) that incorporates static program features as well as dynamic, input-sensitive features. Principal component analysis has been used to further improve the task partitioning. Our approach has been evaluated on a suite of 23 programs and achieves performance improvements of 22% and 25% compared to executing the benchmarks on a single CPU and a single GPU, respectively, which corresponds to 87.5% of the optimal performance.
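The prediction pipeline sketched in the abstract — program features reduced with principal component analysis, then mapped to a CPU/GPU task partitioning — can be illustrated as follows. This is a minimal sketch in Python/NumPy, not the paper's actual model: the feature values, the "best GPU share" labels, and the nearest-neighbour predictor (standing in for the ANN) are all hypothetical.

```python
import numpy as np

# Hypothetical feature matrix: rows = training programs, columns =
# static/dynamic features (e.g. arithmetic ops, memory accesses,
# input problem size). All values are made up for illustration.
X = np.array([
    [120.0, 40.0, 1e4],
    [300.0, 10.0, 1e6],
    [ 80.0, 90.0, 1e3],
    [250.0, 20.0, 5e5],
])

# --- PCA via SVD: center the data and keep the top-k components ---
k = 2
mean = X.mean(axis=0)
Xc = X - mean
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt[:k]           # principal axes, shape (k, n_features)
Z = Xc @ components.T         # reduced features, shape (n_samples, k)

# Made-up "best" GPU work shares for the training programs; in the
# paper an offline-trained ANN predicts the partitioning instead.
train_labels = np.array([0.2, 0.9, 0.0, 0.8])

def predict_partition(features):
    """Project a new program's features into PCA space and return the
    GPU share of its nearest training neighbour (toy predictor)."""
    z = (np.asarray(features) - mean) @ components.T
    dists = np.linalg.norm(Z - z, axis=1)
    return train_labels[np.argmin(dists)]

print(Z.shape)                             # → (4, 2)
print(predict_partition([260.0, 15.0, 6e5]))
```

The sketch only captures the shape of the approach: dimensionality reduction of mixed static/dynamic features, followed by a learned mapping from reduced features to a partitioning decision that the runtime can apply per problem size.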
Index Terms
- An automatic input-sensitive approach for heterogeneous task partitioning