ABSTRACT
The increasing performance of today's computer architectures comes with an unprecedented increase in hardware complexity. Unfortunately, this results in software that is difficult to tune and, consequently, in a gap between potential peak performance and actual performance. Automatic tuning is an emerging approach that assists the programmer in managing this complexity. State-of-the-art autotuners are limited, however: they either require long tuning times, e.g., due to iterative search, or cannot cope with the complexity of the problem because of limitations in the supervised machine learning (ML) methodologies they use. In particular, traditional ML autotuning approaches based on classification algorithms (such as neural networks and support vector machines) struggle to capture all features of large search spaces. We propose a new way of performing automatic tuning based on structural learning: the tuning problem is formulated as predicting a ranking of code versions and solved using ordinal regression. We demonstrate its potential on a well-known autotuning problem: stencil computations. We compare state-of-the-art iterative compilation methods with our ordinal regression approach and analyze the quality of the obtained rankings in terms of the Kendall rank correlation coefficient.
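To make the formulation concrete, here is a minimal sketch of ranking-based autotuning in the spirit of the abstract: code versions, described by feature vectors, are ordered by a pairwise ordinal regression model (the RankSVM reduction of Joachims, 2002, trained here with scikit-learn's LinearSVC), and the predicted ranking is scored with the Kendall rank correlation coefficient via scipy.stats.kendalltau. The feature vectors and run times below are synthetic placeholders, not the paper's stencil benchmarks; this illustrates the technique, not the authors' implementation.

```python
# Sketch of ranking-based autotuning with pairwise ordinal regression.
# Assumptions: synthetic features/run times stand in for real code
# versions; this is NOT the paper's actual model or benchmark data.
import numpy as np
from scipy.stats import kendalltau
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Each row is a feature vector describing one code version
# (e.g., tile sizes, unroll factors); lower run time is better.
X = rng.uniform(size=(40, 4))
w_true = np.array([2.0, -1.0, 0.5, 3.0])              # hidden "true" cost model
runtime = X @ w_true + 0.05 * rng.standard_normal(40)

# RankSVM reduction: every ordered pair (i, j) with runtime[i] < runtime[j]
# becomes a difference vector labeled +1, and the reverse pair -1.
pairs, labels = [], []
for i in range(len(X)):
    for j in range(len(X)):
        if runtime[i] < runtime[j]:
            pairs.append(X[i] - X[j]); labels.append(+1)
            pairs.append(X[j] - X[i]); labels.append(-1)

# No intercept: difference vectors are symmetric around the origin.
clf = LinearSVC(C=1.0, fit_intercept=False).fit(np.array(pairs), np.array(labels))

# Rank unseen versions: a higher score w.x means predicted-faster.
X_test = rng.uniform(size=(15, 4))
true_time = X_test @ w_true
score = X_test @ clf.coef_.ravel()

# Kendall's tau compares the predicted ordering with the true run-time
# ordering; tau close to +1 means the model ranks versions correctly.
tau, _ = kendalltau(-score, true_time)
print(f"Kendall tau on held-out versions: {tau:.3f}")
```

In this reduction, learning to rank reduces to binary classification of pairwise feature differences, so any linear classifier can play the role of the ranking model; the resulting weight vector induces a total order over candidate code versions without requiring each one to be compiled and timed.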
REFERENCES
- Jason Ansel, Shoaib Kamil, Kalyan Veeramachaneni, Jonathan Ragan-Kelley, Jeffrey Bosboom, Una-May O'Reilly, and Saman Amarasinghe. 2014. OpenTuner: An Extensible Framework for Program Autotuning. In International Conference on Parallel Architectures and Compilation Techniques (PACT).
- Gökhan H. Bakir, Thomas Hofmann, Bernhard Schölkopf, Alexander J. Smola, Ben Taskar, and S. V. N. Vishwanathan. 2007. Predicting Structured Data (Neural Information Processing). The MIT Press.
- Biagio Cosenza, Juan J. Durillo, Stefano Ermon, and Ben Juurlink. 2017. Autotuning Stencil Computations with Structural Ordinal Regression Learning. In IEEE International Parallel and Distributed Processing Symposium (IPDPS).
- Matthias Christen, Olaf Schenk, and Helmar Burkhart. 2011. PATUS: A Code Generation and Autotuning Framework for Parallel Iterative Stencil Computations on Modern Microarchitectures. In IEEE International Parallel and Distributed Processing Symposium (IPDPS). 676--687.
- Thorsten Joachims. 2002. Optimizing Search Engines Using Clickthrough Data. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). 133--142.
- Thorsten Joachims. 2006. Training Linear SVMs in Linear Time. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). 217--226.
- Maurice Kendall. 1976. Rank Correlation Methods (4th ed.). Hodder Arnold.
- Klaus Kofler, Ivan Grasso, Biagio Cosenza, and Thomas Fahringer. 2013. An Automatic Input-Sensitive Approach for Heterogeneous Task Partitioning. In ACM International Conference on Supercomputing (ICS). 149--160.
- Hugh Leather, Edwin Bonilla, and Michael O'Boyle. 2009. Automatic Feature Generation for Machine Learning Based Optimizing Compilation. In International Symposium on Code Generation and Optimization (CGO). 81--91.
- Yulong Luo, Guangming Tan, Zeyao Mo, and Ninghui Sun. 2015. FAST: A Fast Stencil Autotuning Framework Based on an Optimal-Solution Space Model. In ACM International Conference on Supercomputing (ICS). 187--196.
- S. Muralidharan, M. Shantharam, M. Hall, M. Garland, and B. Catanzaro. 2014. Nitro: A Framework for Adaptive Code Variant Tuning. In IEEE International Parallel and Distributed Processing Symposium (IPDPS). 501--512.
- Mark Stephenson and Saman P. Amarasinghe. 2005. Predicting Unroll Factors Using Supervised Classification. In IEEE/ACM International Symposium on Code Generation and Optimization (CGO). 123--134.
- Kevin Stock, Louis-Noël Pouchet, and P. Sadayappan. 2012. Using Machine Learning to Improve Automatic Vectorization. ACM Trans. Archit. Code Optim. 8, 4, Article 50 (Jan. 2012), 23 pages.