research-article

STELLA: a domain-specific tool for structured grid methods in weather and climate models

Authors:
Tobias Gysi (ETH Zurich, Supercomputing Systems AG)
Carlos Osuna (Center for Climate Systems Modeling, ETH Zurich)
Oliver Fuhrer (Federal Office of Meteorology and Climatology MeteoSwiss)
Mauro Bianco (CSCS, ETH Zurich)
Thomas C. Schulthess (Oak Ridge National Laboratory)

SC '15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, November 2015, Article No.: 41, Pages 1–12. https://doi.org/10.1145/2807591.2807627

Published: 15 November 2015

ABSTRACT

Many high-performance computing applications that solve partial differential equations (PDEs) belong to the class of kernels that apply stencils on structured grids. Because of the disparity between floating-point throughput and main memory bandwidth, these codes typically achieve only a small fraction of peak performance. Unfortunately, stencil optimization techniques are often hardware dependent and significantly increase code complexity. We present STELLA, a domain-specific tool that eases the burden on the application developer by separating the architecture-dependent implementation strategy from the user code, and that targets multi- and manycore processors. Using the numerical weather prediction and regional climate model COSMO as an example, we demonstrate the usefulness of STELLA for a real-world production code. The STELLA-based dynamical core achieves a speedup of 1.8x (CPU) and 5.8x (GPU) over the legacy code while reducing the complexity of the user code.
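
To make the memory-bandwidth argument concrete, the following is a minimal sketch of a stencil on a structured grid: a five-point Laplacian written in plain Python. It is not STELLA code and does not reflect STELLA's interface; the function name `laplacian`, the 6x6 test field, and the double-precision traffic estimate are assumptions chosen for illustration only.

```python
def laplacian(field):
    """Apply a five-point Laplacian stencil to the interior of a 2D grid.

    Boundary points (a halo of width one) are left at zero.
    """
    ni, nj = len(field), len(field[0])
    out = [[0.0] * nj for _ in range(ni)]
    for i in range(1, ni - 1):
        for j in range(1, nj - 1):
            # Each output point combines the point itself and its
            # four direct neighbours: 5 floating-point operations.
            out[i][j] = (field[i - 1][j] + field[i + 1][j]
                         + field[i][j - 1] + field[i][j + 1]
                         - 4.0 * field[i][j])
    return out


# Hypothetical test field f(i, j) = i*i + j*j, whose discrete
# Laplacian equals 4 at every interior point.
field = [[float(i * i + j * j) for j in range(6)] for i in range(6)]
lap = laplacian(field)

# Rough arithmetic-intensity estimate (an assumption, not a measurement):
# with perfect cache reuse, each point costs at least one 8-byte read
# and one 8-byte write in double precision.
flops_per_point = 5
bytes_per_point = 2 * 8
intensity = flops_per_point / bytes_per_point

print(lap[2][3], intensity)
```

Under these assumptions the kernel performs well under one floating-point operation per byte moved, which is why stencil codes of this kind tend to be limited by memory bandwidth instead of arithmetic throughput.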


Index Terms

1. Software and its engineering
  1. Software notations and tools
    1. Context specific languages
      1. Domain specific languages
  2. Software organization and properties
    1. Software system structures
      1. Software architectures


Published in
SC '15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2015
985 pages
ISBN: 9781450337236
DOI: 10.1145/2807591
General Chair: Jackie Kern, University of Illinois at Urbana-Champaign, Urbana, Illinois
Program Chair: Jeffrey S. Vetter, Oak Ridge National Laboratory and Georgia Institute of Technology, Oak Ridge, Tennessee
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
atmospheric model
domain-specific language
heterogeneous system
stencil
Acceptance Rates
SC '15 Paper Acceptance Rate: 79 of 358 submissions, 22%. Overall Acceptance Rate: 1,516 of 6,373 submissions, 24%.