DOI: 10.1145/2807591.2807627
research-article

STELLA: a domain-specific tool for structured grid methods in weather and climate models

Published: 15 November 2015

ABSTRACT

Many high-performance computing applications that solve partial differential equations (PDEs) belong to the class of kernels that apply stencils on structured grids. Because of the disparity between floating-point throughput and main memory bandwidth, these codes typically achieve only a low fraction of peak performance. Unfortunately, stencil optimization techniques are often hardware-dependent and significantly increase code complexity. We present a domain-specific tool, STELLA, which eases the burden on the application developer by separating the architecture-dependent implementation strategy from the user code and which targets multi- and manycore processors. Using the numerical weather prediction and regional climate model COSMO as an example, we demonstrate the usefulness of STELLA for a real-world production code. The dynamical core based on STELLA achieves a speedup of 1.8x (CPU) and 5.8x (GPU) over the legacy code while reducing the complexity of the user code.
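To make the term "stencil on a structured grid" concrete, the following is a minimal sketch in plain C++ of a 5-point Laplacian. It is not STELLA code and does not use STELLA's interface; the function name `laplacian`, the flat row-major storage, and the choice to leave boundary points at zero are all assumptions made for illustration. Each output point performs only a handful of floating-point operations per value it loads, which is the kind of kernel the abstract describes as limited by memory bandwidth.

```cpp
#include <cstddef>
#include <vector>

// Illustrative only (hypothetical helper, not part of STELLA):
// a 5-point Laplacian on an nx-by-ny structured grid stored
// row-major in a flat vector.
std::vector<double> laplacian(const std::vector<double>& in,
                              std::size_t nx, std::size_t ny) {
    std::vector<double> out(in.size(), 0.0);  // boundary points stay 0
    for (std::size_t j = 1; j + 1 < ny; ++j) {
        for (std::size_t i = 1; i + 1 < nx; ++i) {
            const std::size_t c = j * nx + i;  // index of the centre point
            out[c] = in[c - 1] + in[c + 1]     // west and east neighbours
                   + in[c - nx] + in[c + nx]   // south and north neighbours
                   - 4.0 * in[c];
        }
    }
    return out;
}
```

As a quick check of the sketch, a field defined by f(i, j) = i² + j² yields the value 4 at every interior point.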


      • Published in

        SC '15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
        November 2015
        985 pages
        ISBN: 9781450337236
        DOI: 10.1145/2807591
        • General Chair: Jackie Kern
        • Program Chair: Jeffrey S. Vetter

        Copyright © 2015 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        SC '15 Paper Acceptance Rate: 79 of 358 submissions, 22%. Overall Acceptance Rate: 1,516 of 6,373 submissions, 24%.
