SIMD Monte-Carlo Numerical Simulations Accelerated on GPU and Xeon Phi

Plazolles, Bastien; El Baz, Didier; Spel, Martin; Rivola, Vincent; Gegout, Pascal

doi:10.1007/s10766-017-0509-y

Published: 17 May 2017

Volume 46, pages 584–606 (2018)
International Journal of Parallel Programming

Bastien Plazolles^1,2,
Didier El Baz^1,
Martin Spel^2,
Vincent Rivola^2 &
Pascal Gegout^3,4

Abstract

The efficiency of a pleasingly parallel application is studied on several computing platforms. A real-world problem is considered: Monte-Carlo numerical simulation of the drift descent of stratospheric balloon envelopes. We detail the optimization of the SIMD parallel codes on the K40 and K80 GPUs as well as on the Intel Xeon Phi. We emphasize loop and task parallelism, multi-threading and vectorization, respectively. The experiments show that the GPU and the MIC reduce computing time by non-negligible factors compared with a parallel code implemented on a two-socket CPU (E5-2680-v2), which ultimately allows us to use these devices in operational conditions.
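
To make the term "pleasingly parallel" concrete, the following minimal Python sketch shows the structure such a Monte-Carlo study typically has: every sample is an independent trajectory with its own random stream, so samples can be mapped onto threads, vector lanes or GPU threads without communication. This is not the authors' code. The physics is a toy model, and every name and number in it (`simulate_descent`, `run_monte_carlo`, `landing_statistics`, the wind and fall-speed distributions, the time step) is a hypothetical choice made for illustration only.

```python
import math
import random


def simulate_descent(seed, altitude_m=30_000.0, dt_s=10.0):
    """Toy drift descent of one sample (hypothetical model, not the paper's).

    The object falls at a constant, randomly drawn speed while a random
    horizontal wind displaces it at every time step.
    """
    rng = random.Random(seed)  # private random stream: no shared state
    fall_speed = max(rng.gauss(5.0, 0.5), 1.0)  # m/s, clamped so the loop ends
    x = y = 0.0
    z = altitude_m
    while z > 0.0:
        x += rng.gauss(10.0, 3.0) * dt_s  # eastward wind, m/s (assumed)
        y += rng.gauss(0.0, 3.0) * dt_s   # northward wind, m/s (assumed)
        z -= fall_speed * dt_s
    return x, y


def run_monte_carlo(n_samples, mapper=map):
    """Run n_samples independent trajectories.

    `mapper` can be replaced by any parallel map with the same interface,
    because no sample depends on another.
    """
    return list(mapper(simulate_descent, range(n_samples)))


def landing_statistics(points):
    """Mean and sample standard deviation of the landing coordinates."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    std_x = math.sqrt(sum((p[0] - mean_x) ** 2 for p in points) / (n - 1))
    std_y = math.sqrt(sum((p[1] - mean_y) ** 2 for p in points) / (n - 1))
    return mean_x, mean_y, std_x, std_y


if __name__ == "__main__":
    landing_points = run_monte_carlo(200)
    print(landing_statistics(landing_points))
```

Seeding each sample from its index keeps the result independent of the order in which samples are executed, which is the property that lets the same loop be distributed over CPU cores, SIMD lanes or GPU threads.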

Acknowledgements

Dr. Didier El Baz and Dr. Bastien Plazolles gratefully acknowledge the support of NVIDIA Corporation with the donation of the Tesla K40 GPU used for this research work. The authors also wish to thank Dr. D. Gazen and Dr. J. Escobar of Observatoire Midi-Pyrénées for their advice and for access to the cluster in Toulouse. The authors thank the DEDALE work group coordinated by CNES, France. Finally, the authors thank the reviewers for their useful suggestions for improving the manuscript.

Author information

Authors and Affiliations

LAAS-CNRS, Université de Toulouse, CNRS, Toulouse, France
Bastien Plazolles & Didier El Baz
R.Tech, Parc Technologique Delta Sud, 09340, Verniolle, France
Bastien Plazolles, Martin Spel & Vincent Rivola
Géosciences Environnement Toulouse (CNRS UMR5563), 14 Avenue Edouard Belin, 31400, Toulouse, France
Pascal Gegout
Université de Toulouse, 31400, Toulouse, France
Pascal Gegout

Corresponding author

Correspondence to Didier El Baz.

About this article

Cite this article

Plazolles, B., El Baz, D., Spel, M. et al. SIMD Monte-Carlo Numerical Simulations Accelerated on GPU and Xeon Phi. Int J Parallel Prog 46, 584–606 (2018). https://doi.org/10.1007/s10766-017-0509-y

Received: 24 February 2017
Accepted: 10 May 2017
Published: 17 May 2017
Issue Date: June 2018
DOI: https://doi.org/10.1007/s10766-017-0509-y
