Abstract
In parallel finite element solvers, sparse matrix assembly is often a bottleneck. When assembly is implemented with message passing, latency from message matching starts to limit performance as the number of cores increases. We address this issue by using our own stack-based representation of the sparse matrix together with a hybrid parallel programming model that combines traditional message passing with one-sided communication. On a Cray XE6, this gives a significantly faster insertion rate than state-of-the-art implementations.
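As a rough illustration of what a stack-based sparse matrix representation might look like, the following single-process sketch stores each row as an unsorted stack of column/value pairs and accumulates element contributions into it. This is an assumption made for illustration, not the paper's actual data structure: the names `StackRowMatrix`, `add`, and `to_csr` are hypothetical, and the one-sided communication used for off-process rows is not modelled here.

```python
class StackRowMatrix:
    """Sparse matrix whose rows are unsorted stacks of [column, value] pairs.

    Hypothetical sketch; not the data structure from the paper.
    """

    def __init__(self, n_rows):
        self.rows = [[] for _ in range(n_rows)]

    def add(self, i, j, value):
        """Accumulate value into entry (i, j), pushing a new pair if absent."""
        row = self.rows[i]
        for entry in row:  # linear scan over the entries already in row i
            if entry[0] == j:
                entry[1] += value
                return
        row.append([j, value])  # push a new entry onto the row's stack

    def to_csr(self):
        """Compress into CSR arrays (row pointers, column indices, values)."""
        indptr, indices, data = [0], [], []
        for row in self.rows:
            for j, v in sorted(row):  # columns are unique, so this sorts by column
                indices.append(j)
                data.append(v)
            indptr.append(len(indices))
        return indptr, indices, data


# Assemble two overlapping 2x2 element matrices on a 1D mesh with 3 nodes.
A = StackRowMatrix(3)
element = [[1.0, -1.0], [-1.0, 1.0]]
for dofs in ([0, 1], [1, 2]):
    for a, i in enumerate(dofs):
        for b, j in enumerate(dofs):
            A.add(i, j, element[a][b])

indptr, indices, data = A.to_csr()
```

In this sketch an insertion never shifts existing entries, which is one plausible reason an append-only row layout could be cheaper during assembly than inserting into a sorted compressed format; the conversion to CSR is deferred until assembly is complete.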
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jansson, N. (2013). Optimizing Sparse Matrix Assembly in Finite Element Solvers with One-Sided Communication. In: Daydé, M., Marques, O., Nakajima, K. (eds) High Performance Computing for Computational Science - VECPAR 2012. VECPAR 2012. Lecture Notes in Computer Science, vol 7851. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38718-0_15
DOI: https://doi.org/10.1007/978-3-642-38718-0_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38717-3
Online ISBN: 978-3-642-38718-0
eBook Packages: Computer Science, Computer Science (R0)