Abstract
We show empirically that some of the issues that affected the design of linear algebra libraries for distributed memory architectures will also likely affect such libraries for shared memory architectures with many simultaneous threads of execution, including SMP architectures and future multicore processors. The always-important matrix-matrix multiplication is used to demonstrate that a simple one-dimensional data partitioning is suboptimal in the context of dense linear algebra operations and hinders scalability. In addition we advocate the publishing of low-level interfaces to supporting operations, such as the copying of data to contiguous memory, so that library developers may further optimize parallel linear algebra implementations. Data collected on a 16 CPU Itanium2 server supports these observations.
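To make the partitioning issue concrete, the following is a minimal sketch (not the authors' implementation) of the simple one-dimensional partitioning of C = A*B that the abstract identifies as suboptimal: each OpenMP thread owns a contiguous block of rows of A and C while reading all of B. The function names gemm_serial and gemm_1d_rows are illustrative placeholders; a real library would dispatch to a tuned serial kernel such as GotoBLAS.

```c
/* Illustrative sketch, not the paper's code: 1D row-block partitioning of
 * C = A*B + C across OpenMP threads, row-major storage.                    */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

/* Simple serial kernel: C(m x n) += A(m x k) * B(k x n), row-major.
 * Stands in for a tuned dgemm kernel.                                      */
static void gemm_serial(int m, int n, int k,
                        const double *A, int lda,
                        const double *B, int ldb,
                        double *C, int ldc)
{
    for (int i = 0; i < m; i++)
        for (int p = 0; p < k; p++) {
            double a = A[i * lda + p];
            for (int j = 0; j < n; j++)
                C[i * ldc + j] += a * B[p * ldb + j];
        }
}

/* One-dimensional partitioning: thread t computes rows [t*mb, t*mb+mb) of C
 * from the corresponding rows of A, while every thread reads all of B.
 * As the thread count grows each row block becomes thin, so the per-thread
 * ratio of computation to data touched degrades -- the scalability problem
 * the paper measures against two-dimensional partitionings.                */
static void gemm_1d_rows(int m, int n, int k,
                         const double *A, const double *B, double *C)
{
    #pragma omp parallel
    {
        int nth = omp_get_num_threads();
        int t   = omp_get_thread_num();
        int mb  = (m + nth - 1) / nth;             /* rows per thread        */
        int i0  = t * mb;
        int mi  = (i0 + mb <= m) ? mb : (m > i0 ? m - i0 : 0);
        if (mi > 0)
            gemm_serial(mi, n, k, A + (size_t)i0 * k, k, B, n,
                        C + (size_t)i0 * n, n);
    }
}

int main(void)
{
    int m = 512, n = 512, k = 512;
    double *A = calloc((size_t)m * k, sizeof *A);
    double *B = calloc((size_t)k * n, sizeof *B);
    double *C = calloc((size_t)m * n, sizeof *C);
    for (int i = 0; i < m * k; i++) A[i] = 1.0;
    for (int i = 0; i < k * n; i++) B[i] = 1.0;

    gemm_1d_rows(m, n, k, A, B, C);
    printf("C[0] = %f (expected %d)\n", C[0], k);   /* each entry equals k */

    free(A); free(B); free(C);
    return 0;
}
```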
Keywords
- Matrix Multiplication Algorithm
- Shared Memory Architecture
- Distributed Memory Architecture
- Linear Algebra Operation
- Linear Algebra Package
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Marker, B., Van Zee, F.G., Goto, K., Quintana-Ortí, G., van de Geijn, R.A. (2007). Toward Scalable Matrix Multiply on Multithreaded Architectures. In: Kermarrec, AM., Bougé, L., Priol, T. (eds) Euro-Par 2007 Parallel Processing. Euro-Par 2007. Lecture Notes in Computer Science, vol 4641. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74466-5_79
DOI: https://doi.org/10.1007/978-3-540-74466-5_79
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74465-8
Online ISBN: 978-3-540-74466-5