ABSTRACT
This paper describes an extension to the set of Basic Linear Algebra Subprograms. The extensions are targeted at matrix-vector operations that should provide for efficient and portable implementations of algorithms for high-performance computers.