article

Free access

A set of level 3 basic linear algebra subprograms

Editor: John A. Rice Authors:

J. J. Dongarra,

Jeremy Du Croz,

Sven Hammarling,

I. S. DuffAuthors Info & Claims

ACM Transactions on Mathematical Software (TOMS), Volume 16, Issue 1

Pages 1 - 17

https://doi.org/10.1145/77626.79170

Published: 01 March 1990 Publication History

PDF eReader

Abstract

This paper describes an extension to the set of Basic Linear Algebra Subprograms. The extensions are targeted at matrix-vector operations that should provide for efficient and portable implementations of algorithms for high-performance computers

References

[1]

BARRON, D. W., AND SWINNERTON-DYER, H. P.F. Solution of simultaneous linear equations using a magnetic-tape store. Comput. J. 3 (1960), 28-33.

Google Scholar

[2]

BERRY, M., GALLIVAN, K., HARROD, W., JALBY, W., LO, S., MEIER, U., PHILIPPE, B., AND SAMEH, A. Parallel algorithms on the CEDAR system. CSRD Report 581, 1986.

Google Scholar

[3]

BISCHOF, C., AND VAN LOAN, C. The WY representation for products of Householder matrices. SIAM J. Sci. Star. Comput. 8, 1 (Jan. 1987), s2-s13.

Crossref

Google Scholar

[4]

BRONLUND, O. E., AND JOHNSEN, T. QR-factorization of partitioned matrices. Comput. Meth. Appl. Mech. Eng., vol. 3, pp. 153-172, 1974.

Google Scholar

[5]

BUCHER, I., AND JORDAN, T. Linear algebra programs for use on a vector computer with a secondary solid state storage device. In Advances in Computer Methods for Partial Differential Equations, R. Vichnevetsky and R. Stepleman, Eds. IMACS, 1984, 546-550.

Google Scholar

[6]

CALAHAN, D.A. Block-oriented local-memory-based linear equation solution on the CRAY-2: Uniprocessor algorithms. In Proceedings International Conference on Parallel Processing (Aug. 1986). IEEE Computer Society Press, New York, 1986.

Google Scholar

[7]

CARNEVALI, P., RADICATI DI BROZOLO, G., ROBERT, Y., AND SGUAZZERO, P. Efficient Fortran implementation of the Gaussian elimination and Householder reduction algorithms on the IBM 3090 vector multiprocessor. IBM ECSEC Rep. ICE-0012, 1987.

Google Scholar

[8]

CHARTRES, B. Adaption of the Jacobi and Givens methods for a computer with magnetic tape backup store. Univ. of Sydney Tech. Rep. 8, 1960.

Google Scholar

[9]

DAVE, A. K., AND DUFF, I.S. Sparse matrix calculations on the CRAY-2. Parallel Comput. 5 (July 1987), 55-64.

Google Scholar

[10]

DEMMEL, J., DONGARRA, J. J., DU CROZ, J., GREENBAUM, A., HAMMARLING, S., AND SORENSEN, D. Prospectus for the development of a linear algebra library for high-performance computers. Argonne National Lab. Rep. ANL-MCS-TM-97, Sept. 1987.

Google Scholar

[11]

DIETRICH, G. A new formulation of the hypermatrix Householder QR-decomposition. Comput. Meth. AppI. Mech. Eng. 9 (1976), 273-280.

Google Scholar

[12]

DODSON, D., AND LEWIS, J. Issues relating to extension of the basic linear algebra subprograms. ACM SIGNUM Newsl. 20, 1 (1985), 2-18.

Crossref

Google Scholar

[13]

DONGARRA, J. J., BUNCH, J., MOLER, C., AND STEWART, G. LINPACK Users' Guide. SIAM, Philadelphia, Pa., 1979.

Google Scholar

[14]

DONGARRA, J. J., DuCRoz, J., HAMMARLING, S., AND HANSON, R. An extended set of Fortran basic linear algebra subprograms. ACM Trans. Math. Softw. 14, i (Mar. 1988), 1-17.

Crossref

Google Scholar

[15]

DONGARRA, J. J., DuCRoz, Z., HAMMARLING, S., AND HANSON, R. An extended set of Fortran basic linear algebra subprograms: Model implementation and test programs. ACM Trans. Math. Softw. 14, I (Mar. 1988), 18-32.

Crossref

Google Scholar

[16]

DONGARRA, J. J., DuCRoz, J., DUFF, I. S., AND HAMMARLING, S. A set of level 3 basic linear algebra subprograms: Model implementation and test programs. This issue, pp. 18-37.

Crossref

Google Scholar

[17]

DONGARRA, J. J., AND DUFF, I.S. Advanced architecture computers. Univ. of Tennessee, Rep. CS-89-90, Nov. 1989.

Crossref

Google Scholar

[18]

DONGARRA, J. J., GUSTAVSON, F., AND KARP, A. Implementing linear algebra algorithms for dense matrices on a vector pipeline machine. SIAM Rev. 26, 1 (1984), 91-112.

Google Scholar

[19]

DONGARRA, J. J., HAMMARLING, S., AND SORENSEN, O. C. Block reduction of matrices to condensed forms for eigenvalue computations. Argonne National Lab. Rep. ANL-MCS-TM-99, Sept. 1987.

Google Scholar

[20]

DONGARRA, J. J., AND HEWITT, T. Implementing dense linear algebra using multitasking on the CRAY X-MP-4. J. Comput. Appl. Math. 27 (1989), 215-227.

Google Scholar

[21]

DONGARRA, J. J., AND SORENSEN, D.C. Linear algebra on high-performance computers. In Proceedings Parallel Computing 85, U. Schendel, Ed. North Holland, Amsterdam, 1986, 113-136.

Google Scholar

[22]

DuCRoz, J., NUGENT, S., REID, J., AND TAYLOR, D. Solving large full sets of linear equations in a paged virtual store. ACM Trans. Math. Softw. 7, 4 (1981), 527-536.

Crossref

Google Scholar

[23]

DUFF, I. S. Full matrix techniques in sparse Gaussian elimination. In Numerical Analysis Proceedings, Dundee 1981, Lecture Notes in Mathematics 912. Springer-Verlag, New York, 1981, 71-84.

Google Scholar

[24]

GALLIVAN, K., JALBV, W., AND MEIER, U. The use of BLAS3 in linear algebra on a parallel processor with a hierarchical memory. SIAM J. Sci. Star. Comput. 8, 6 (Nov. 1987), 1079-1084.

Crossref

Google Scholar

[25]

GEORGE, A., AND RASHWAN, S. Auxiliary storage methods for solving finite element systems. SIAM J. Sci. Star. Comput. 6, 4 (Oct. 1985), 882-910.

Google Scholar

[26]

IBM. Engineering and scientific subroutine library. Program 5668-863, 1986.

Google Scholar

[27]

LAWSON, C., HANSON, R. KINCAID, D., AND KROGH, F. Basic linear algebra subprograms for Fortran usage. ACM Trans. Math. Softw. 5 (1979), 308-323.

Crossref

Google Scholar

[28]

LAWSON, C., HANSON, R., KINCAID, D., AND KROGH, F. Algorithm 539: Basic linear algebra subprograms for Fortran usage. ACM Trans. Math. Softw. 5 (1979), 324-325.

Crossref

Google Scholar

[29]

MCKELLAR, A. C., AND COFFMAN, E. G., JR. Organizing matrices and matrix operations for paged memory systems. Commun. ACM 12, 3 (1969), 153-165.

Crossref

Google Scholar

[30]

ROBERT, Y., AND SGUAZZERO, P. The LU decomposition algorithm and its efficient Fortran implementation on the IBM 3090 vector multiprocessor. IBM ECSEC Rep. ICE-0006, 1987.

Google Scholar

[31]

SCHREIBER, R. Module design specification (Version 1.0). SAXPY Computer Corp., 255 San Geronimo Way, Sunnyvale, CA 94086, 1986.

Google Scholar

[32]

SCHREIBER, R., AND PARLETT, B. Block reflectors: Theory and computation. SIAM J. Numer. Anal. 25, 1 (Feb. 1988), 189-205.

Crossref

Google Scholar

Cited By

View all

Eiximeno BMiró ABegiashvili BValero ERodriguez ILehmkhul O(2025)PyLOM: A HPC open source reduced order model suite for fluid dynamics applicationsComputer Physics Communications10.1016/j.cpc.2024.109459308(109459)Online publication date: Mar-2025
https://doi.org/10.1016/j.cpc.2024.109459
Bertolazzi EStocco D(2025)Parallel cyclic reduction of padded bordered almost block diagonal matricesJournal of Computational and Applied Mathematics10.1016/j.cam.2024.116331458(116331)Online publication date: Apr-2025
https://doi.org/10.1016/j.cam.2024.116331
Šír G(2024)A computational perspective on neural-symbolic integrationNeurosymbolic Artificial Intelligence10.3233/NAI-240672(1-12)Online publication date: 18-Jul-2024
https://doi.org/10.3233/NAI-240672
Show More Cited By

Index Terms

A set of level 3 basic linear algebra subprograms

Recommendations

Algorithm 679: A set of level 3 basic linear algebra subprograms: model implementation and test programs

This paper describes a model implementation and test software for the Level 3 Basic Linear Algebra Subprograms (Level3 BLAS). The Level3 BLAS are targeted at matrix-matrix operations with the aim of providing more efficient, but portable, implementations ...
Level 3 basic linear algebra subprograms for sparse matrices: a user-level interface

This article proposes a set of Level 3 Basic Linear Algebra Subprograms and associated kernels for sparse matrices. A major goal is to design and develop a common framework to enable efficient, and portable, implementations of iterative algorithms for ...
Studies in numerical linear algebra

Reviews

Reviewer: Chaya Gurwitz

The original set of FORTRAN basic linear algebra subprograms, or Level 1 BLAS, included vector operations [1]; the routines in Level 2 BLAS were later added to provide matrix-vector operations [2]. This paper proposes adding a set of Level 3 BLAS, which would be used to perform matrix-matrix operations. The Level 1 and Level 2 BLAS have been adopted by the mathematical programming community as basic routines that are used as building blocks for software development. Efficient machine code implementations of the BLAS that take advantage of specific hardware features can lead to significant computational speedup. Using the BLAS provides portability and ease of maintenance. The proposed Level 3 BLAS are especially suitable for programming on computers with a hierarchy of memory and on machines using parallel processors. For these types of computers, computation is most efficient if the matrices are partitioned into blocks and matrix-matrix operations are performed on the blocks. On architectures that support parallel processing, operations on different blocks may be performed in parallel. The operations proposed for inclusion in the Level 3 BLAS are: matrix-matrix products, rank- k and rank-2 k updates of symmetric and Hermetian matrices, multiplication of a rectangular matrix by a triangular matrix, and solving triangular systems of equations with multiple right-hand sides. The routines are provided for four different FORTRAN data types: real, double precision, complex, and double complex. The paper describes the naming conventions and calling sequences for the subprograms, which generally follow the conventions used in the Level 2 BLAS. The authors discuss the reasoning used in selecting the operations to be included in the Level 3 BLAS. The paper concludes with a discussion of the application of the Level 3 BLAS to solve numerical linear algebra problems in terms of operations on submatrices (blocks). An example illustrates how the Level 3 BLAS can be used to implement the Cholesky factorization as a block algorithm.

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Mathematical Software

ACM Transactions on Mathematical Software Volume 16, Issue 1

March 1990

109 pages

ISSN:0098-3500

EISSN:1557-7295

DOI:10.1145/77626

Editor:
John A. Rice

Issue’s Table of Contents

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 March 1990

Published in TOMS Volume 16, Issue 1

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Qualifiers

Article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

1,474
Total Citations
View Citations
4,515
Total Downloads

Downloads (Last 12 months)438
Downloads (Last 6 weeks)61

Reflects downloads up to 11 Dec 2024

Other Metrics

View Author Metrics

Citations

Cited By

View all

Eiximeno BMiró ABegiashvili BValero ERodriguez ILehmkhul O(2025)PyLOM: A HPC open source reduced order model suite for fluid dynamics applicationsComputer Physics Communications10.1016/j.cpc.2024.109459308(109459)Online publication date: Mar-2025
https://doi.org/10.1016/j.cpc.2024.109459
Bertolazzi EStocco D(2025)Parallel cyclic reduction of padded bordered almost block diagonal matricesJournal of Computational and Applied Mathematics10.1016/j.cam.2024.116331458(116331)Online publication date: Apr-2025
https://doi.org/10.1016/j.cam.2024.116331
Šír G(2024)A computational perspective on neural-symbolic integrationNeurosymbolic Artificial Intelligence10.3233/NAI-240672(1-12)Online publication date: 18-Jul-2024
https://doi.org/10.3233/NAI-240672
ÖZ I(2024)Quantitative Performance Analysis of BLAS Libraries on GPU ArchitecturesBLAS Kütüphanelerinin GPU Mimarilerindeki Nicel Performans AnaliziDeu Muhendislik Fakultesi Fen ve Muhendislik10.21205/deufmd.202426760626:76(40-48)Online publication date: 23-Jan-2024
https://doi.org/10.21205/deufmd.2024267606
Carretero Perez JRodríguez-Sánchez RCastelló ACatalán SIgual FQuintana-Ortí E(2024)Experiences with nested parallelism in task-parallel applications using malleable BLAS on multicore processorsInternational Journal of High Performance Computing Applications10.1177/1094342023115765338:2(55-68)Online publication date: 1-Mar-2024
https://dl.acm.org/doi/10.1177/10943420231157653
Toledo S(2024)Algorithm 1051: UltimateKalman, Flexible Kalman Filtering and Smoothing Using Orthogonal TransformationsACM Transactions on Mathematical Software10.1145/369995850:4(1-19)Online publication date: 24-Oct-2024
https://dl.acm.org/doi/10.1145/3699958
Josyula AVerma PGaonkar ABarua AHegde N(2024)Optimizing a Super-Fast Eigensolver for Hierarchically Semiseparable MatricesProceedings of the 53rd International Conference on Parallel Processing10.1145/3673038.3673119(32-41)Online publication date: 12-Aug-2024
https://dl.acm.org/doi/10.1145/3673038.3673119
Li JFeng ZGao YTian SZhang HYe HZhang J(2024)High-Performance 3D convolution on the Latest Generation Sunway ProcessorProceedings of the 53rd International Conference on Parallel Processing10.1145/3673038.3673093(241-251)Online publication date: 12-Aug-2024
https://dl.acm.org/doi/10.1145/3673038.3673093
Bin KPark JPark CKim SLee KOkoshi TKo JLiKamWa R(2024)CoActo: CoActive Neural Network Inference Offloading with Fine-grained and Concurrent ExecutionProceedings of the 22nd Annual International Conference on Mobile Systems, Applications and Services10.1145/3643832.3661885(412-424)Online publication date: 3-Jun-2024
https://dl.acm.org/doi/10.1145/3643832.3661885
Lin SYang WHu YCai QDai MWang HLi K(2024)HPS Cholesky: Hierarchical Parallelized Supernodal Cholesky with Adaptive ParametersACM Transactions on Parallel Computing10.1145/363005111:1(1-22)Online publication date: 11-Mar-2024
https://dl.acm.org/doi/10.1145/3630051
Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Abstract

References

Cited By

Index Terms

Recommendations

Algorithm 679: A set of level 3 basic linear algebra subprograms: model implementation and test programs

Level 3 basic linear algebra subprograms for sparse matrices: a user-level interface

Studies in numerical linear algebra

Reviews

Access critical reviews of Computing literature here

Comments

Information

Published In

Publisher

Publication History

Permissions

Check for updates

Qualifiers

Contributors

Other Metrics

Bibliometrics

Article Metrics

Other Metrics

Citations

Cited By

View options

PDF

eReader

Login options

Full Access

Figures

Other

Share

Share this Publication link

Share on social media

Affiliations