Vector and parallel algorithms for Cholesky factorization on IBM 3090

Authors:

R. C. Agarwal,

F. G. Gustavson

Supercomputing '89: Proceedings of the 1989 ACM/IEEE conference on Supercomputing

Pages 225 - 233

https://doi.org/10.1145/76263.76287

Published: 01 August 1989


Abstract

In many engineering applications, a solution of Fx = b is required, where F is a symmetric positive definite matrix. This is usually done by Cholesky factorization, F = RR^T, where R is the lower triangular Cholesky factor. This is a compute-intensive problem. To achieve the best possible performance on the IBM 3090 Vector Facility (VF), the problem requires blocking at various levels to match the 3090 memory hierarchy. A large problem that does not fit in a particular level of memory is blocked so that each block fits in that level, which minimizes data transfers between the levels of memory. In this paper, various blocking schemes are described for vector and parallel implementation on the 3090 VF. Some of these algorithms have been included in the Engineering and Scientific Subroutine Library (ESSL). Performance numbers are also included. These algorithms achieve close to the peak performance of the 3090 uniprocessor and multiprocessors.
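The idea of blocking a Cholesky factorization can be illustrated with a minimal sketch. The code below is not the paper's ESSL algorithm and makes no claim about 3090 performance; it is a generic right-looking blocked factorization in plain Python, shown only to make the structure concrete. The names `cholesky_blocked`, `solve_with_factor`, and the block size parameter `nb` are hypothetical and chosen for this illustration. Each outer step touches one diagonal block, the panel below it, and the trailing submatrix, which is the access pattern that lets a block stay in a fast level of memory while it is reused.

```python
import math


def cholesky_blocked(f, nb=2):
    """Return lower-triangular r with f = r r^T, using blocks of nb columns.

    f is a symmetric positive definite matrix given as a list of lists.
    nb is an illustrative block size (an assumption, not a tuned value).
    """
    n = len(f)
    r = [row[:] for row in f]  # working copy; the lower triangle is overwritten
    for k in range(0, n, nb):
        ke = min(k + nb, n)
        # Step 1: unblocked factorization of the diagonal block r[k:ke][k:ke].
        for j in range(k, ke):
            d = r[j][j] - sum(r[j][p] ** 2 for p in range(k, j))
            if d <= 0.0:
                raise ValueError("matrix is not positive definite")
            r[j][j] = math.sqrt(d)
            for i in range(j + 1, ke):
                s = sum(r[i][p] * r[j][p] for p in range(k, j))
                r[i][j] = (r[i][j] - s) / r[j][j]
        # Step 2: panel solve, R21 = A21 * R11^{-T}, one row at a time.
        for i in range(ke, n):
            for j in range(k, ke):
                s = sum(r[i][p] * r[j][p] for p in range(k, j))
                r[i][j] = (r[i][j] - s) / r[j][j]
        # Step 3: trailing update, A22 -= R21 * R21^T (lower triangle only).
        for i in range(ke, n):
            for j in range(ke, i + 1):
                r[i][j] -= sum(r[i][p] * r[j][p] for p in range(k, ke))
    # Clear the strict upper triangle so the result is the factor alone.
    for i in range(n):
        for j in range(i + 1, n):
            r[i][j] = 0.0
    return r


def solve_with_factor(r, b):
    """Solve f x = b given f = r r^T: forward then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):  # solve r y = b
        y[i] = (b[i] - sum(r[i][k] * y[k] for k in range(i))) / r[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # solve r^T x = y
        s = sum(r[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (y[i] - s) / r[i][i]
    return x


if __name__ == "__main__":
    F = [[4.0, 2.0, 2.0, 2.0],
         [2.0, 5.0, 3.0, 3.0],
         [2.0, 3.0, 6.0, 4.0],
         [2.0, 3.0, 4.0, 7.0]]
    R = cholesky_blocked(F, nb=2)
    print(R)
    print(solve_with_factor(R, [22.0, 33.0, 42.0, 48.0]))
```

In a production library the three steps would map onto tuned kernels (for example, a triangular solve and a symmetric rank-k update), and the block size would be chosen to fit a particular level of the memory hierarchy; the loops here only show the data dependencies.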




Published In

Supercomputing '89: Proceedings of the 1989 ACM/IEEE conference on Supercomputing

August 1989

849 pages

ISBN: 0897913418

DOI: 10.1145/76263

Chairman:
F. Ron Bailey


In-Cooperation

Los Alamos National Labs
NASA: National Aeronautics and Space Administration
Argonne National Lab

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Article

Conference

SC '89

Sponsor:

SIGARCH
IEEE-CS

SC '89: International Conference for High Performance Computing, Networking, Storage and Analysis

November 12 - 17, 1989

Reno, Nevada, USA

Acceptance Rates

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%


