Article

Out-of-Core Computation of the QR Factorization on Multi-core Processors

Authors:

Mercedes Marqués,

Gregorio Quintana-Ortí,

Enrique S. Quintana-Ortí,

Robert van de Geijn

Euro-Par '09: Proceedings of the 15th International Euro-Par Conference on Parallel Processing

Pages 809 - 820

https://doi.org/10.1007/978-3-642-03869-3_75

Published: 23 August 2009

Abstract

We target the development of high-performance algorithms for dense matrix operations where data resides on disk and has to be explicitly moved in and out of the main memory. We provide strong evidence that, even for a complex operation like the QR factorization, the use of a run-time system creates a separation of concerns between the matrix computations and I/O operations with the result that no significant changes need to be introduced to existing in-core algorithms. The library developer can thus focus on the design of algorithms-by-blocks, addressing disk memory as just another level of the memory hierarchy. Experimental results for the out-of-core computation of the QR factorization on a multi-core processor reveal the potential of this approach.
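The separation of concerns described in the abstract can be illustrated with a small sketch. It is not the authors' run-time system: `BlockStore`, `scatter`, `gather`, `qr_by_blocks`, and `demo` are hypothetical names, the cache policy (write-through LRU, one `.npy` file per block) is an assumption, and the factorization is a generic tiled QR that keeps only R and discards Q. The matrix is assumed square with a dimension divisible by the block size. The point is that `qr_by_blocks` reads like an in-core algorithm-by-blocks, while all disk traffic is confined to the store.

```python
import os
import tempfile
from collections import OrderedDict

import numpy as np


class BlockStore:
    """Hypothetical run-time layer: blocks live on disk, a small LRU cache in memory."""

    def __init__(self, directory, capacity):
        self.directory = directory
        self.capacity = capacity      # maximum number of blocks kept in memory
        self.cache = OrderedDict()    # (i, j) -> ndarray, least recently used first
        self.reads = 0                # number of blocks loaded from disk

    def _path(self, key):
        return os.path.join(self.directory, "block_%d_%d.npy" % key)

    def _insert(self, key, block):
        self.cache[key] = block
        self.cache.move_to_end(key)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # safe to drop: disk copy is current

    def get(self, i, j):
        key = (i, j)
        if key in self.cache:
            self.cache.move_to_end(key)
        else:
            self.reads += 1
            self._insert(key, np.load(self._path(key)))
        return self.cache[key]

    def put(self, i, j, block):
        np.save(self._path((i, j)), block)  # write-through keeps eviction trivial
        self._insert((i, j), block)


def scatter(store, a, b):
    """Split the square matrix a into b x b blocks and write them to the store."""
    nb = a.shape[0] // b
    for i in range(nb):
        for j in range(nb):
            store.put(i, j, a[i * b:(i + 1) * b, j * b:(j + 1) * b].copy())
    return nb


def gather(store, nb):
    """Reassemble the full matrix from its blocks (for checking small cases only)."""
    return np.block([[store.get(i, j) for j in range(nb)] for i in range(nb)])


def qr_by_blocks(store, nb):
    """Tiled QR that overwrites the block grid with R; Q is not kept in this sketch."""
    for k in range(nb):
        # Factor the diagonal block and update the rest of block row k.
        q, r = np.linalg.qr(store.get(k, k))
        store.put(k, k, r)
        for j in range(k + 1, nb):
            store.put(k, j, q.T @ store.get(k, j))
        # Annihilate each block below the diagonal against the current R_kk.
        for i in range(k + 1, nb):
            b = store.get(k, k).shape[0]
            stacked = np.vstack([store.get(k, k), store.get(i, k)])
            q, r = np.linalg.qr(stacked, mode="complete")
            store.put(k, k, r[:b])
            store.put(i, k, np.zeros((stacked.shape[0] - b, b)))
            for j in range(k + 1, nb):
                upd = q.T @ np.vstack([store.get(k, j), store.get(i, j)])
                store.put(k, j, upd[:b])
                store.put(i, j, upd[b:])


def demo(n=8, b=2, capacity=4, seed=0):
    """Factor a random n x n matrix while holding at most `capacity` blocks in memory."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n))
    with tempfile.TemporaryDirectory() as directory:
        store = BlockStore(directory, capacity)
        nb = scatter(store, a, b)
        qr_by_blocks(store, nb)
        r = gather(store, nb)
    return a, r, store.reads
```

Calling `demo()` returns the input matrix, the computed triangular factor, and the number of block reads from disk. Because Q is discarded, a convenient check is that `r.T @ r` matches `a.T @ a` up to rounding. Changing `capacity` alters only the I/O count, not the algorithm, which is the sense in which disk acts as one more level of the memory hierarchy.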

Published In

Euro-Par '09: Proceedings of the 15th International Euro-Par Conference on Parallel Processing, August 2009, 1082 pages

ISBN: 9783642038686

Editors: Henk Sips, Dick Epema, Hai-Xiang Lin (Department of Software Technology, Delft University of Technology, Delft, The Netherlands)

Publisher: Springer-Verlag, Berlin, Heidelberg
