Towards an Efficient Tile Matrix Inversion of Symmetric Positive Definite Matrices on Multicore Architectures

Emmanuel Agullo,
Henricus Bouwmeester,
Jack Dongarra,
Jakub Kurzak,
Julien Langou &
Lee Rosenberg

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6449)

Included in the following conference series:

International Conference on High Performance Computing for Computational Science

Abstract

The algorithms in the current sequential numerical linear algebra libraries (e.g. LAPACK) do not parallelize well on multicore architectures. A new family of algorithms, the tile algorithms, has recently been introduced. Previous research has shown that it is possible to write efficient and scalable tile algorithms for performing a Cholesky factorization, a (pseudo) LU factorization, a QR factorization, and computing the inverse of a symmetric positive definite matrix. In this extended abstract, we revisit the computation of the inverse of a symmetric positive definite matrix. We observe that, using a dynamic task scheduler, it is relatively painless to translate existing LAPACK code to obtain a ready-to-be-executed tile algorithm. However, we demonstrate that, for some variants, non-trivial compiler techniques (array renaming, loop reversal and pipelining) then need to be applied to further increase the parallelism of the application. We present preliminary experimental results.
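Inverting a symmetric positive definite matrix A proceeds, as in LAPACK's xPOTRF followed by xPOTRI, in three steps: the Cholesky factorization A = LLᵀ, the inversion of the triangular factor L, and the product A⁻¹ = L⁻ᵀL⁻¹. The sketch below illustrates only the first step and only the scheduling idea from the abstract; it is not the authors' code and performs no arithmetic. It assumes a right-looking tile Cholesky written as an ordinary sequential loop nest that merely declares which tiles each kernel reads and writes, and a tiny superscalar-style tracker that infers the task graph from those declarations, roughly as dependency-tracking runtimes do. The functions `tile_cholesky_tasks`, `build_dag` and `critical_path_length` are hypothetical; the kernel labels just follow the BLAS/LAPACK routine names, and tiles are assumed to be identified by (row, col) pairs.

```python
from collections import defaultdict


def tile_cholesky_tasks(p):
    """Yield (label, reads, writes) for a right-looking tile Cholesky, A = L L^T.

    Hypothetical sketch: the matrix is a p x p grid of tiles, only the lower
    triangle is touched, and each tile is identified by its (row, col) index.
    """
    for k in range(p):
        # Factor the diagonal tile in place.
        yield (f"POTRF k={k}", [], [(k, k)])
        # Triangular solves for the tiles below the diagonal tile (panel).
        for i in range(k + 1, p):
            yield (f"TRSM i={i} k={k}", [(k, k)], [(i, k)])
        # Update the trailing submatrix with the freshly computed panel.
        for i in range(k + 1, p):
            yield (f"SYRK i={i} k={k}", [(i, k)], [(i, i)])
            for j in range(k + 1, i):
                yield (f"GEMM i={i} j={j} k={k}", [(i, k), (j, k)], [(i, j)])


def build_dag(tasks):
    """Infer each task's predecessors from its declared tile accesses.

    Tasks are seen in sequential program order; written tiles are treated as
    read-modify-write, since tile kernels update their output in place.
    """
    last_writer = {}             # tile -> index of the last task that wrote it
    readers = defaultdict(list)  # tile -> tasks that read it since that write
    preds = []
    for t, (_label, reads, writes) in enumerate(tasks):
        deps = set()
        for tile in reads:
            if tile in last_writer:
                deps.add(last_writer[tile])  # read-after-write (true dependency)
            readers[tile].append(t)
        for tile in writes:
            if tile in last_writer:
                deps.add(last_writer[tile])  # update of a previously written tile
            # Write-after-read (anti dependency): wait for pending readers.
            deps.update(r for r in readers[tile] if r != t)
            last_writer[tile] = t
            readers[tile] = []
        preds.append(deps)
    return preds


def critical_path_length(preds):
    """Number of tasks on the longest dependency chain (preds point backwards)."""
    depth = []
    for deps in preds:
        depth.append(1 + max((depth[d] for d in deps), default=0))
    return max(depth, default=0)


if __name__ == "__main__":
    tasks = list(tile_cholesky_tasks(3))
    preds = build_dag(tasks)
    print(len(tasks), critical_path_length(preds))  # prints: 10 7
```

For a 3 × 3 tile grid the loop nest emits 10 tasks whose longest chain is 7 tasks (3p − 2 for p tile rows), so the remaining tasks can run concurrently off that chain. The tracker also records write-after-read (anti) edges; none arise in this Cholesky loop nest, but such false dependencies are, in general, what array renaming removes.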

References

BLAS: Basic linear algebra subprograms, http://www.netlib.org/blas/
Agullo, E., Dongarra, J., Hadri, B., Kurzak, J., Langou, J., Langou, J., Ltaief, H.: PLASMA Users’ Guide. Technical report, ICL, UTK (2009)
Agullo, E., Hadri, B., Ltaief, H., Dongarra, J.: Comparative study of one-sided factorizations with multiple software packages on multi-core hardware. In: SC 2009: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, pp. 1–12. ACM, New York (2009)
Allen, R., Kennedy, K.: Optimizing Compilers for Modern Architectures: A Dependence-based Approach. Morgan Kaufmann, San Francisco (2001)
Anderson, E., Bai, Z., Bischof, C., Blackford, L.S., Demmel, J.W., Dongarra, J., Du Croz, J., Greenbaum, A., Hammarling, S., McKenney, A., Sorensen, D.: LAPACK Users’ Guide. SIAM, Philadelphia (1992)
Bientinesi, P., Gunter, B., van de Geijn, R.: Families of algorithms related to the inversion of a symmetric positive definite matrix. ACM Trans. Math. Softw. 35(1), 1–22 (2008)
Blackford, L.S., Choi, J., Cleary, A., D’Azevedo, E., Demmel, J., Dhillon, I., Dongarra, J., Hammarling, S., Henry, G., Petitet, A., Stanley, K., Walker, D., Whaley, R.C.: ScaLAPACK Users’ Guide. SIAM, Philadelphia (1997)
Buttari, A., Langou, J., Kurzak, J., Dongarra, J.: Parallel tiled QR factorization for multicore architectures. Concurrency Computat.: Pract. Exper. 20(13), 1573–1590 (2008)
Buttari, A., Langou, J., Kurzak, J., Dongarra, J.: A class of parallel tiled linear algebra algorithms for multicore architectures. Parallel Computing 35(1), 38–53 (2009)
Chan, E.: Runtime data flow scheduling of matrix computations. FLAME Working Note #39. Technical Report TR-09-22, The University of Texas at Austin, Department of Computer Sciences (August 2009)
Chan, E., Van Zee, F.G., Bientinesi, P., Quintana-Ortí, E.S., Quintana-Ortí, G., van de Geijn, R.: SuperMatrix: a multithreaded runtime scheduling system for algorithms-by-blocks. In: PPoPP 2008: Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 123–132. ACM, New York (2008)
Christofides, N.: Graph Theory: An Algorithmic Approach (1975)
Du Croz, J.J., Higham, N.J.: Stability of methods for matrix inversion. IMA Journal of Numerical Analysis 12, 1–19 (1992)
Eigenmann, R., Hoeflinger, J., Padua, D.: On the automatic parallelization of the Perfect Benchmarks®. IEEE Trans. Parallel Distrib. Syst. 9(1), 5–23 (1998)
Higham, N.J.: Accuracy and Stability of Numerical Algorithms, 2nd edn. Society for Industrial and Applied Mathematics, Philadelphia (2002)
Kurzak, J., Dongarra, J.: Fully dynamic scheduler for numerical computing on multicore processors. University of Tennessee CS Tech. Report, UT-CS-09-643 (2009)
Kurzak, J., Dongarra, J.: QR factorization for the Cell Broadband Engine. Sci. Program. 17(1-2), 31–42 (2009)
Perez, J.M., Badia, R.M., Labarta, J.: A dependency-aware task-based programming environment for multi-core architectures. In: Proceedings of IEEE Cluster Computing 2008 (2008)
Quintana-Ortí, G., Quintana-Ortí, E.S., van de Geijn, R.A., Van Zee, F.G., Chan, E.: Programming matrix algorithms-by-blocks for thread-level parallelism. ACM Transactions on Mathematical Software 36(3)
Rinard, M.C., Scales, D.J., Lam, M.S.: Jade: A high-level, machine-independent language for parallel programming. Computer 6, 28–38 (1993)
Sutter, H.: A fundamental turn toward concurrency in software. Dr. Dobb’s Journal 30(3) (2005)
Wolfe, M.: Doany: Not just another parallel loop. In: Banerjee, U., Gelernter, D., Nicolau, A., Padua, D.A. (eds.) LCPC 1992. LNCS, vol. 757, pp. 421–433. Springer, Heidelberg (1993)
Van Zee, F.G.: libflame: The Complete Reference (2009), http://www.lulu.com

Author information

Authors and Affiliations

Dept. of Electrical Engineering and Computer Science, University of Tennessee, 1122 Volunteer Blvd, Claxton Building, Knoxville, TN, 37996-3450, USA
Emmanuel Agullo, Jack Dongarra & Jakub Kurzak
Dept. of Mathematical and Statistical Sciences, University of Colorado Denver, Campus Box 170, P.O. Box 173364, Denver, Colorado, 80217-3364, USA
Henricus Bouwmeester, Julien Langou & Lee Rosenberg

Editor information

Editors and Affiliations

Faculdade de Engenharia da Universidade do Porto, Rua Dr. Roberto Frias s/n, 4200-465, Porto, Portugal
José M. Laginha M. Palma
INP (ENSEEIHT) IRIT, University of Toulouse, rue Charles-Camichel, CEDEX 7, 31071, Toulouse, France
Michel Daydé
Lawrence Berkeley National Laboratory, Berkeley, USA
Osni Marques
Faculty of Engineering, University of Porto, Rua Dr. Roberto Frias, s/n, 4200-465, Porto, Portugal
João Correia Lopes

About this paper

Cite this paper

Agullo, E., Bouwmeester, H., Dongarra, J., Kurzak, J., Langou, J., Rosenberg, L. (2011). Towards an Efficient Tile Matrix Inversion of Symmetric Positive Definite Matrices on Multicore Architectures. In: Palma, J.M.L.M., Daydé, M., Marques, O., Lopes, J.C. (eds) High Performance Computing for Computational Science – VECPAR 2010. VECPAR 2010. Lecture Notes in Computer Science, vol 6449. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19328-6_14

DOI: https://doi.org/10.1007/978-3-642-19328-6_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19327-9
Online ISBN: 978-3-642-19328-6
eBook Packages: Computer Science, Computer Science (R0)