Abstract
The algorithms in the current sequential numerical linear algebra libraries (e.g. LAPACK) do not parallelize well on multicore architectures. A new family of algorithms, the tile algorithms, has recently been introduced. Previous research has shown that it is possible to write efficient and scalable tile algorithms for performing a Cholesky factorization, a (pseudo) LU factorization, a QR factorization, and computing the inverse of a symmetric positive definite matrix. In this extended abstract, we revisit the computation of the inverse of a symmetric positive definite matrix. We observe that, using a dynamic task scheduler, it is relatively painless to translate existing LAPACK code to obtain a ready-to-be-executed tile algorithm. However we demonstrate that, for some variants, non trivial compiler techniques (array renaming, loop reversal and pipelining) need then to be applied to further increase the parallelism of the application. We present preliminary experimental results.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
BLAS: Basic linear algebra subprograms, http://www.netlib.org/blas/
Agullo, E., Dongarra, J., Hadri, B., Kurzak, J., Langou, J., Langou, J., Ltaief, H.: PLASMA Users’ Guide. Technical report, ICL, UTK (2009)
Agullo, E., Hadri, B., Ltaief, H., Dongarrra, J.: Comparative study of one-sided factorizations with multiple software packages on multi-core hardware. In: SC 2009: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, pp. 1–12. ACM, New York (2009)
Allen, R., Kennedy, K.: Optimizing Compilers for Modern Architectures: A Dependence-based Approach. Morgan Kaufmann, San Francisco (2001)
Anderson, E., Bai, Z., Bischof, C., Blackford, L.S., Demmel, J.W., Dongarra, J., Du Croz, J., Greenbaum, A., Hammarling, S., McKenney, A., Sorensen, D.: LAPACK Users’ Guide. SIAM, Philadelphia (1992)
Bientinesi, P., Gunter, B., van de Geijn, R.: Families of algorithms related to the inversion of a symmetric positive definite matrix. ACM Trans. Math. Softw. 35(1), 1–22 (2008)
Blackford, L.S., Choi, J., Cleary, A., D’Azevedo, E., Demmel, J., Dhillon, I., Dongarra, J., Hammarling, S., Henry, G., Petitet, A., Stanley, K., Walker, D., Whaley, R.C.: ScaLAPACK Users’ Guide. SIAM, Philadelphia (1997)
Buttari, A., Langou, J., Kurzak, J., Dongarra, J.: Parallel tiled QR factorization for multicore architectures. Concurrency Computat.: Pract. Exper. 20(13), 1573–1590 (2008)
Buttari, A., Langou, J., Kurzak, J., Dongarra, J.: A class of parallel tiled linear algebra algorithms for multicore architectures. Parallel Computing 35(1), 38–53 (2009)
Chan, E.: Runtime data flow scheduling of matrix computations. FLAME Working Note #39. Technical Report TR-09-22, The University of Texas at Austin, Department of Computer Sciences (August 2009)
Chan, E., Van Zee, F.G., Bientinesi, P., Quintana-Ortí, E.S., Quintana-Ortí, G., van de Geijn, R.: Supermatrix: a multithreaded runtime scheduling system for algorithms-by-blocks. In: PPoPP 2008: Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming, pp. 123–132. ACM, New York (2008)
Christofides, N.: Graph Theory: An algorithmic Approach (1975)
Du Croz, J.J., Higham, N.J.: Stability of methods for matrix inversion. IMA Journal of Numerical Analysis 12, 1–19 (1992)
Eigenmann, R., Hoeflinger, J., Padua, D.: On the automatic parallelization of the perfect benchmarks®. IEEE Trans. Parallel Distrib. Syst. 9(1), 5–23 (1998)
Higham, N.J.: Accuracy and Stability of Numerical Algorithms, 2nd edn. Society for Industrial and Applied Mathematics, Philadelphia (2002)
Kurzak, J., Dongarra, J.: Fully dynamic scheduler for numerical computing on multicore processors. University of Tennessee CS Tech. Report, UT-CS-09-643 (2009)
Kurzak, J., Dongarra, J.: QR factorization for the Cell Broadband Engine. Sci. Program. 17(1-2), 31–42 (2009)
Perez, J.M., Badia, R.M., Labarta, J.: A dependency-aware task-based programming environment for multi-core architectures. In: Proceedings of IEEE Cluster Computing 2008 (2008)
Quintana-Ortí, G., Quintana-Ortí, E.S., van de Geijn, R.A., Van Zee, F.G., Chan, E.: Programming matrix algorithms-by-blocks for thread-level parallelism. ACM Transactions on Mathematical Software 36(3)
Rinard, M.C., Scales, D.J., Lam, M.S.: Jade: A high-level, machine-independent language for parallel programming. Computer 6, 28–38 (1993)
Sutter, H.: A fundamental turn toward concurrency in software. Dr. Dobb’s Journal 30(3) (2005)
Wolfe, M.: Doany: Not just another parallel loop. In: Banerjee, U., Gelernter, D., Nicolau, A., Padua, D.A. (eds.) LCPC 1992. LNCS, vol. 757, pp. 421–433. Springer, Heidelberg (1993)
Van Zee, F.G.: libflame: The Complete Reference (2009), http://www.lulu.com
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Agullo, E., Bouwmeester, H., Dongarra, J., Kurzak, J., Langou, J., Rosenberg, L. (2011). Towards an Efficient Tile Matrix Inversion of Symmetric Positive Definite Matrices on Multicore Architectures. In: Palma, J.M.L.M., Daydé, M., Marques, O., Lopes, J.C. (eds) High Performance Computing for Computational Science – VECPAR 2010. VECPAR 2010. Lecture Notes in Computer Science, vol 6449. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19328-6_14
Download citation
DOI: https://doi.org/10.1007/978-3-642-19328-6_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19327-9
Online ISBN: 978-3-642-19328-6
eBook Packages: Computer ScienceComputer Science (R0)