Abstract
We pursue the scalable parallel implementation of the factor- ization of band matrices with medium to large bandwidth targeting SMP and multi-core architectures. Our approach decomposes the computation into a large number of fine-grained operations exposing a higher degree of parallelism. The SuperMatrix run-time system allows an out-of-order scheduling of operations that is transparent to the programmer. Exper- imental results for the Cholesky factorization of band matrices on two parallel platforms with sixteen processors demonstrate the scalability of the solution.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Anderson, E., Bai, Z., Demmel, J., Dongarra, J.E., DuCroz, J., Greenbaum, A., Hammarling, S., McKenney, A.E., Ostrouchov, S., Sorensen, D.: LAPACK Users Guide. SIAM, Philadelphia (1992)
Bellens, P., Perez, J.M., Badia, R.M., Labarta, J.: CellSs: A programming model for the Cell BE architecture. In: SC 2006: Proceedings of the 2006 ACM/IEEE conference on Supercomputing, p. 86. ACM Press, New York (2006)
Bientinesi, P., Gunnels, J.A., Myers, M.E., Quintana-Ortí, E.S., van de Geijn, R.A.: The science of deriving dense linear algebra algo- rithms. ACM Transactions on Mathematical Software 31(1), 1–26 (2005)
Bientinesi, P., Quintana-Ortí, E.S., van de Geijn, R.A.: Repre- senting linear algebra algorithms in code: The FLAME application programming interfaces. ACM Trans. Math. Soft. 31(1), 27–59 (2005)
Buttari, A., Langou, J., Kurzak, J., Dongarra, J.: A class of par- allel tiled linear algebra algorithms for multicore architectures. LAPACK Working Note 191 UT-CS-07-600, University of Tennessee (September 2007)
Buttari, A., Langou, J., Kurzak, J., Dongarra, J.: Parallel tiled qr factorization for multicore architectures. LAPACK Working Note 190 UT-CS- 07-598, University of Tennessee (July 2007)
Chan, E., Quintana-Ortí, E., Quintana-Ortí, G., van de Geijn, R.: SuperMatrix out-of-order scheduling of matrix operations for SMP and multi-core architectures. In: SPAA 2007: Proceedings of the Nineteenth ACM Sym- posium on Parallelism in Algorithms and Architectures, pp. 116–126 (2007)
Chan, E., Van Zee, F.G., Bientinesi, P., Quintana-Ortí, G., Quintana-Ortí, E.S., van de Geijn, R.: SuperMatrix: A multithreaded runtime scheduling system for algorithms-by-blocks. In: PPoPP 2008: Proceedings of the 13th ACM SIGPLAN symposium on Principles and practices of parallel programming. ACM Press, New York (to appear, 2008)
Chan, E., Van Zee, F., van de Geijn, R., Quintana-Ortí, E.S., Quintana-Ortí, G.: Satisfying your dependencies with SuperMatrix. In: IEEE Cluster 2007, pp. 92–99 (2007)
Chatterjee, S., Lebeck, A.R., Patnala, P.K., Thottethodi, M.: Recursive array layouts and fast matrix multiplication. IEEE Trans. on Parallel and Distributed Systems 13(11), 1105–1123 (2002)
Dongarra, J.J., Duff, I.S., Sorensen, D.C., van der Vorst, H.A.: Solving Linear Systems on Vector and Shared Memory Computers. SIAM, Philadelphia (1991)
Elmroth, E., Gustavson, F., Gustavson, F., Jonsson, I., Kagstrom, B.: Recursive blocked algorithms and hybrid data structures for dense matrix library software. SIAM Review 46(1), 3–45 (2004)
Henry, G.: BLAS based on block data structures. Theory Center Technical Report CTC92TR89, Cornell University (February 1992)
Leiserson, C., Plaat, A.: Programming parallel applications in Cilk. SIAM News, SINEWS (1998)
Low, T.M., van de Geijn, R.: An API for manipulating matrices stored by blocks. Technical Report TR-2004-15, Department of Computer Sciences, The University of Texas at Austin (May 2004)
Park, N., Hong, B., Prasanna, V.K.: Tiling, block data layout, and memory hier- archy performance. IEEE Trans. on Parallel and Distributed Systems 14(7), 640–654 (2003)
Quintana-Ortí, G., Quintana-Ortí, E., Chan, E., Van Zee, F.G., van de Geijn, R.: Scheduling of QR factorization algorithms on SMP and multi-core architectures. In: 16th Euromicro Int. Conference on Parallel, Dis- tributed and Network-based Processing. IEEE, Los Alamitos (to appear, 2008)
Quintana-Ortí, G., Quintana-Ortí, E.S., Chan, E., van de Geijn, R., Van Zee, F.G.: Design and scheduling of an algorithm-by-blocks for the LU factorization on multithreaded architectures. FLAME Working Note #26 TR-07-50, The University of Texas at Austin, Department of Computer Sciences (September 2007)
Quintana-Ortí, G., Quintana-Ortí, E.S., Remón, A., van de Geijn, R.: SuperMatrix for the factorization of band matrices. FLAMEWorking Note #27 TR-07-51, The University of Texas at Austin, Department of Computer Sciences (September 2007)
Remón, A., Quintana-Ortí, E.S., Quintana-Ortí, G.: Cholesky factorization of band matrices using multithreaded BLAS. In: Kågström, B., Elmroth, E., Dongarra, J., Waśniewski, J. (eds.) PARA 2006. LNCS, vol. 4699, pp. 608–616. Springer, Heidelberg (to appear, 2007)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Quintana-Ortí, G., Quintana-Ortí, E.S., Remón, A., van de Geijn, R.A. (2008). An Algorithm-by-Blocks for SuperMatrix Band Cholesky Factorization. In: Palma, J.M.L.M., Amestoy, P.R., Daydé, M., Mattoso, M., Lopes, J.C. (eds) High Performance Computing for Computational Science - VECPAR 2008. VECPAR 2008. Lecture Notes in Computer Science, vol 5336. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-92859-1_21
Download citation
DOI: https://doi.org/10.1007/978-3-540-92859-1_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-92858-4
Online ISBN: 978-3-540-92859-1
eBook Packages: Computer ScienceComputer Science (R0)