
Hypergraph Partitioning Based Models and Methods for Exploiting Cache Locality in Sparse Matrix-Vector Multiplication

Published: 06 June 2013

Abstract

Sparse matrix-vector multiplication (SpMxV) is a kernel operation widely used in iterative linear solvers, where the same sparse matrix is multiplied by a dense vector repeatedly. Matrices with irregular sparsity patterns make it difficult to exploit cache locality effectively in SpMxV computations. In this work, we investigate single- and multiple-SpMxV frameworks for exploiting cache locality in SpMxV computations. For the single-SpMxV framework, we propose two cache-size-aware row/column reordering methods based on one-dimensional (1D) and two-dimensional (2D) top-down sparse matrix partitioning. We utilize the column-net hypergraph model for the 1D method and enhance the row-column-net hypergraph model for the 2D method. The primary aim of both proposed methods is to maximize the exploitation of temporal locality in accessing input-vector entries. The multiple-SpMxV framework depends on splitting a given matrix into a sum of multiple nonzero-disjoint matrices. We propose a cache-size-aware splitting method based on 2D top-down sparse matrix partitioning that utilizes the row-column-net hypergraph model. The aim of this method is to maximize the exploitation of temporal locality in accessing both input- and output-vector entries. We evaluate the validity of our models and methods on a wide range of sparse matrices using both cache-miss simulations and actual runs using OSKI. Experimental results show that the proposed methods and models outperform state-of-the-art schemes.
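As background for the frameworks described above, the following is a minimal sketch (in Python, with illustrative names; this is not the authors' implementation) of SpMxV over a CSR-stored matrix, and of the multiple-SpMxV idea of splitting A into nonzero-disjoint parts so that y = A1·x + A2·x. In the paper the split is chosen by cache-size-aware hypergraph partitioning; here it is a fixed toy split shown only to illustrate the accumulation.

```python
def spmxv_csr(row_ptr, col_idx, vals, x):
    """y = A @ x for A stored in CSR format.

    The inner loop gathers x[col_idx[k]] in an irregular pattern dictated
    by the sparsity structure -- this is the access whose temporal locality
    the row/column reordering methods aim to improve.
    """
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += vals[k] * x[col_idx[k]]
        y[i] = s
    return y


def spmxv_split(parts, x):
    """Multiple-SpMxV sketch: y = sum_k (A_k @ x) over nonzero-disjoint
    CSR parts (row_ptr, col_idx, vals), each covering the same dimensions.
    In the paper each A_k is chosen so its working set fits in cache."""
    n = len(parts[0][0]) - 1
    y = [0.0] * n
    for row_ptr, col_idx, vals in parts:
        yk = spmxv_csr(row_ptr, col_idx, vals, x)
        y = [a + b for a, b in zip(y, yk)]
    return y
```

For example, for A = [[1,0,2],[0,3,0],[4,0,5]] and x = [1,1,1], the CSR arrays are row_ptr = [0,2,3,5], col_idx = [0,2,1,0,2], vals = [1,2,3,4,5], and both routines return y = [3, 3, 9] (splitting A into its first two rows and its last row gives nonzero-disjoint parts).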




Published In

SIAM Journal on Scientific Computing, Volume 35, Issue 3
2013
922 pages
ISSN: 1064-8275
DOI: 10.1137/sjoce3.35.3

Publisher

Society for Industrial and Applied Mathematics

United States


Author Tags

  1. cache locality
  2. sparse matrix
  3. matrix-vector multiplication
  4. matrix reordering
  5. computational hypergraph model
  6. hypergraph partitioning
  7. traveling salesman problem

Author Tags (MSC codes)

  1. 65F10
  2. 65F50
  3. 65Y20

Qualifiers

  • Research-article


Citations

Cited By

  • (2024) A Sparsity-Aware Distributed-Memory Algorithm for Sparse-Sparse Matrix Multiplication, Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, pp. 1-14, DOI: 10.1109/SC41406.2024.00053. Online publication date: 17-Nov-2024.
  • (2023) Efficient FPGA-Based Sparse Matrix–Vector Multiplication With Data Reuse-Aware Compression, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 42(12), pp. 4606-4617, DOI: 10.1109/TCAD.2023.3281715. Online publication date: 1-Dec-2023.
  • (2022) Efficient Task-Mapping of Parallel Applications Using a Space-Filling Curve, Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, pp. 384-397, DOI: 10.1145/3559009.3569657. Online publication date: 8-Oct-2022.
  • (2021) Partitioning sparse deep neural networks for scalable training and inference, Proceedings of the 35th ACM International Conference on Supercomputing, pp. 254-265, DOI: 10.1145/3447818.3460372. Online publication date: 3-Jun-2021.
  • (2021) A Deep Reinforcement Learning Method for Solving Task Mapping Problems with Dynamic Traffic on Parallel Systems, The International Conference on High Performance Computing in Asia-Pacific Region, pp. 1-10, DOI: 10.1145/3432261.3432262. Online publication date: 20-Jan-2021.
