DOI: 10.1145/2925426.2926278

Optimizing Sparse Matrix-Vector Multiplication for Large-Scale Data Analytics

Published: 01 June 2016

Abstract

Sparse Matrix-Vector multiplication (SpMV) is a fundamental kernel used by a large class of numerical algorithms. Emerging big-data and machine-learning applications are propelling renewed interest in SpMV algorithms that can tackle massive amounts of unstructured data, rapidly approaching the terabyte range, with predictable, high performance. In this paper we describe a new methodology for designing SpMV algorithms for shared-memory multiprocessors (SMPs) that organizes the original SpMV computation into two distinct phases: the first phase builds a scaled matrix, which the second phase reduces, providing numerous opportunities to exploit memory locality. Using this methodology, we have designed two algorithms. Our experiments on irregular big-data matrices (an order of magnitude larger than the current state of the art) show quasi-optimal scaling on a large-scale POWER8 SMP system, with an average speedup of 3.8x over an equally optimized version of the CSR algorithm. In absolute performance, our implementation makes the POWER8 SMP system comparable to a 256-node cluster. In problem size, it can process matrices with up to 68 billion edges, an order of magnitude larger than state-of-the-art clusters.
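For context, the baseline the abstract compares against is the standard CSR (Compressed Sparse Row) SpMV kernel. The following is a minimal, sequential sketch of that baseline for illustration only; it is not the paper's optimized implementation, and the function name and data layout shown here are conventional, not taken from the paper:

```python
def csr_spmv(row_ptr, col_idx, vals, x):
    """Baseline CSR sparse matrix-vector product y = A @ x.

    row_ptr[i]..row_ptr[i+1] delimit the nonzeros of row i;
    col_idx and vals hold their column indices and values.
    """
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # The gather from x through col_idx is irregular, which is
            # the memory-locality bottleneck on large unstructured matrices.
            acc += vals[k] * x[col_idx[k]]
        y[i] = acc
    return y

# 3x3 example: A = [[2, 0, 1],
#                   [0, 3, 0],
#                   [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
vals    = [2.0, 1.0, 3.0, 4.0, 5.0]
print(csr_spmv(row_ptr, col_idx, vals, [1.0, 1.0, 1.0]))  # [3.0, 3.0, 9.0]
```

The irregular accesses to x are exactly what a locality-oriented reorganization, such as the two-phase scale-then-reduce approach the abstract describes, aims to tame.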



Published In

ICS '16: Proceedings of the 2016 International Conference on Supercomputing
June 2016
547 pages
ISBN:9781450343619
DOI:10.1145/2925426
© 2016 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Data analytics
  2. Power8
  3. SpMV
  4. Sparse matrices
  5. Storage Formats

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICS '16

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%


Article Metrics

  • Downloads (Last 12 months)47
  • Downloads (Last 6 weeks)6
Reflects downloads up to 11 Dec 2024


Cited By

  • (2024) Optimization of Large-Scale Sparse Matrix-Vector Multiplication on Multi-GPU Systems. ACM Transactions on Architecture and Code Optimization, 21(4):1-24. DOI: 10.1145/3676847. Online publication date: 8-Jul-2024
  • (2024) Bitmap-Based Sparse Matrix-Vector Multiplication with Tensor Cores. Proceedings of the 53rd International Conference on Parallel Processing, pages 1135-1144. DOI: 10.1145/3673038.3673055. Online publication date: 12-Aug-2024
  • (2024) Arrow Matrix Decomposition: A Novel Approach for Communication-Efficient Sparse Matrix Multiplication. Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, pages 404-416. DOI: 10.1145/3627535.3638496. Online publication date: 2-Mar-2024
  • (2024) Extending Sparse Patterns to Improve Inverse Preconditioning on GPU Architectures. Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing, pages 200-213. DOI: 10.1145/3625549.3658683. Online publication date: 3-Jun-2024
  • (2024) VNEC: A Vectorized Non-Empty Column Format for SpMV on CPUs. 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 14-25. DOI: 10.1109/IPDPS57955.2024.00011. Online publication date: 27-May-2024
  • (2024) Accelerating SpMV for Scale-Free Graphs with Optimized Bins. 2024 IEEE 40th International Conference on Data Engineering (ICDE), pages 2407-2420. DOI: 10.1109/ICDE60146.2024.00190. Online publication date: 13-May-2024
  • (2024) Efficient SpMV for Graph Matrices Through Vectoring and Caching. Euro-Par 2024: Parallel Processing, pages 356-370. DOI: 10.1007/978-3-031-69583-4_25. Online publication date: 26-Aug-2024
  • (2023) HARP: Hardware-Based Pseudo-Tiling for Sparse Matrix Multiplication Accelerator. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, pages 1148-1162. DOI: 10.1145/3613424.3623790. Online publication date: 28-Oct-2023
  • (2023) Connectivity-Aware Link Analysis for Skewed Graphs. Proceedings of the 52nd International Conference on Parallel Processing, pages 482-491. DOI: 10.1145/3605573.3605579. Online publication date: 7-Aug-2023
  • (2023) WISE. Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, pages 329-341. DOI: 10.1145/3572848.3577506. Online publication date: 25-Feb-2023
