Research article · HPCCT Conference Proceedings · DOI: 10.1145/3341069.3341072

Adaptive Sparse Matrix-Vector Multiplication on CPU-GPU Heterogeneous Architecture

Published: 22 June 2019

Abstract

Sparse matrix-vector multiplication (SpMV) is the core kernel in solving sparse linear systems and is widely used in many research and engineering fields. The GPU is the most common coprocessor in the high-performance computing domain and has proven its practical value in accelerating a wide range of algorithms. Much related work has been carried out to optimize parallel SpMV on CPU-GPU platforms, mainly focusing on reducing computing overhead on the GPU, such as branch divergence and cache misses; little attention has been paid to the overall efficiency of the heterogeneous platform. In this paper, we describe the design and implementation of an adaptive SpMV on a CPU-GPU heterogeneous architecture. We propose a dynamic task scheduling framework for the CPU-GPU platform to improve the utilization of both the CPU and the GPU. A double buffering scheme is also presented to hide the data transfer overhead between CPU and GPU. Two deeply optimized SpMV kernels are deployed for the CPU and GPU respectively. Evaluation on typical sparse matrices indicates that the proposed algorithm achieves both significant performance gains and adaptability to different types of sparse matrices.
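For reference, the operation the paper optimizes is SpMV over a compressed sparse row (CSR) matrix. The following minimal Python sketch shows the baseline serial kernel; it is illustrative only (function name and layout are our assumptions), not the paper's optimized CPU or GPU implementation.

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a CSR matrix A.

    values  -- nonzero entries, row by row
    col_idx -- column index of each nonzero
    row_ptr -- row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i
    x       -- dense input vector
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y

# 3x3 example matrix [[1, 0, 2], [0, 3, 0], [4, 0, 5]]:
values  = [1.0, 2.0, 3.0, 4.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
print(spmv_csr(values, col_idx, row_ptr, [1.0, 1.0, 1.0]))  # [3.0, 3.0, 9.0]
```

The outer loop over rows is what a heterogeneous scheme partitions: a scheduler can assign disjoint row ranges to the CPU and GPU, which is the granularity at which dynamic task scheduling of the kind described in the abstract operates.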


Cited By

  • (2024) "Elegante: A Machine Learning-Based Threads Configuration Tool for SpMV Computations on Shared Memory Architecture". Information 15(11): 685. DOI: 10.3390/info15110685. Online publication date: 1 Nov 2024.
  • (2021) "Efficient Inter-Device Task Scheduling Schemes for Multi-Device Co-Processing of Data-Parallel Kernels on Heterogeneous Systems". IEEE Access 9: 59968-59978. DOI: 10.1109/ACCESS.2021.3073955.
  • (2020) "Runtime Adaptive Matrix Multiplication for the SW26010 Many-Core Processor". IEEE Access 8: 156915-156928. DOI: 10.1109/ACCESS.2020.3019302.
  • (2020) "Hardware-Accelerated Platforms and Infrastructures for Network Functions: A Survey of Enabling Technologies and Research Studies". IEEE Access 8: 132021-132085. DOI: 10.1109/ACCESS.2020.3008250.

    Published In

    HPCCT '19: Proceedings of the 2019 3rd High Performance Computing and Cluster Technologies Conference
    June 2019
    293 pages
    ISBN:9781450371858
    DOI:10.1145/3341069

    Publisher: Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. GPU
    2. Heterogeneous architecture
    3. Parallel Computing
    4. Sparse Matrix-Vector Multiplication

    Qualifiers

    • Research-article
    • Research
    • Refereed limited


    Article Metrics

    • Downloads (last 12 months): 21
    • Downloads (last 6 weeks): 1

    Reflects downloads up to 10 Dec 2024.
