DOI: 10.1109/Trustcom.2015.612
Article

Exploiting Hyper-Loop Parallelism in Vectorization to Improve Memory Performance on CUDA GPGPU

Published: 20 August 2015

Abstract

Memory performance is critical to achieving high performance on NVIDIA CUDA GPUs. Previous work has proposed specific optimizations such as thread coarsening, caching data in shared memory, and global data layout transformation. We argue that vectorization based on hyper-loop parallelism can serve as a unified technique for optimizing memory performance. In this paper, we present a compiler framework, built on the Cetus source-to-source compiler, that improves memory performance on the CUDA GPU by efficiently exploiting hyper-loop parallelism in vectorization. We introduce abstractions of SIMD vectors and SIMD operations that match the execution model and memory model of the CUDA GPU, along with three different execution mapping strategies for efficiently offloading vectorized code to CUDA GPUs. In addition, because we employ vectorization in C-to-CUDA with automatic parallelization, our technique further refines the mapping granularity between coarse-grain loop parallelism and GPU threads. We evaluated the proposed technique on two platforms: an embedded GPU system (Jetson TK1) and a desktop GPU (GeForce GTX 645). The experimental results demonstrate that our vectorization technique based on hyper-loop parallelism can yield speedups of up to 2.5x over the direct coarse-grain loop-parallelism mapping.
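The paper's compiler framework is not reproduced here; as a rough, hypothetical illustration of the kind of memory-oriented vectorization the abstract describes, the CUDA sketch below contrasts a scalar kernel with a variant that packs four elements per thread using the built-in `float4` type, so each thread issues one 128-bit global-memory transaction instead of four 32-bit ones. Kernel names, the scale factor, and launch parameters are illustrative assumptions, not the paper's code.

```cuda
#include <cuda_runtime.h>

// Scalar baseline: each thread loads and stores one float.
__global__ void scale_scalar(const float *in, float *out, float s, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = s * in[i];
}

// Vectorized variant: each thread handles a float4, turning four 32-bit
// memory transactions into a single 128-bit one. Assumes n is a
// multiple of 4 and the pointers are 16-byte aligned (cudaMalloc
// guarantees the alignment).
__global__ void scale_vec4(const float4 *in, float4 *out, float s, int n4) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n4) {
        float4 v = in[i];
        v.x *= s; v.y *= s; v.z *= s; v.w *= s;
        out[i] = v;
    }
}
```

With the vectorized kernel, the grid is launched over n/4 work items, i.e. a form of thread coarsening: fewer threads, each doing wider, coalesced memory accesses.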

    Published In

    TRUSTCOM-BIGDATASE-ISPA '15: Proceedings of the 2015 IEEE Trustcom/BigDataSE/ISPA - Volume 03
    August 2015
    644 pages
    ISBN:9781467379526

    Publisher

    IEEE Computer Society

    United States


    Author Tags

    1. CUDA GPU
    2. hyper loop parallelism
    3. memory performance
    4. thread coarsening
    5. vectorization

    Qualifiers

    • Article


    Article Metrics

    • Total Citations: 0
    • Total Downloads: 0 (last 12 months: 0; last 6 weeks: 0)
    • Reflects downloads up to 26 Jan 2025
