DOI: 10.1145/3529538.3530216
poster

Compiler-aided nd-range parallel-for implementations on CPU in hipSYCL

Published: 10 May 2022

Abstract

Heterogeneous programming continues to gain ground, yet performance portability still leaves room for improvement. SYCL provides the nd-range parallel-for paradigm for writing data-parallel kernels. This model allows barriers for group-local synchronization, similar to CUDA and OpenCL kernels. GPUs can model this efficiently, but on CPUs a library-only SYCL implementation can meet the necessary forward-progress guarantees only by using many (lightweight) threads, which renders the nd-range parallel-for unacceptably inefficient. The present work adopts two compiler-based approaches that solve this problem and improves the performance of the nd-range parallel-for in hipSYCL on CPUs by up to several orders of magnitude across various CPU architectures. The two alternatives are compared with regard to their functional correctness and performance. With one of the variants upstreamed, hipSYCL is the first SYCL implementation to provide a well-performing nd-range parallel-for on CPUs without requiring an available OpenCL runtime.
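
The barrier problem described in the abstract can be illustrated with a minimal sketch in plain C++ (no SYCL headers). It assumes one well-known compiler technique, splitting a kernel at its barrier into separate loops over the work-items of a group. The abstract does not say which two approaches the poster adopts, so this is an illustration only and not the authors' implementation. The kernel, the function name `reverse_within_groups`, and the `local` buffer are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernel: reverse the elements inside each work-group.
// Written per work-item, the kernel body would read:
//   local[lid] = in[gid];
//   barrier();
//   out[gid] = local[group_size - 1 - lid];
// Below, a single thread runs a whole group and the barrier becomes
// the boundary between two loops over the work-items (assumed technique).
std::vector<int> reverse_within_groups(const std::vector<int>& in,
                                       std::size_t group_size) {
  assert(group_size > 0 && in.size() % group_size == 0);
  std::vector<int> out(in.size());
  std::vector<int> local(group_size);  // stands in for group-local memory

  for (std::size_t base = 0; base < in.size(); base += group_size) {
    // Code before the barrier, executed for every work-item of the group.
    for (std::size_t lid = 0; lid < group_size; ++lid) {
      local[lid] = in[base + lid];
    }
    // Implicit barrier: the first loop has completed for all work-items.
    // Code after the barrier, again executed for every work-item.
    for (std::size_t lid = 0; lid < group_size; ++lid) {
      out[base + lid] = local[group_size - 1 - lid];
    }
  }
  return out;
}
```

Calling `reverse_within_groups({0, 1, 2, 3, 4, 5, 6, 7}, 4)` yields `{3, 2, 1, 0, 7, 6, 5, 4}`. Because one thread walks through all work-items of a group, no work-item ever has to wait at the barrier for another thread to make progress, which is the requirement that forces a library-only implementation to use one thread per work-item.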

Supplementary Material

a28-meyer-supplement (a28-meyer-supplement.pdf)
Poster


Published In

IWOCL '22: Proceedings of the 10th International Workshop on OpenCL
May 2022
123 pages
ISBN: 9781450396585
DOI: 10.1145/3529538

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery
New York, NY, United States

Author Tags

1. CPU
2. LLVM
3. OpenCL
4. SYCL
5. compilation
6. heterogeneous
7. nd-range
8. performance portability

Qualifiers

• Poster
• Research
• Refereed limited

Conference

IWOCL'22: International Workshop on OpenCL
May 10 - 12, 2022
Bristol, United Kingdom

Acceptance Rates

Overall Acceptance Rate 84 of 152 submissions, 55%

Cited By

• (2023) Performance Evolution of Different SYCL Implementations based on the Parallel Least Squares Support Vector Machine Library. Proceedings of the 2023 International Workshop on OpenCL, 1-12. DOI: 10.1145/3585341.3585369. Online publication date: 18-Apr-2023
• (2023) One Pass to Bind Them: The First Single-Pass SYCL Compiler with Unified Code Representation Across Backends. Proceedings of the 2023 International Workshop on OpenCL, 1-12. DOI: 10.1145/3585341.3585351. Online publication date: 18-Apr-2023
• (2023) Implementation Techniques for SPMD Kernels on CPUs. Proceedings of the 2023 International Workshop on OpenCL, 1-12. DOI: 10.1145/3585341.3585342. Online publication date: 18-Apr-2023
• (2023) Improving performance of SYCL applications on CPU architectures using LLVM‐directed compilation flow. Concurrency and Computation: Practice and Experience 35:27. DOI: 10.1002/cpe.7810. Online publication date: 30-May-2023
• (2022) Heterogeneous Programming for the Homogeneous Majority. 2022 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC), 1-13. DOI: 10.1109/P3HPC56579.2022.00006. Online publication date: Nov-2022
