
research-article

Benchmarking OpenCL, OpenACC, OpenMP, and CUDA: Programming Productivity, Performance, and Energy Consumption

Authors:

Joanna Kołodziej,

Christoph Kessler

ARMS-CC '17: Proceedings of the 2017 Workshop on Adaptive Resource Management and Scheduling for Cloud Computing

Pages 1-6

https://doi.org/10.1145/3110355.3110356

Published: 28 July 2017

Abstract

Many modern parallel computing systems are heterogeneous at the node level. Such nodes may combine general-purpose CPUs with accelerators (such as GPUs or the Intel Xeon Phi) that provide high performance with suitable energy-consumption characteristics. However, exploiting the available performance of heterogeneous architectures can be challenging. Several parallel programming frameworks are available (such as OpenMP, OpenCL, OpenACC, and CUDA), and selecting the one that suits a given context is not straightforward. In this paper, we empirically study the characteristics of OpenMP, OpenACC, OpenCL, and CUDA with respect to programming productivity, performance, and energy consumption. To evaluate programming productivity we use our in-house tool CodeStat, which determines the percentage of code lines required to parallelize the code with a specific framework. We use our tools MeterPU and x-MeterPU to evaluate energy consumption and performance. Experiments are conducted with the industry-standard SPEC benchmark suite and the Rodinia benchmark suite for accelerated computing, on heterogeneous systems that combine Intel Xeon E5 processors with either a GPU accelerator or an Intel Xeon Phi co-processor.
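The abstract describes CodeStat only as reporting the percentage of code lines needed to parallelize a program with a given framework. The Python sketch below approximates such a metric under clearly stated assumptions: it is not CodeStat's implementation, and the regular-expression table `FRAMEWORK_PATTERNS`, the function name `parallel_line_share`, and the choice to divide by non-blank lines are all hypothetical. A line counts as framework-specific if it matches any pattern listed for that framework.

```python
import re

# Hypothetical per-framework patterns (illustrative guesses, not CodeStat's rules).
FRAMEWORK_PATTERNS = {
    "openmp": [r"^\s*#\s*pragma\s+omp\b", r"\bomp_\w+\s*\("],
    "openacc": [r"^\s*#\s*pragma\s+acc\b", r"\bacc_\w+\s*\("],
    "cuda": [r"__global__|__device__|__shared__", r"<<<.*>>>", r"\bcuda[A-Z]\w*\s*\("],
    "opencl": [r"__kernel\b|__global\b", r"\bcl[A-Z]\w*\s*\("],
}


def parallel_line_share(source: str, framework: str) -> float:
    """Return the percentage of non-blank lines that match a framework pattern."""
    patterns = [re.compile(p) for p in FRAMEWORK_PATTERNS[framework]]
    # Assumption: blank lines are excluded from the denominator.
    lines = [ln for ln in source.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    hits = sum(1 for ln in lines if any(p.search(ln) for p in patterns))
    return 100.0 * hits / len(lines)


# A SAXPY loop parallelized with a single OpenMP directive.
saxpy_omp = """
void saxpy(int n, float a, const float *x, float *y) {
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
"""

print(f"{parallel_line_share(saxpy_omp, 'openmp'):.1f}%")
```

For the OpenMP SAXPY kernel, one of five non-blank lines is a directive, so the sketch prints `20.0%`. A real tool would likely also need to handle multi-line directives, comments, and framework code split across host and kernel files, none of which this heuristic addresses.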

Information

Published In

ARMS-CC '17: Proceedings of the 2017 Workshop on Adaptive Resource Management and Scheduling for Cloud Computing

July 2017

38 pages

ISBN:9781450351164

DOI:10.1145/3110355

General Chairs: Florin Pop (University Politehnica of Bucharest, Romania), Radu Prodan (University of Innsbruck, Austria), Marc Frincu (West University of Timisoara, Romania)

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 28 July 2017

Qualifiers

Research-article

Funding Sources

EU Framework Programme Horizon 2020

Conference

PODC '17: ACM Symposium on Principles of Distributed Computing

July 28, 2017

Washington, DC, USA

Acceptance Rates

ARMS-CC '17 Paper Acceptance Rate 4 of 11 submissions, 36%;

Overall Acceptance Rate 4 of 11 submissions, 36%

Bibliometrics

Total Citations: 66

Total Downloads: 1,042

Downloads (last 12 months): 110

Downloads (last 6 weeks): 9

Reflects downloads up to 01 Mar 2025

Cited By

Cao L, Zhen Z, Chen S, Gastellu-Etchegorry J, Yin T. 2025. Radiosity Graphics Model (RGM) at Pixel Scale for Simulation on Bidirectional Reflectance Factor (BRF) of Large-Scale Heterogeneous Canopy. IEEE Transactions on Geoscience and Remote Sensing 63, 1-14. Online publication date: 2025. https://doi.org/10.1109/TGRS.2024.3519429

Liang H, Deng C, Zhang P, Fang J, Tang T, Huang C. 2025. An empirical performance evaluation of SYCL on ARM multi-core processors. CCF Transactions on High Performance Computing 7, 1, 1-16. Online publication date: 14 Feb 2025. https://doi.org/10.1007/s42514-024-00212-z

Askar T, Yergaliyev A, Shukirgaliyev B, Abdikamalov E. 2024. Exploring Numba and CuPy for GPU-Accelerated Monte Carlo Radiation Transport. Computation 12, 3, 61. Online publication date: 20 Mar 2024. https://doi.org/10.3390/computation12030061

Faqir-Rhazoui Y, Costero L, García C. 2024. Balancing Energy Efficiency and Portability: Assessing Domain-Specific Languages in Edge Platforms. In 2024 20th International Conference on Distributed Computing in Smart Systems and the Internet of Things (DCOSS-IoT), 162-169. Online publication date: 29 Apr 2024. https://doi.org/10.1109/DCOSS-IoT61029.2024.00032

Munera A, Royuela S, Pressler M, Mackamul H, Ziegenbein D, Quiñones E. 2024. Fine-grained adaptive parallelism for automotive systems through AMALTHEA and OpenMP. Journal of Systems Architecture 146, 103034. Online publication date: Jan 2024. https://doi.org/10.1016/j.sysarc.2023.103034

Buckland E, Nguyen V, de Vaucorbeil A. 2024. Easily porting material point methods codes to GPU. Computational Particle Mechanics 11, 5, 2127-2142. Online publication date: 5 Jun 2024. https://doi.org/10.1007/s40571-024-00768-1

Gaschuk E, Ezhkova A, Onoprienko V, Debolskiy A, Mortikov E. 2023. Passive Tracer Transport in Ocean Modeling: Implementation on GPUs, Efficiency and Optimizations. Lobachevskii Journal of Mathematics 44, 8, 3040-3058. Online publication date: 28 Nov 2023. https://doi.org/10.1134/S1995080223080152

Gurumurthy B, Broneske D, Durand G, Pionteck T, Saake G. 2023. ADAMANT: A Query Executor with Plug-In Interfaces for Easy Co-processor Integration. In 2023 IEEE 39th International Conference on Data Engineering (ICDE), 1153-1166. Online publication date: Apr 2023. https://doi.org/10.1109/ICDE55515.2023.00093

Georgiev G, Lazarova M. 2023. Programming Interfaces for Cross-platform Image Rendering and Deep Learning GPGPU. In 2023 International Scientific Conference on Computer Science (COMSCI), 1-4. Online publication date: 18 Sep 2023. https://doi.org/10.1109/COMSCI59259.2023.10315835

Memeti S. 2023. Enabling Dynamic Selection of Implementation Variants in Component-Based Parallel Programming for Heterogeneous Systems. In Euro-Par 2023: Parallel Processing Workshops, 219-231. Online publication date: 28 Aug 2023. https://dl.acm.org/doi/10.1007/978-3-031-50684-0_17
