DOI: 10.1145/3110355.3110356
research-article

Benchmarking OpenCL, OpenACC, OpenMP, and CUDA: Programming Productivity, Performance, and Energy Consumption

Published: 28 July 2017

Abstract

Many modern parallel computing systems are heterogeneous at their node level. Such nodes may comprise general-purpose CPUs and accelerators (such as GPUs or the Intel Xeon Phi) that provide high performance with suitable energy-consumption characteristics. However, exploiting the available performance of heterogeneous architectures may be challenging. There are various parallel programming frameworks (such as OpenMP, OpenCL, OpenACC, and CUDA), and selecting the one that is suitable for a target context is not straightforward. In this paper, we empirically study the characteristics of OpenMP, OpenACC, OpenCL, and CUDA with respect to programming productivity, performance, and energy. To evaluate programming productivity, we use our homegrown tool CodeStat, which enables us to determine the percentage of code lines required to parallelize the code using a specific framework. We use our tools MeterPU and x-MeterPU to evaluate energy consumption and performance. Experiments are conducted using the industry-standard SPEC benchmark suite and the Rodinia benchmark suite for accelerated computing on heterogeneous systems that combine Intel Xeon E5 processors with a GPU accelerator or an Intel Xeon Phi co-processor.
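The productivity metric described above (the percentage of code lines needed to parallelize a program with a given framework) can be illustrated with a minimal sketch. This is not CodeStat itself, whose implementation the abstract does not describe. The marker lists, the function name `parallel_line_fraction`, and the sample source `EXAMPLE` are all assumptions made for illustration.

```python
# Hypothetical sketch: NOT the CodeStat tool, only an illustration of the
# "percentage of framework-specific code lines" metric named in the abstract.

# Assumed marker substrings per framework (illustrative, far from exhaustive).
FRAMEWORK_MARKERS = {
    "OpenMP": ("#pragma omp", "omp_", "<omp.h>"),
    "OpenACC": ("#pragma acc", "acc_"),
    "OpenCL": ("cl_", "clCreate", "clEnqueue", "clSetKernelArg", "__kernel"),
    "CUDA": ("__global__", "__device__", "cudaMalloc", "cudaMemcpy", "<<<"),
}


def parallel_line_fraction(source: str, framework: str) -> float:
    """Return the percentage of non-blank lines that contain a framework marker."""
    markers = FRAMEWORK_MARKERS[framework]
    lines = [line.strip() for line in source.splitlines() if line.strip()]
    if not lines:
        return 0.0
    hits = sum(1 for line in lines if any(m in line for m in markers))
    return 100.0 * hits / len(lines)


# Assumed sample input: a small C function parallelized with OpenMP.
EXAMPLE = """\
#include <omp.h>
void scale(float *x, int n, float a) {
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        x[i] = a * x[i];
}
"""

if __name__ == "__main__":
    share = parallel_line_fraction(EXAMPLE, "OpenMP")
    print(f"{share:.1f}% of non-blank lines are OpenMP-specific")
```

Simple substring matching keeps the sketch short, but it would miscount markers that appear inside comments or string literals; a real tool would likely need a more careful classification of lines.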

Published In

ARMS-CC '17: Proceedings of the 2017 Workshop on Adaptive Resource Management and Scheduling for Cloud Computing
July 2017
38 pages
ISBN: 9781450351164
DOI: 10.1145/3110355
General Chairs: Florin Pop, Radu Prodan, Marc Frincu
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. CUDA
  2. OpenACC
  3. OpenCL
  4. OpenMP
  5. energy consumption
  6. performance
  7. programming productivity

Qualifiers

  • Research-article

Funding Sources

  • EU Framework Programme Horizon 2020

Conference

PODC '17
Acceptance Rates

ARMS-CC '17 Paper Acceptance Rate 4 of 11 submissions, 36%;
Overall Acceptance Rate 4 of 11 submissions, 36%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 110
  • Downloads (Last 6 weeks): 9
Reflects downloads up to 01 Mar 2025

Citations

Cited By

  • (2025) Radiosity Graphics Model (RGM) at Pixel Scale for Simulation on Bidirectional Reflectance Factor (BRF) of Large-Scale Heterogeneous Canopy. IEEE Transactions on Geoscience and Remote Sensing 63 (1-14). DOI: 10.1109/TGRS.2024.3519429. Online publication date: 2025
  • (2025) An empirical performance evaluation of SYCL on ARM multi-core processors. CCF Transactions on High Performance Computing 7:1 (1-16). DOI: 10.1007/s42514-024-00212-z. Online publication date: 14-Feb-2025
  • (2024) Exploring Numba and CuPy for GPU-Accelerated Monte Carlo Radiation Transport. Computation 12:3 (61). DOI: 10.3390/computation12030061. Online publication date: 20-Mar-2024
  • (2024) Balancing Energy Efficiency and Portability: Assessing Domain-Specific Languages in Edge Platforms. 2024 20th International Conference on Distributed Computing in Smart Systems and the Internet of Things (DCOSS-IoT) (162-169). DOI: 10.1109/DCOSS-IoT61029.2024.00032. Online publication date: 29-Apr-2024
  • (2024) Fine-grained adaptive parallelism for automotive systems through AMALTHEA and OpenMP. Journal of Systems Architecture 146 (103034). DOI: 10.1016/j.sysarc.2023.103034. Online publication date: Jan-2024
  • (2024) Easily porting material point methods codes to GPU. Computational Particle Mechanics 11:5 (2127-2142). DOI: 10.1007/s40571-024-00768-1. Online publication date: 5-Jun-2024
  • (2023) Passive Tracer Transport in Ocean Modeling: Implementation on GPUs, Efficiency and Optimizations. Lobachevskii Journal of Mathematics 44:8 (3040-3058). DOI: 10.1134/S1995080223080152. Online publication date: 28-Nov-2023
  • (2023) ADAMANT: A Query Executor with Plug-In Interfaces for Easy Co-processor Integration. 2023 IEEE 39th International Conference on Data Engineering (ICDE) (1153-1166). DOI: 10.1109/ICDE55515.2023.00093. Online publication date: Apr-2023
  • (2023) Programming Interfaces for Cross-platform Image Rendering and Deep Learning GPGPU. 2023 International Scientific Conference on Computer Science (COMSCI) (1-4). DOI: 10.1109/COMSCI59259.2023.10315835. Online publication date: 18-Sep-2023
  • (2023) Enabling Dynamic Selection of Implementation Variants in Component-Based Parallel Programming for Heterogeneous Systems. Euro-Par 2023: Parallel Processing Workshops (219-231). DOI: 10.1007/978-3-031-50684-0_17. Online publication date: 28-Aug-2023
