Abstract
Open Computing Language (OpenCL) is a new industry standard for task-parallel and data-parallel heterogeneous computing on a variety of modern CPUs, GPUs, DSPs, and other microprocessor designs. OpenCL is vendor independent and hence not specialized for any particular compute device. To develop efficient OpenCL applications for a particular platform, we still need a deeper understanding of the architectural features of both the OpenCL model and the computing devices. For this purpose, we design and implement an OpenCL micro-benchmark suite for GPUs and CPUs. In this paper, we describe the implementation of our OpenCL micro-benchmarks and present measurements of hardware and software features, including the performance of mathematical operations, bus bandwidth, memory architecture, branch synchronization, and scalability, on two multi-core CPUs (AMD Athlon II X2 250 and Intel Pentium Dual-Core E5400) and two GPUs (NVIDIA GeForce GTX 460se and AMD Radeon HD 6850). We also compare our measurements with those of existing benchmarks to demonstrate the reasonableness and correctness of our benchmark suite.
Acknowledgments
This material is based upon work supported by the National Natural Science Foundation of China (Nos. 61073010 and 61272166), the National Science and Technology Major Project of China (No. 2012ZX01039-004), and the State Key Laboratory of Software Development Environment of China (No. SKLSDE-2012ZX-02).
Cite this article
Yan, X., Shi, X., Wang, L. et al. An OpenCL micro-benchmark suite for GPUs and CPUs. J Supercomput 69, 693–713 (2014). https://doi.org/10.1007/s11227-014-1112-2
DOI: https://doi.org/10.1007/s11227-014-1112-2