research-article

SnuCL: an OpenCL framework for heterogeneous CPU/GPU clusters

Authors:

Jaejin Lee

ICS '12: Proceedings of the 26th ACM international conference on Supercomputing

Pages 341 - 352

https://doi.org/10.1145/2304576.2304623

Published: 25 June 2012

Abstract

In this paper, we propose SnuCL, an OpenCL framework for heterogeneous CPU/GPU clusters. We show that the original OpenCL semantics naturally fit the heterogeneous cluster programming environment, and that the framework achieves high performance and ease of programming. The target cluster architecture consists of a single designated host node and many compute nodes. They are connected by an interconnection network, such as Gigabit Ethernet and InfiniBand switches. Each compute node is equipped with multicore CPUs and multiple GPUs. A set of CPU cores or each GPU becomes an OpenCL compute device. The host node executes the host program in an OpenCL application. SnuCL presents the heterogeneous CPU/GPU cluster to the user as a system image running a single operating system instance. It allows the application to utilize compute devices in a compute node as if they were in the host node. No communication API, such as the MPI library, is required in the application source. SnuCL also provides collective communication extensions to OpenCL to facilitate manipulating memory objects. With SnuCL, an OpenCL application becomes portable not only between heterogeneous devices in a single node, but also between compute devices in the cluster environment. We implement SnuCL and evaluate its performance using eleven OpenCL benchmark applications.
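The single-system view described in the abstract can be pictured with a small model. The sketch below is only an illustration, not SnuCL's implementation or API: the names `ComputeNode`, `ComputeDevice`, and `single_system_view` are hypothetical, and the choice of one CPU-core set and two GPUs per node is an assumption made for the example. It shows how devices that physically live in separate compute nodes could be flattened into one device list, which is what a host program would see if remote devices appeared to be local.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ComputeDevice:
    """One OpenCL-style compute device as seen by the host (hypothetical model)."""
    node: str         # compute node that physically hosts the device
    kind: str         # "CPU" (a set of CPU cores) or "GPU"
    local_index: int  # index of the device within its node, per kind


@dataclass(frozen=True)
class ComputeNode:
    """A compute node with some CPU-core sets and GPUs (counts are assumptions)."""
    name: str
    cpu_core_sets: int
    gpus: int


def single_system_view(nodes: List[ComputeNode]) -> List[ComputeDevice]:
    """Flatten every device in every node into one list, in node order."""
    devices: List[ComputeDevice] = []
    for node in nodes:
        # Each set of CPU cores becomes one compute device.
        for i in range(node.cpu_core_sets):
            devices.append(ComputeDevice(node.name, "CPU", i))
        # Each GPU becomes its own compute device.
        for i in range(node.gpus):
            devices.append(ComputeDevice(node.name, "GPU", i))
    return devices


# Hypothetical two-node cluster; the host addresses devices by global index only.
cluster = [ComputeNode("node0", 1, 2), ComputeNode("node1", 1, 2)]
devices = single_system_view(cluster)
for global_id, dev in enumerate(devices):
    print(global_id, dev.node, dev.kind, dev.local_index)
```

In this model the host never names a node when selecting a device; it only uses the global index, which mirrors the abstract's point that no explicit communication API appears in the application source.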

Index Terms

SnuCL: an OpenCL framework for heterogeneous CPU/GPU clusters
1. Computing methodologies
  1. Concurrent computing methodologies
    1. Concurrent programming languages
2. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Runtime environments
      2. Source code generation
    2. General programming languages
      1. Language types
        1. Concurrent programming languages

Information & Contributors

Information

Published In

ICS '12: Proceedings of the 26th ACM international conference on Supercomputing

June 2012

400 pages

ISBN:9781450313162

DOI:10.1145/2304576

General Chairs: Utpal Banerjee (University of California at Irvine, USA), Kyle A. Gallivan (Florida State University, USA)

Program Chairs: Gianfranco Bilardi (Università degli Studi di Padova, Italy), Manolis G.H. Katevenis (FORTH and University of Crete, Greece)

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ICS'12

Sponsor:

SIGARCH

ICS'12: International Conference on Supercomputing

June 25 - 29, 2012

San Servolo Island, Venice, Italy

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 129
Total Downloads: 2,170
Downloads (Last 12 months): 83
Downloads (Last 6 weeks): 6

Reflects downloads up to 15 Jan 2025

Citations

Cited By

Kemmler S, Rettinger C, Rüde U, Cuéllar P, Köstler H (2025). Efficiency and scalability of fully-resolved fluid-particle simulations on heterogeneous CPU-GPU architectures. The International Journal of High Performance Computing Applications. Online publication date: 10-Jan-2025.
https://doi.org/10.1177/10943420241313385

Khetawat H, Mueller F (2024). Workload Scheduling on Heterogeneous Devices. ISC High Performance 2024 Research Paper Proceedings (39th International Conference), pages 1-11. Online publication date: May-2024.
https://doi.org/10.23919/ISC.2024.10528933

Kim J, Lee S, Johnston B, Vetter J (2024). IRIS: A Performance-Portable Framework for Cross-Platform Heterogeneous Computing. IEEE Transactions on Parallel and Distributed Systems 35:10, pages 1796-1809. Online publication date: Oct-2024.
https://doi.org/10.1109/TPDS.2024.3429010

Suluhan H, Gener S, Fusco A, Mack J, Dagli I, Belviranli M, Edemen C, Akoglu A (2024). A Runtime Manager Integrated Emulation Environment for Heterogeneous SoC Design with RISC-V Cores. 2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pages 23-30. Online publication date: 27-May-2024.
https://doi.org/10.1109/IPDPSW63119.2024.00013

Alaei M, Yazdanpanah F (2024). A Survey on Heterogeneous CPU–GPU Architectures and Simulators. Concurrency and Computation: Practice and Experience 37:1. Online publication date: 30-Oct-2024.
https://doi.org/10.1002/cpe.8318

Mack J, Hassan S, Kumbhare N, Castro Gonzalez M, Akoglu A (2023). CEDR: A Compiler-integrated, Extensible DSSoC Runtime. ACM Transactions on Embedded Computing Systems 22:2, pages 1-34. Online publication date: 24-Jan-2023.
https://dl.acm.org/doi/10.1145/3529257

Alves R, Rufino J (2023). Remote Execution of OpenCL and SYCL Applications via rOpenCL. 2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pages 51-60. Online publication date: May-2023.
https://doi.org/10.1109/IPDPSW59300.2023.00020

Yoo J, Oh K, Jun J, Cho H, Kim K (2023). Removing Host Interventions from GPU Accelerated Neural Network. 2023 IEEE International Conference on Consumer Electronics (ICCE), pages 1-2. Online publication date: 6-Jan-2023.
https://doi.org/10.1109/ICCE56470.2023.10043523

Salzmann P, Knorr F, Thoman P, Gschwandtner P, Cosenza B, Fahringer T (2023). An Asynchronous Dataflow-Driven Execution Model For Distributed Accelerator Computing. 2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid), pages 82-93. Online publication date: May-2023.
https://doi.org/10.1109/CCGrid57682.2023.00018

Tang T, Lu K, Peng L, Cui Y, Fang J, Huang C, Wang R, Yang C, Guo Y (2023). SNCL: a supernode OpenCL implementation for hybrid computing arrays. The Journal of Supercomputing 80:7, pages 9471-9493. Online publication date: 8-Dec-2023.
https://doi.org/10.1007/s11227-023-05766-3
