Accelerating Habanero-Java programs with OpenCL generation

Published: 11 September 2013

Abstract

The initial wave of programming models for general-purpose computing on GPUs, led by CUDA and OpenCL, has provided experts with low-level constructs to obtain significant performance and energy improvements on GPUs. However, these programming models are characterized by a challenging learning curve for non-experts due to their complex and low-level APIs. Looking to the future, improving the accessibility of GPUs and accelerators for mainstream software developers is crucial to bringing the benefits of these heterogeneous architectures to a broader set of application domains. A key challenge in doing so is that mainstream developers are accustomed to working with high-level managed languages, such as Java, rather than lower-level native languages such as C, CUDA, and OpenCL.
The OpenCL standard enables portable execution of SIMD kernels across a wide range of platforms, including multi-core CPUs, many-core GPUs, and FPGAs. However, using OpenCL from Java to program multi-architecture systems is difficult and error-prone. Programmers are required to explicitly perform a number of low-level operations, such as (1) managing data transfers between the host system and the GPU, (2) writing kernels in the OpenCL kernel language, (3) compiling kernels and performing other OpenCL initialization, and (4) using the Java Native Interface (JNI) to access the C/C++ APIs for OpenCL.
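To make the gap concrete, the following minimal sketch places a hand-written OpenCL C kernel for vector addition (step 2 above) next to the equivalent data-parallel loop in plain Java. The class name `VectorAddSketch`, the kernel name `vadd`, and the use of a parallel `IntStream` as a stand-in for a parallel-for are illustrative assumptions and are not taken from the paper. The kernel is only stored as a string here; actually running it would still require the transfer, compilation, and JNI steps listed above.

```java
import java.util.stream.IntStream;

public class VectorAddSketch {
    // Hypothetical hand-written OpenCL C kernel: one work-item per array element.
    // It is kept as a string only to show what the programmer would have to maintain.
    static final String KERNEL_SOURCE =
          "__kernel void vadd(__global const float* a,\n"
        + "                   __global const float* b,\n"
        + "                   __global float* c,\n"
        + "                   const int n) {\n"
        + "    int i = get_global_id(0);\n"
        + "    if (i < n) c[i] = a[i] + b[i];\n"
        + "}\n";

    // The same computation as a data-parallel loop in plain Java.
    // Each iteration writes a distinct element of c, so iterations are independent.
    static float[] vaddParallel(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("arrays must have equal length");
        }
        float[] c = new float[a.length];
        IntStream.range(0, a.length).parallel().forEach(i -> c[i] = a[i] + b[i]);
        return c;
    }

    public static void main(String[] args) {
        float[] c = vaddParallel(new float[] {1f, 2f, 3f}, new float[] {10f, 20f, 30f});
        System.out.println(java.util.Arrays.toString(c));
    }
}
```

The Java loop body and the kernel body are nearly identical; the extra work in the OpenCL case lies almost entirely in the surrounding host code, which is the part an automatic generator can take over.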
In this paper, we present compile-time and run-time techniques for accelerating programs written in Java using automatic generation of OpenCL as a foundation. Our approach includes (1) automatic generation of OpenCL kernels and JNI glue code from a parallel-for construct (forall) available in the Habanero-Java (HJ) language, (2) leveraging HJ's array view language construct to efficiently support rectangular, multi-dimensional arrays on OpenCL devices, and (3) implementing HJ's phaser (next) construct for all-to-all barrier synchronization in automatically generated OpenCL kernels. Our approach is named HJ-OpenCL. Contrasting with past approaches to generating CUDA or OpenCL from high-level languages, the HJ-OpenCL approach helps the programmer preserve Java exception semantics by generating both exception-safe and unsafe code regions. The execution of one or the other is selected at runtime based on the safe language construct introduced in this paper.
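The idea of keeping both an exception-safe and an unsafe version of a region, and choosing between them at run time, can be illustrated in plain Java. This is a sketch under stated assumptions, not HJ syntax and not the generated code described in the paper: the class `SafeRegionSketch`, the methods `gatherSafe` and `gatherFast`, and the specific guard condition are all hypothetical. `gatherFast` merely stands in for a native kernel that performs no bounds checks.

```java
public class SafeRegionSketch {
    // Exception-safe version: ordinary Java semantics, so an out-of-range index
    // raises ArrayIndexOutOfBoundsException at the exact faulting iteration.
    static void gatherSafe(int[] src, int[] idx, int[] dst) {
        for (int i = 0; i < idx.length; i++) {
            dst[i] = src[idx[i]];
        }
    }

    // Stand-in for the unsafe version (for example, a generated kernel).
    // In this sketch it is still Java; the point is that it is only ever
    // called after the guard below has ruled out any exception.
    static void gatherFast(int[] src, int[] idx, int[] dst) {
        for (int i = 0; i < idx.length; i++) {
            dst[i] = src[idx[i]];
        }
    }

    // Hypothetical guard: true only if no access in the region can fault.
    static boolean noExceptionPossible(int[] src, int[] idx, int[] dst) {
        if (dst.length < idx.length) {
            return false;
        }
        for (int k : idx) {
            if (k < 0 || k >= src.length) {
                return false;
            }
        }
        return true;
    }

    // Run-time selection between the two versions; returns which one ran.
    static String gather(int[] src, int[] idx, int[] dst) {
        if (noExceptionPossible(src, idx, dst)) {
            gatherFast(src, idx, dst);
            return "fast";
        }
        gatherSafe(src, idx, dst); // preserves Java exception behavior
        return "safe";
    }
}
```

With in-range indices the guard passes and the fast version runs; with an out-of-range index the call falls back to the safe version, which throws exactly as sequential Java would.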
We use a set of ten Java benchmarks to evaluate our approach, and observe performance improvements due to both native OpenCL execution and parallelism. On an AMD APU, our results show speedups of up to 36.7× relative to sequential Java when executing on the host 4-core CPU, and of up to 55.0× on the integrated GPU. For a system with an Intel Xeon CPU and a discrete NVIDIA Fermi GPU, the speedups relative to sequential Java are 35.7× for the 12-core CPU and 324.0× for the GPU. Further, we find that different applications perform optimally in JVM execution, in OpenCL CPU execution, and in OpenCL GPU execution. The language features, compiler extensions, and runtime extensions included in this work enable portability, rapid prototyping, and transparent execution of JVM applications across all OpenCL platforms.

Published In

PPPJ '13: Proceedings of the 2013 International Conference on Principles and Practices of Programming on the Java Platform: Virtual Machines, Languages, and Tools
September 2013
188 pages
ISBN:9781450321112
DOI:10.1145/2500828
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. GPGPU
  2. Habanero-Java
  3. Java
  4. OpenCL

Qualifiers

  • Research-article

Conference

PPPJ '13
PPPJ '13: virtual machines, languages, and tools
September 11 - 13, 2013
Stuttgart, Germany

Acceptance Rates

Overall Acceptance Rate 29 of 58 submissions, 50%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 4
  • Downloads (last 6 weeks): 2

Reflects downloads up to 11 Dec 2024

Citations

Cited By

  • (2022) Enabling pipeline parallelism in heterogeneous managed runtime environments via batch processing. Proceedings of the 18th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, pages 58-71. DOI: 10.1145/3516807.3516821. Online publication date: 25-Feb-2022
  • (2019) Performance evaluation of OpenMP's target construct on GPUs-exploring compiler optimisations. International Journal of High Performance Computing and Networking 13:1, pages 54-69. DOI: 10.5555/3302714.3302718. Online publication date: 1-Jan-2019
  • (2019) Design, implementation, and application of GPU-based Java bytecode interpreters. Proceedings of the ACM on Programming Languages 3:OOPSLA, pages 1-28. DOI: 10.1145/3360603. Online publication date: 10-Oct-2019
  • (2019) Dynamic application reconfiguration on heterogeneous hardware. Proceedings of the 15th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, pages 165-178. DOI: 10.1145/3313808.3313819. Online publication date: 14-Apr-2019
  • (2018) Exploiting high-performance heterogeneous hardware for Java programs using graal. Proceedings of the 15th International Conference on Managed Languages & Runtimes, pages 1-13. DOI: 10.1145/3237009.3237016. Online publication date: 12-Sep-2018
  • (2018) Towards practical heterogeneous virtual machines. Companion Proceedings of the 2nd International Conference on the Art, Science, and Engineering of Programming, pages 46-48. DOI: 10.1145/3191697.3191730. Online publication date: 9-Apr-2018
  • (2018) Exploration of Supervised Machine Learning Techniques for Runtime Selection of CPU vs. GPU Execution in Java Programs. Accelerator Programming Using Directives, pages 125-144. DOI: 10.1007/978-3-319-74896-2_7. Online publication date: 31-Jan-2018
  • (2017) Rubus: A compiler for seamless and extensible parallelism. PLOS ONE 12:12, e0188721. DOI: 10.1371/journal.pone.0188721. Online publication date: 6-Dec-2017
  • (2017) Heterogeneous Managed Runtime Systems. ACM SIGPLAN Notices 52:7, pages 74-82. DOI: 10.1145/3140607.3050764. Online publication date: 8-Apr-2017
  • (2017) Heterogeneous Managed Runtime Systems. Proceedings of the 13th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, pages 74-82. DOI: 10.1145/3050748.3050764. Online publication date: 8-Apr-2017