Abstract
General purpose computing on GPUs (GPGPU) can enable significant performance and energy improvements for certain classes of applications. However, current GPGPU programming models, such as CUDA and OpenCL, are only accessible by systems experts through low-level C/C++ APIs. In contrast, large numbers of programmers use high-level languages, such as Java, due to their productivity advantages of type safety, managed runtimes and precise exception semantics. Current approaches to enabling GPGPU computing in Java and other managed languages involve low-level interfaces to native code that compromise the semantic guarantees of managed languages, and are not readily accessible to mainstream programmers.
In this paper, we propose compile-time and runtime techniques for accelerating Java programs through automatic generation of OpenCL code while preserving precise exception semantics. Our approach includes (1) automatic generation of OpenCL kernels and JNI glue code from a Java-based parallel-loop construct (forall), (2) speculative execution of OpenCL kernels on GPUs, and (3) automatic generation of optimized and parallel exception-checking code for execution on the CPU. A key insight in supporting our speculative execution is that the GPU’s device memory is separate from the CPU’s main memory, so that, in the case of a mis-speculation (exception), any side effects in a GPU kernel can be discarded by simply not communicating results back to the CPU.
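The speculation protocol described above can be illustrated with a small, self-contained Java sketch. This is a hypothetical simplification, not the paper's generated code: the "kernel" writes into a separate buffer standing in for GPU device memory, a CPU-side check validates the accesses the kernel makes, and on mis-speculation the buffer is discarded and the loop re-runs with ordinary Java semantics so that any exception is raised precisely.

```java
import java.util.Arrays;

// Hypothetical sketch of speculative execution with precise exception semantics.
public class SpeculativeForall {
    // Stand-in "GPU kernel": writes results into a separate buffer
    // (mimicking device memory); it performs no exception checks itself.
    static void kernel(int[] a, int[] b, int[] out, int n) {
        for (int i = 0; i < n; i++) out[i] = a[i] + b[i];
    }

    // CPU-side exception-checking code: verifies that every access the
    // kernel would perform is within bounds.
    static boolean boundsOk(int[] a, int[] b, int n) {
        return n <= a.length && n <= b.length;
    }

    static int[] speculativeAdd(int[] a, int[] b, int n) {
        int[] device = new int[n];            // side effects stay in this buffer
        try {
            kernel(a, b, device, n);          // speculative "GPU" execution
        } catch (RuntimeException ignored) {
            // A real GPU kernel would not raise a Java exception; either way,
            // any partial results in `device` are simply never committed.
        }
        if (!boundsOk(a, b, n)) {
            // Mis-speculation: discard the buffer and re-execute with
            // ordinary Java semantics, so the exception is thrown precisely.
            int[] out = new int[n];
            for (int i = 0; i < n; i++) out[i] = a[i] + b[i];
            return out;                       // unreachable if an access is out of bounds
        }
        return device;                        // commit: "copy" results back to the CPU
    }

    public static void main(String[] args) {
        int[] r = speculativeAdd(new int[]{1, 2, 3}, new int[]{4, 5, 6}, 3);
        System.out.println(Arrays.toString(r)); // [5, 7, 9]
    }
}
```

In the actual system the exception checks run in parallel on the CPU while the kernel runs on the GPU; the sketch serializes them only for clarity.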
We demonstrate the efficiency of our approach using eight Java benchmarks on two GPU-equipped platforms. Experimental results show that our approach can significantly accelerate certain classes of Java programs on GPUs while maintaining precise exception semantics.
Notes
- 1. The OpenCL runtime has to wait for the kernel execution to complete even when an exception has occurred, because OpenCL currently provides no API for terminating a kernel that is running on the device.
- 2. This is theoretically possible but unlikely in practice, since it arises only from variations in timing.
© 2014 Springer International Publishing Switzerland
Cite this paper
Hayashi, A., Grossman, M., Zhao, J., Shirako, J., Sarkar, V. (2014). Speculative Execution of Parallel Programs with Precise Exception Semantics on GPUs. In: Cașcaval, C., Montesinos, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2013. Lecture Notes in Computer Science(), vol 8664. Springer, Cham. https://doi.org/10.1007/978-3-319-09967-5_20
Print ISBN: 978-3-319-09966-8
Online ISBN: 978-3-319-09967-5
eBook Packages: Computer Science, Computer Science (R0)