research-article

Open access

Design, implementation, and application of GPU-based Java bytecode interpreters

Authors:

Christopher J. Rossbach,

Milos Gligoric

Proceedings of the ACM on Programming Languages, Volume 3, Issue OOPSLA

Article No. 177, pages 1–28

https://doi.org/10.1145/3360603

Published: 10 October 2019

Abstract

We present the design and implementation of GVM, the first system for executing Java bytecode entirely on GPUs. GVM is best suited to applications that execute a large number of short-lived tasks that share a significant fraction of their codebase and have similar execution times. GVM uses novel algorithms, scheduling, and data layout techniques to adapt to the massively parallel programming and execution model of GPUs. We apply GVM to generate and execute tests for Java projects. First, we implement sequence-based test generation on top of GVM and design novel algorithms to avoid redundant test sequences. Second, we use GVM to execute randomly generated test cases. We evaluate GVM by comparing it with two existing Java bytecode interpreters (the Oracle JVM interpreter and Java Pathfinder), as well as with the Oracle JVM with its just-in-time (JIT) compiler, which has been engineered and optimized for over twenty years. Our evaluation shows that sequence-based test generation on GVM outperforms both Java Pathfinder and the Oracle JVM interpreter. Additionally, our results show that GVM performs as well as our parallel sequence-based test generation algorithm running on the JVM with JIT using many CPU threads. Furthermore, our evaluation on several classes from open-source projects shows that executing randomly generated tests on GVM outperforms sequential execution on both the JVM interpreter and the JVM with JIT.
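The abstract describes two ideas that a small example can make concrete: sequence-based test generation that avoids redundant test sequences, and a workload made of many short tasks that share code and take similar time. The Java sketch below illustrates that workload under stated assumptions. It enumerates bounded sequences of `add`/`remove` calls on a `java.util.TreeSet<Integer>` and prunes any sequence whose resulting object state has already been reached, a common state-matching heuristic. The class `SequenceGen`, the `Call` record, the choice of `TreeSet`, and the pruning rule are all hypothetical stand-ins; this is not GVM's algorithm, and it runs sequentially on an ordinary CPU JVM (Java 16+ for records).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical sketch, not GVM's algorithm: breadth-first generation of
 * method-call sequences on a TreeSet<Integer>, keeping a sequence only if
 * the object state it produces has not been seen before (state matching).
 */
public class SequenceGen {

    /** One method call under test: an operation name and an integer argument. */
    record Call(String op, int arg) {
        void apply(TreeSet<Integer> s) {
            if (op.equals("add")) {
                s.add(arg);
            } else {
                s.remove(arg);
            }
        }

        @Override
        public String toString() {
            return op + "(" + arg + ")";
        }
    }

    /** Returns every sequence of up to maxLen calls that reached a new state. */
    static List<List<Call>> generate(int maxLen, int maxArg) {
        List<Call> calls = new ArrayList<>();
        for (int a = 0; a < maxArg; a++) {
            calls.add(new Call("add", a));
            calls.add(new Call("remove", a));
        }

        // A TreeSet iterates in sorted order, so its element list is a canonical state.
        Set<List<Integer>> seenStates = new HashSet<>();
        seenStates.add(List.of()); // the empty set is the initial state

        List<List<Call>> frontier = new ArrayList<>();
        frontier.add(List.of());
        List<List<Call>> kept = new ArrayList<>();

        for (int len = 1; len <= maxLen; len++) {
            List<List<Call>> next = new ArrayList<>();
            for (List<Call> prefix : frontier) {
                for (Call c : calls) {
                    // Each extension is an independent short task: replay the
                    // prefix on a fresh object, then apply the new call.
                    TreeSet<Integer> obj = new TreeSet<>();
                    for (Call p : prefix) {
                        p.apply(obj);
                    }
                    c.apply(obj);

                    List<Integer> state = new ArrayList<>(obj);
                    if (seenStates.add(state)) { // new state: keep it and extend it later
                        List<Call> seq = new ArrayList<>(prefix);
                        seq.add(c);
                        next.add(seq);
                        kept.add(seq);
                    }
                }
            }
            frontier = next;
        }
        return kept;
    }

    public static void main(String[] args) {
        List<List<Call>> seqs = generate(3, 2);
        seqs.forEach(System.out::println);
        System.out.println(seqs.size() + " non-redundant sequences");
    }
}
```

With `generate(3, 2)` the program should print `[add(0)]`, `[add(1)]`, and `[add(0), add(1)]`, followed by `3 non-redundant sequences`: over the arguments {0, 1} only four states exist (including the empty set), and every other extension revisits one of them. Replaying each prefix on a fresh object keeps the tasks independent and similar in length, which is the kind of workload the abstract says GVM targets; whether and how such tasks map to GPU threads is outside this sketch.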

Cited By

Noetzold D, de Moraes Rossetto A, Silva L, Crocker P, Leithardt V. (2024). JVM optimization: An empirical analysis of JVM configurations for enhanced web application performance. SoftwareX 28, 101933. https://doi.org/10.1016/j.softx.2024.101933 (online publication date: Dec 2024)

Fumero J, Stratikopoulos A, Kotselidis C, Aguiar A, Chiba S, Boix E. (2020). Running parallel bytecode interpreters on heterogeneous hardware. Companion Proceedings of the 4th International Conference on Art, Science, and Engineering of Programming, 31–35. https://dl.acm.org/doi/10.1145/3397537.3397563 (online publication date: 23 Mar 2020)

Index Terms

Design, implementation, and application of GPU-based Java bytecode interpreters
1. Software and its engineering
  1. Software creation and management
    1. Software verification and validation
      1. Software defect analysis
        Software testing and debugging
  2. Software notations and tools
    1. Compilers
      1. Interpreters
      2. Runtime environments
    2. General programming languages
      1. Language types
        Object oriented languages

Information

Published In

Proceedings of the ACM on Programming Languages Volume 3, Issue OOPSLA

October 2019

2077 pages

EISSN:2475-1421

DOI:10.1145/3366395

Copyright © 2019 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 10 October 2019

Published in PACMPL Volume 3, Issue OOPSLA

Badges

Artifacts Evaluated & Functional

Qualifiers

Research-article

Funding Sources

National Science Foundation

Bibliometrics & Citations

Article Metrics

Total citations: 2
Total downloads: 855
Downloads (last 12 months): 153
Downloads (last 6 weeks): 13

Reflects downloads up to 12 Dec 2024.
