research-article
Open access

Design, implementation, and application of GPU-based Java bytecode interpreters

Published: 10 October 2019

Abstract

We present the design and implementation of GVM, the first system for executing Java bytecode entirely on GPUs. GVM is ideal for applications that execute a large number of short-lived tasks, which share a significant fraction of their codebase and have similar execution times. GVM uses novel algorithms, scheduling, and data layout techniques to adapt to the massively parallel programming and execution model of GPUs. We apply GVM to generate and execute tests for Java projects. First, we implement sequence-based test generation on top of GVM and design novel algorithms to avoid redundant test sequences. Second, we use GVM to execute randomly generated test cases. We evaluate GVM by comparing it with two existing Java bytecode interpreters (Oracle JVM and Java Pathfinder), as well as with the Oracle JVM with its just-in-time (JIT) compiler, which has been engineered and optimized for over twenty years. Our evaluation shows that sequence-based test generation on GVM outperforms both Java Pathfinder and the Oracle JVM interpreter. Additionally, our results show that GVM performs as well as our parallel sequence-based test generation algorithm running on the JVM with JIT using many CPU threads. Furthermore, our evaluation on several classes from open-source projects shows that executing randomly generated tests on GVM outperforms sequential execution on both the JVM interpreter and the JVM with JIT.
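To make the idea of sequence-based test generation with redundancy avoidance concrete, the following is a minimal CPU-only sketch in Python. It is not GVM's algorithm, and the abstract does not describe how GVM detects redundancy; the `BoundedStack` class, the `OPERATIONS` list, and the state-equality pruning rule are all assumptions made for illustration. The sketch enumerates method-call sequences breadth-first and keeps only those that reach an object state not seen before.

```python
from collections import deque

# Hypothetical class under test: a tiny bounded stack (not from the paper).
class BoundedStack:
    def __init__(self, capacity=2):
        self.capacity = capacity
        self.items = []

    def push(self, value):
        if len(self.items) < self.capacity:
            self.items.append(value)

    def pop(self):
        if self.items:
            self.items.pop()

    def state(self):
        # Abstract state used to decide whether a sequence is redundant.
        return tuple(self.items)

# Each operation is a (name, callable) pair; the set is an assumption.
OPERATIONS = [
    ("push(0)", lambda s: s.push(0)),
    ("push(1)", lambda s: s.push(1)),
    ("pop()", lambda s: s.pop()),
]

def replay(sequence):
    """Run a sequence of operations on a fresh object and return it."""
    stack = BoundedStack()
    for _, op in sequence:
        op(stack)
    return stack

def generate_sequences(max_length):
    """Breadth-first generation that keeps only sequences reaching a new state."""
    seen = {replay([]).state()}
    frontier = deque([[]])
    kept = []
    while frontier:
        prefix = frontier.popleft()
        if len(prefix) == max_length:
            continue
        for operation in OPERATIONS:
            candidate = prefix + [operation]
            reached = replay(candidate).state()
            if reached in seen:
                continue  # redundant: an earlier sequence reaches this state
            seen.add(reached)
            kept.append([name for name, _ in candidate])
            frontier.append(candidate)
    return kept

sequences = generate_sequences(max_length=3)
for seq in sequences:
    print(" ; ".join(seq))
```

Under these assumptions, pruning keeps 6 sequences out of the 39 possible sequences of length one to three. Each call to `replay` is a short, independent task that runs the same code as every other call, which loosely resembles the kind of workload the abstract says GVM targets.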

Cited By

  • (2024) JVM optimization: An empirical analysis of JVM configurations for enhanced web application performance. SoftwareX 28 (101933). DOI: 10.1016/j.softx.2024.101933. Online publication date: Dec-2024
  • (2020) Running parallel bytecode interpreters on heterogeneous hardware. Companion Proceedings of the 4th International Conference on Art, Science, and Engineering of Programming (31-35). DOI: 10.1145/3397537.3397563. Online publication date: 23-Mar-2020

Information & Contributors

Information

Published In

Proceedings of the ACM on Programming Languages, Volume 3, Issue OOPSLA
October 2019
2077 pages
EISSN: 2475-1421
DOI: 10.1145/3366395
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Complete matching
  2. Graphics Processing Unit
  3. Java bytecode interpreter
  4. Sequence-based test generation
  5. Shape matching

Qualifiers

  • Research-article

Article Metrics

  • Downloads (Last 12 months): 153
  • Downloads (Last 6 weeks): 13

Reflects downloads up to 12 Dec 2024
