DOI: 10.1145/3173162.3173200

Skyway: Connecting Managed Heaps in Distributed Big Data Systems

Published: 19 March 2018

Abstract

Managed languages such as Java and Scala are widely used to develop large-scale distributed systems. Under a managed runtime, transferring data across machines, a task performed frequently in a Big Data system, requires the system to serialize a sea of objects into a byte sequence before sending them over the network. The remote node receiving the bytes then deserializes them back into objects. This process is both performance-inefficient and labor-intensive: (1) object serialization/deserialization makes heavy use of reflection, an expensive runtime operation, and/or (2) serialization/deserialization functions need to be hand-written and are error-prone. This paper presents Skyway, a JVM-based technique that can directly connect the managed heaps of different (local or remote) JVM processes. Under Skyway, objects in the source heap can be written directly into a remote heap without changing their format. Skyway provides performance benefits to any JVM-based system by completely eliminating the need (1) to invoke serialization/deserialization functions, thus saving CPU time, and (2) for developers to hand-write serialization functions.
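
To make the baseline cost concrete, the following is a minimal sketch of the conventional path the abstract describes, using the standard `java.io` object streams. It is not Skyway's interface, which the abstract does not show. The `Edge` class and the `SerializationRoundTrip` wrapper are hypothetical names chosen purely for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Baseline illustration only: standard Java serialization, not Skyway's API.
class SerializationRoundTrip {

    // Hypothetical record type standing in for the data a Big Data task ships.
    static class Edge implements Serializable {
        private static final long serialVersionUID = 1L;
        final int src;
        final int dst;

        Edge(int src, int dst) {
            this.src = src;
            this.dst = dst;
        }
    }

    // Sender side: walk the object graph and flatten it into a byte sequence.
    static byte[] serialize(List<Edge> edges) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(new ArrayList<>(edges));
        }
        return buffer.toByteArray();
    }

    // Receiver side: rebuild fresh objects from the received bytes.
    @SuppressWarnings("unchecked")
    static List<Edge> deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (List<Edge>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Edge> edges = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            edges.add(new Edge(i, i + 1));
        }
        byte[] wire = serialize(edges);       // CPU work on the sender
        List<Edge> copy = deserialize(wire);  // CPU work on the receiver
        System.out.println(copy.size() + " objects rebuilt from " + wire.length + " bytes");
    }
}
```

Both `serialize` and `deserialize` touch every object in the graph, and the receiver ends up with newly allocated copies. This per-object work on both ends is the cost the abstract says Skyway is designed to eliminate.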

Information

Published In

ASPLOS '18: Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems
March 2018
827 pages
ISBN: 9781450349116
DOI: 10.1145/3173162
  • ACM SIGPLAN Notices, Volume 53, Issue 2
    ASPLOS '18
    February 2018
    809 pages
    ISSN: 0362-1340
    EISSN: 1558-1160
    DOI: 10.1145/3296957
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. big data
  2. data transfer
  3. distributed systems
  4. serialization and deserialization

Qualifiers

  • Research-article

Conference

ASPLOS '18

Acceptance Rates

ASPLOS '18 Paper Acceptance Rate 56 of 319 submissions, 18%
Overall Acceptance Rate 535 of 2,713 submissions, 20%

Article Metrics

  • Downloads (Last 12 months): 241
  • Downloads (Last 6 weeks): 27
Reflects downloads up to 12 Dec 2024

Cited By

  • (2024) TeraHeap: Exploiting Flash Storage for Mitigating DRAM Pressure in Managed Big Data Frameworks. ACM Transactions on Programming Languages and Systems 46:4 (1-37). DOI: 10.1145/3700593. Online publication date: 15-Oct-2024
  • (2024) Serialization/Deserialization-free State Transfer in Serverless Workflows. Proceedings of the Nineteenth European Conference on Computer Systems (132-147). DOI: 10.1145/3627703.3629568. Online publication date: 22-Apr-2024
  • (2023) Cornflakes: Zero-Copy Serialization for Microsecond-Scale Networking. Proceedings of the 29th Symposium on Operating Systems Principles (200-215). DOI: 10.1145/3600006.3613137. Online publication date: 23-Oct-2023
  • (2023) TeraHeap: Reducing Memory Pressure in Managed Big Data Frameworks. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3 (694-709). DOI: 10.1145/3582016.3582045. Online publication date: 25-Mar-2023
  • (2023) High-Performance Object Serialization based on Ahead-of-Time Schema Generation. 2023 IEEE 22nd International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom) (2378-2385). DOI: 10.1109/TrustCom60117.2023.00335. Online publication date: 1-Nov-2023
  • (2023) Accelerating Multilingual Applications with In-memory Array Sharing. 2023 IEEE International Conference on Big Data (BigData) (255-262). DOI: 10.1109/BigData59044.2023.10386462. Online publication date: 15-Dec-2023
  • (2022) Improving Concurrent GC for Latency Critical Services in Multi-tenant Systems. Proceedings of the 23rd ACM/IFIP International Middleware Conference (43-55). DOI: 10.1145/3528535.3531515. Online publication date: 7-Nov-2022
  • (2022) Transparent and lightweight object placement for managed workloads atop hybrid memories. Proceedings of the 18th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (72-80). DOI: 10.1145/3516807.3516822. Online publication date: 25-Feb-2022
  • (2022) Preserving Addressability Upon GC-Triggered Data Movements on Non-Volatile Memory. ACM Transactions on Architecture and Code Optimization 19:2 (1-26). DOI: 10.1145/3511706. Online publication date: 24-Mar-2022
  • (2022) Metall. Parallel Computing 111:C. DOI: 10.1016/j.parco.2022.102905. Online publication date: 1-Jul-2022
