DOI: 10.1145/3429357.3430522

Primula: a Practical Shuffle/Sort Operator for Serverless Computing

Published: 21 December 2020

Abstract

Serverless computing has recently gained much attention as a feasible alternative to always-on IaaS for data processing. However, existing serverless frameworks are not (yet) usable enough to reach a large number of users. To wit, they still require developers to specify the number of serverless functions for a simple sort job. We report our experience in designing Primula, a serverless sort operator that shields users from the complexities of resource provisioning, skewed data and stragglers, yielding the most accessible sort primitive to date. Our evaluation on the IBM Cloud platform demonstrates the usability of Primula without abandoning performance (e.g., 3x faster than a serverless Spark backend and 62% slower than a hybrid serverless/IaaS solution).

Published In

Middleware '20: Proceedings of the 21st International Middleware Conference Industrial Track
December 2020
49 pages
ISBN: 9781450382014
DOI: 10.1145/3429357

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. function-as-a-service
  2. serverless computing

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

Middleware '20
Middleware '20: 21st International Middleware Conference
December 7 - 11, 2020
Delft, Netherlands

Acceptance Rates

Overall Acceptance Rate 203 of 948 submissions, 21%

Article Metrics

  • Downloads (Last 12 months): 41
  • Downloads (Last 6 weeks): 5
Reflects downloads up to 19 Dec 2024

Cited By

  • (2024) Serverful Functions: Leveraging Servers in Complex Serverless Workflows (industry track). Proceedings of the 25th International Middleware Conference Industrial Track, 15-21. DOI: 10.1145/3700824.3701095. Online publication date: 2-Dec-2024
  • (2024) Dexter: A Performance-Cost Efficient Resource Allocation Manager for Serverless Data Analytics. Proceedings of the 25th International Middleware Conference, 117-130. DOI: 10.1145/3652892.3700753. Online publication date: 2-Dec-2024
  • (2024) A reference architecture for serverless big data processing. Future Generation Computer Systems 155:C, 179-192. DOI: 10.1016/j.future.2024.01.029. Online publication date: 1-Jun-2024
  • (2023) Cackle: Analytical Workload Cost and Performance Stability With Elastic Pools. Proceedings of the ACM on Management of Data 1:4, 1-25. DOI: 10.1145/3626720. Online publication date: 12-Dec-2023
  • (2023) Glider. Proceedings of the 24th International Middleware Conference, 247-260. DOI: 10.1145/3590140.3629119. Online publication date: 27-Nov-2023
  • (2023) Rise of the Planet of Serverless Computing: A Systematic Review. ACM Transactions on Software Engineering and Methodology 32:5, 1-61. DOI: 10.1145/3579643. Online publication date: 21-Jul-2023
  • (2023) Outsourcing Data Processing Jobs With Lithops. IEEE Transactions on Cloud Computing 11:1, 1026-1037. DOI: 10.1109/TCC.2021.3129000. Online publication date: 1-Jan-2023
  • (2023) On Data Processing through the Lenses of S3 Object Lambda. IEEE INFOCOM 2023 - IEEE Conference on Computer Communications, 1-10. DOI: 10.1109/INFOCOM53939.2023.10228890. Online publication date: 17-May-2023
  • (2023) Is Performance of Object Storage Predictable for Serverless I/O Workloads? A Comparative Study. 2023 IEEE 31st International Conference on Network Protocols (ICNP), 1-6. DOI: 10.1109/ICNP59255.2023.10355617. Online publication date: 10-Oct-2023
  • (2023) A Seer Knows Best: Auto-tuned Object Storage Shuffling for Serverless Analytics. Journal of Parallel and Distributed Computing, 104763. DOI: 10.1016/j.jpdc.2023.104763. Online publication date: Sep-2023