[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3626562.3626828acmconferencesArticle/Chapter ViewAbstractPublication PagesmiddlewareConference Proceedingsconference-collections
research-article

Practical Storage-Compute Elasticity for Stream Data Processing

Published: 11 December 2023 Publication History

Abstract

Stream processing pipelines need to handle workload fluctuations (e.g., daily patterns, popularity spikes) by scaling up/down the resources contributed to running jobs. While there have been efforts proposing auto-scaling mechanisms for stream processing engines, prior work has overlooked the role of the storage system in ingesting and serving stream data. The absence of effective scaling for data streams is problematic given that the number of parallel partitions of a data stream limits both streaming data ingestion throughput and read parallelism for downstream streaming jobs. In this paper, we propose to augment the auto-scaling notion of stream processing engines with information about the source data stream. The key novelty of our approach lies in exploiting elastic data streams to ingest data, which is a unique feature of Pravega: a storage system for data streams part of the Dell's Streaming Data Platform. Pravega streams can dynamically change their parallelism based on the ingestion workload, and such information can in turn be exploited for auto-scaling the streaming job downstream. To this end, we have developed an Apache Flink connector for Pravega, as well as an auto-scaling orchestrator that feeds on data stream metrics. Our experiments show how a stream processing pipeline auto-scales by coordinating data stream and processing parallelism under workload fluctuations, with low operations cost.

References

[1]
2023. Apache Bookkeeper. https://bookkeeper.apache.org.
[2]
2023. Apache Druid. https://druid.apache.org.
[3]
2023. Apache Flink. https://flink.apache.org.
[4]
2023. Apache Flink JIRA - FLINK-6390. https://issues.apache.org/jira/browse/FLINK-6390.
[5]
2023. Apache Flink JIRA - FLINK-7210. https://issues.apache.org/jira/browse/FLINK-7210.
[6]
2023. Apache Kafka. https://kafka.apache.org.
[7]
2023. Apache Kafka - Documentation. https://kafka.apache.org/documentation.
[8]
2023. Apache Pulsar. https://pulsar.apache.org.
[9]
2023. Apache Spark. https://spark.apache.org.
[10]
2023. Apache Zookeeper. https://zookeeper.apache.org.
[11]
2023. Dell Streaming Data Platform. https://www.dell.com/en-us/dt/storage/streaming- data- platform.htm.
[12]
2023. Kubernetes - Horizontal Pod Autoscaling. https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale.
[13]
2023. Pravega. https://cncf.pravega.io.
[14]
2023. Pravega - Elastic Streams: Auto-scaling. https://cncf.pravega.io/docs/latest/pravega-concepts/#elastic-streams-auto-scaling.
[15]
2023. Pravega - Flink Connector. https://github.com/pravega/flink-connector.
[16]
2023. Pravega - Working with Pravega: Transactions. https://cncf.pravega.io/docs/latest/transactions.
[17]
2023. Prometheus. https://prometheus.io/.
[18]
2023. RedPanda. https://redpanda.com.
[19]
2023. UK Health State Agency Data. https://www.ncbi.nlm.nih.gov/bioproject/248064.
[20]
2023. Wikipedia - Two-phase commit protocol. https://en.wikipedia.org/wiki/Two-phase_commit_protocol.
[21]
Apache Flink. 2023. Apache Flink Javadoc - TwoPhaseCommitSinkFunction. https://nightlies.apache.org/flink/flink-docs-release-1.17/api/java/org/apache/flink/streaming/api/functions/sink/TwoPhaseCommitSinkFunction.html.
[22]
HamidReza Arkian, Guillaume Pierre, Johan Tordsson, and Erik Elmroth. 2021. Model-based stream processing auto-scaling in geo-distributed environments. In IEEE ICCCN'21. 1--10.
[23]
Oscar Boykin, Sam Ritchie, Ian O'Connell, and Jimmy Lin. 2014. Summingbird: A framework for integrating batch and online mapreduce computations. VLDB Endowment 7, 13 (2014), 1441--1451.
[24]
Paris Carbone, Gyula Fóra, Stephan Ewen, Seif Haridi, and Kostas Tzoumas. 2015. Lightweight asynchronous snapshots for distributed dataflows. arXiv preprint arXiv:1506.08603 (2015).
[25]
Paris Carbone, Asterios Katsifodimos, Stephan Ewen, Volker Markl, Seif Haridi, and Kostas Tzoumas. 2015. Apache flink: Stream and batch processing in a single engine. The Bulletin of the Technical Committee on Data Engineering 38, 4 (2015).
[26]
Valeria Cardellini, Francesco Lo Presti, Matteo Nardelli, and Gabriele Russo Russo. 2022. Runtime adaptation of data stream processing systems: The state of the art. Comput. Surveys 54, 11s (2022), 1--36.
[27]
Valeria Cardellini, Francesco Lo Presti, Matteo Nardelli, and Gabriele Russo Russo. 2018. Decentralized self-adaptation for elastic data stream processing. Future Generation Computer Systems 87 (2018), 171--185.
[28]
K Mani Chandy and Leslie Lamport. 1985. Distributed snapshots: Determining global states of distributed systems. ACM Transactions on Computer Systems (TOCS) 3, 1 (1985), 63--75.
[29]
Marcos Dias de Assuncao, Alexandre da Silva Veith, and Rajkumar Buyya. 2018. Distributed data stream processing and edge computing: A survey on resource elasticity and future directions. Journal of Network and Computer Applications 103 (2018), 1--17.
[30]
Dell Technologies. 2023. Dell EMC Isilon H600 Hybrid NAS Storage. https://www.dell.com/en-uk/dt/storage/isilon/isilon-h600-hybrid-nas-storage.ht.
[31]
Dell Technologies. 2023. Dell Streaming Data Platform: Architecture, Configuration, and Considerations. https://www.delltechnologies.com/asset/en-sg/products/storage/industry-market/h18162-streaming-data-platform-architecture.pdf.
[32]
Buğra Gedik, Scott Schneider, Martin Hirzel, and Kun-Lung Wu. 2013. Elastic scaling for data stream processing. IEEE Transactions on Parallel and Distributed Systems 25, 6 (2013), 1447--1463.
[33]
Kordian Gontarska, Morgan Geldenhuys, Dominik Scheinert, Philipp Wiesner, Andreas Polze, and Lauritz Thamsen. 2021. Evaluation of load prediction techniques for distributed stream processing. In IEEE IC2E'21. 91--98.
[34]
Raúl Gracia-Tinedo, Yongchao Tian, Josep Sampé, Hamza Harkous, John Lenton, Pedro García-López, Marc Sánchez-Artigas, and Marko Vukolic. 2015. Dissecting ubuntuone: Autopsy of a global-scale personal cloud back-end. In ACM IMC'15. 155--168.
[35]
Patrick Hunt, Mahadev Konar, Flavio Paiva Junqueira, and Benjamin Reed. 2010. ZooKeeper: wait-free coordination for internet-scale systems. In USENIX ATC'10, Vol. 8.
[36]
Haruna Isah, Tariq Abughofa, Sazia Mahfuz, Dharmitha Ajerla, Farhana Zulkernine, and Shahzad Khan. 2019. A survey of distributed data stream processing frameworks. IEEE Access 7 (2019), 154300--154316.
[37]
Flavio Junqueira and Raúl Gracia. 2021. Pravega Blog - When Speed meets Parallelism: Pravega performance under parallel streaming workloads. https://cncf.pravega.io/blog/2021/03/10/when-speed-meets-parallelism-pravega-performance-under-parallel-streaming-workloads/.
[38]
Flavio P Junqueira, Ivan Kelly, and Benjamin Reed. 2013. Durability with bookkeeper. ACM SIGOPS operating systems review 47, 1 (2013), 9--15.
[39]
Vasiliki Kalavri, John Liagouris, Moritz Hoffmann, Desislava Dimitrova, Matthew Forshaw, and Timothy Roscoe. 2018. Three steps is all you need: fast, accurate, automatic scaling decisions for distributed streaming dataflows. In USENIX OSDI'18. 783--798.
[40]
Roland Kotto Kombi, Nicolas Lumineau, and Philippe Lamarre. 2017. A preventive auto-parallelization approach for elastic stream processing. In IEEE ICDCS'17. 1532--1542.
[41]
Andrew W Leung, Shankar Pasupathy, Garth R Goodson, and Ethan L Miller. 2008. Measurement and Analysis of Large-Scale Network File System Workloads. In USENIX ATC'08, Vol. 1. 5--2.
[42]
Björn Lohrmann, Peter Janacik, and Odej Kao. 2015. Elastic stream processing with latency guarantees. In IEEE ICDCS'15. 399--410.
[43]
Yuan Mei, Luwei Cheng, Vanish Talwar, Michael Y Levin, Gabriela Jacques-Silva, Nikhil Simha, Anirban Banerjee, Brian Smith, Tim Williamson, Serhat Yilmaz, et al. 2020. Turbine: Facebook's service management platform for stream processing. In IEEE ICDE'20. 1591--1602.
[44]
Robert Metzger. 2021. Flink Blog - Scaling Flink automatically with Reactive Mode. https://flink.apache.org/2021/05/06/scaling-flink-automatically-with-reactive- mode/.
[45]
Shadi A Noghabi, Kartik Paramasivam, Yi Pan, Navina Ramesh, Jon Bringhurst, Indranil Gupta, and Roy H Campbell. 2017. Samza: stateful scalable stream processing at LinkedIn. VLDB Endowment 10, 12 (2017), 1634--1645.
[46]
Rayman Preet Singh, Bharath Kumarasubramanian, Prateek Maheshwari, and Samarth Shetty. 2020. Auto-sizing for stream processing applications at linkedin. In USENIX HotCloud'20. 22--22.
[47]
Matei Zaharia, Mosharaf Chowdhury, Michael J Franklin, Scott Shenker, Ion Stoica, et al. 2010. Spark: Cluster computing with working sets. USENIX HotCloud'10 10, 10-10 (2010), 95.
[48]
Liang Zhang, Wenli Zheng, Chao Li, Yao Shen, and Minyi Guo. 2021. Autrascale: an automated and transfer learning solution for streaming system auto-scaling. In IEEE IPDPS'21. 912--921.

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
Middleware '23: Proceedings of the 24th International Middleware Conference: Industrial Track
December 2023
52 pages
ISBN:9798400704277
DOI:10.1145/3626562
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

In-Cooperation

  • IFIP: International Federation for Information Processing

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 December 2023

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. auto-scaling
  2. data processing
  3. data streams

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

Conference

Middleware '23
Sponsor:

Acceptance Rates

Overall Acceptance Rate 203 of 948 submissions, 21%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 156
    Total Downloads
  • Downloads (Last 12 months)115
  • Downloads (Last 6 weeks)3
Reflects downloads up to 03 Jan 2025

Other Metrics

Citations

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media