research-article

Optimal Operator Replication and Placement for Distributed Stream Processing Systems

Published: 10 May 2017

Abstract

Exploiting on-the-fly computation, Data Stream Processing (DSP) applications are widely used to process unbounded streams of data and extract valuable information in near real time. As such, they enable the development of new intelligent and pervasive services that can improve our everyday life. To keep up with the high volume of data produced daily, the operators that compose a DSP application can be replicated and placed on multiple, possibly distributed, computing nodes, so as to process the incoming data flow in parallel. Moreover, to better exploit the abundance of diffused computational resources (e.g., Fog computing), recent work investigates the possibility of decentralizing the DSP application placement.
In this paper, we present and evaluate a general formulation of the optimal DSP replication and placement (ODRP) problem as an integer linear programming problem, which takes into account the heterogeneity of application requirements and infrastructural resources. We integrate ODRP as a prototype scheduler in the Apache Storm DSP framework. Using the DEBS 2015 Grand Challenge as a benchmark application, we show the benefits of jointly optimizing operator replication and placement, and how ODRP can optimize different QoS metrics, namely response time, internode traffic, cost, availability, and a combination thereof.
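
To make the idea of jointly choosing replication and placement concrete, here is a minimal sketch in Python. It is not the ODRP formulation from the paper and does not call an ILP solver: it enumerates every feasible plan for a tiny two-operator chain and picks the one with the lowest weighted sum of normalized response time and cost. The node names, costs, capacities, link delay, operator execution times, and the assumption that load splits evenly across replicas are all hypothetical values chosen for illustration.

```python
from itertools import combinations, product

# Hypothetical infrastructure: per-replica cost and number of replica slots.
NODES = {
    "n1": {"cost": 1.0, "capacity": 2},
    "n2": {"cost": 3.0, "capacity": 2},
}
# Hypothetical delay (ms) between two different nodes; same-node delay is 0.
LINK_DELAY_MS = 0.5
# Hypothetical operator chain: (name, execution time in ms, max replicas).
OPERATORS = [("parse", 4.0, 2), ("aggregate", 6.0, 2)]


def candidate_sets(max_replicas):
    """Yield every set of distinct nodes that can host 1..max_replicas replicas."""
    names = sorted(NODES)
    for k in range(1, max_replicas + 1):
        yield from combinations(names, k)


def feasible(plan):
    """A plan is feasible if no node hosts more replicas than its capacity."""
    used = {}
    for replica_nodes in plan:
        for node in replica_nodes:
            used[node] = used.get(node, 0) + 1
    return all(count <= NODES[node]["capacity"] for node, count in used.items())


def response_time(plan):
    """Toy model: execution time shrinks linearly with the number of replicas,
    and each hop pays the worst delay between any upstream/downstream pair."""
    total = sum(base / len(nodes) for (_, base, _), nodes in zip(OPERATORS, plan))
    for upstream, downstream in zip(plan, plan[1:]):
        total += max(
            0.0 if u == d else LINK_DELAY_MS for u in upstream for d in downstream
        )
    return total


def cost(plan):
    """Total cost is the sum of the per-replica cost of every node used."""
    return sum(NODES[node]["cost"] for nodes in plan for node in nodes)


def solve(w_response, w_cost):
    """Return the feasible plan minimizing the weighted, normalized objective."""
    per_operator = [list(candidate_sets(m)) for _, _, m in OPERATORS]
    plans = [p for p in product(*per_operator) if feasible(p)]
    r_max = max(response_time(p) for p in plans)
    c_max = max(cost(p) for p in plans)

    def score(plan):
        return w_response * response_time(plan) / r_max + w_cost * cost(plan) / c_max

    return min(plans, key=score)


if __name__ == "__main__":
    print("response time only:", solve(1.0, 0.0))
    print("cost only:", solve(0.0, 1.0))
```

With these toy numbers, weighting only response time replicates both operators on both nodes, whereas weighting only cost places a single replica of each operator on the cheaper node. Exhaustive enumeration is viable only for very small instances; an integer linear programming formulation handed to a solver, as the abstract describes, is the approach intended for realistic problem sizes.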

    Published In

    ACM SIGMETRICS Performance Evaluation Review, Volume 44, Issue 4
    March 2017
    101 pages
    ISSN: 0163-5999
    DOI: 10.1145/3092819

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Cited By

    • (2024) Bayesian-Driven Automated Scaling in Stream Computing With Multiple QoS Targets. IEEE Transactions on Parallel and Distributed Systems, 35:7 (1251-1267). DOI: 10.1109/TPDS.2024.3399834. Online publication date: Jul-2024
    • (2024) ZeroTune: Learned Zero-Shot Cost Models for Parallelism Tuning in Stream Processing. 2024 IEEE 40th International Conference on Data Engineering (ICDE) (2040-2053). DOI: 10.1109/ICDE60146.2024.00163. Online publication date: 13-May-2024
    • (2024) To Migrate or Not to Migrate: An Analysis of Operator Migration in Distributed Stream Processing. IEEE Communications Surveys & Tutorials, 26:1 (670-705). DOI: 10.1109/COMST.2023.3330953. Online publication date: 1-Jan-2024
    • (2024) GT-scheduler: a hybrid graph-partitioning and tabu-search based task scheduler for distributed data stream processing systems. Cluster Computing, 27:5 (5815-5832). DOI: 10.1007/s10586-023-04260-y. Online publication date: 1-Aug-2024
    • (2024) Optimizing Service Replication and Placement for IoT Applications in Fog Computing Systems. Euro-Par 2024: Parallel Processing (283-297). DOI: 10.1007/978-3-031-69577-3_20. Online publication date: 26-Aug-2024
    • (2024) Lc-Stream: An elastic scheduling strategy with latency constraints in geo-distributed stream computing environments. Concurrency and Computation: Practice and Experience, 36:14. DOI: 10.1002/cpe.8085. Online publication date: 20-Mar-2024
    • (2023) SDN-enabled Resource Provisioning Framework for Geo-Distributed Streaming Analytics. ACM Transactions on Internet Technology, 23:1 (1-21). DOI: 10.1145/3571158. Online publication date: 23-Feb-2023
    • (2023) Storm-RTS: Stream Processing with Stable Performance for Multi-Cloud and Cloud-edge. 2023 IEEE 16th International Conference on Cloud Computing (CLOUD) (45-57). DOI: 10.1109/CLOUD60044.2023.00015. Online publication date: Jul-2023
    • (2022) POTUS: Predictive Online Tuple Scheduling for Data Stream Processing Systems. IEEE Transactions on Cloud Computing, 10:4 (2863-2875). DOI: 10.1109/TCC.2020.3032577. Online publication date: 1-Oct-2022
    • (2022) Two-stage Scheduling of Stream Computing for Industrial Cloud-edge Collaboration. 2022 IEEE International Conference on Joint Cloud Computing (JCC) (57-64). DOI: 10.1109/JCC56315.2022.00016. Online publication date: Aug-2022
