[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/2611286.2611309acmconferencesArticle/Chapter ViewAbstractPublication PagesdebsConference Proceedingsconference-collections
research-article

Cloud-based data stream processing

Published: 26 May 2014 Publication History

Abstract

In this tutorial we present the results of recent research about the cloud enablement of data streaming systems. We illustrate, based on both industrial as well as academic prototypes, new emerging uses cases and research trends. Specifically, we focus on novel approaches for (1) scalability and (2) fault tolerance in large scale distributed streaming systems. In general, new fault tolerance mechanisms strive to be more robust and at the same time introduce less overhead. Novel load balancing approaches focus on elastic scaling over hundreds of instances based on the data and query workload. Finally, we present open challenges for the next generation of cloud-based data stream processing engines.

References

[1]
D. J. Abadi, Y. Ahmad, M. Balazinska, U. Cetintemel, M. Cherniack, J.-H. Hwang, W. Lindner, A. Maskey, A. Rasin, E. Ryvkina, et al. The Design of the Borealis Stream Processing Engine. In CIDR, pages 277--289, 2005.
[2]
D. J. Abadi, D. Carney, U. Çetintemel, M. Cherniack, C. Convey, S. Lee, M. Stonebraker, N. Tatbul, and S. Zdonik. Aurora: a new model and architecture for data stream management. In VLDB, pages 120--139, 2003.
[3]
T. Akidau, A. Balikov, K. Bekiroğlu, S. Chernyak, J. Haberman, R. Lax, S. McVeety, D. Mills, P. Nordstrom, and S. Whittle. MillWheel: fault-tolerant stream processing at internet scale. In VLDB, pages 1033--1044, 2013.
[4]
L. Aniello, R. Baldoni, and L. Querzoni. Adaptive online scheduling in storm. In DEBS, pages 207--218, 2013.
[5]
M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, et al. A view of cloud computing. Communications of the ACM, pages 50--58, 2010.
[6]
M. Balazinska, H. Balakrishnan, S. R. Madden, and M. Stonebraker. Fault-tolerance in the Borealis distributed stream processing system. ACM TODS, 2008.
[7]
R. S. Barga and H. Caituiro-Monge. Event correlation and pattern detection in CEDR. In EDBT, pages 919--930, 2006.
[8]
P. Bellavista, A. Corradi, S. Kotoulas, and A. Reale. Adaptive fault-tolerance for dynamic resource provisioning in distributed stream processing systems. In EDBT, pages 85--96, 2014.
[9]
A. Brito, A. Martin, T. Knauth, S. Creutz, D. Becker, S. Weigert, and C. Fetzer. Scalable and low-latency data processing with StreamMapReduce. In CloudCom, pages 48--58, 2011.
[10]
B. Chandramouli, J. Goldstein, R. Barga, M. Riedewald, and I. Santos. Accurate latency estimation in a distributed event processing system. In ICDE, pages 255--266, 2011.
[11]
S. Chandrasekaran, O. Cooper, A. Deshpande, M. J. Franklin, J. M. Hellerstein, W. Hong, S. Krishnamurthy, S. R. Madden, F. Reiss, and M. A. Shah. TelegraphCQ: continuous dataflow processing. In SIGMOD, pages 668--668, 2003.
[12]
F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber. Bigtable: A distributed storage system for structured data. ACM TOCS, 2008.
[13]
T. Condie, N. Conway, P. Alvaro, J. M. Hellerstein, K. Elmeleegy, and R. Sears. MapReduce Online. In NSDI, 2010.
[14]
M. Duller, J. S. Rellermeyer, G. Alonso, and N. Tatbul. Virtualizing stream processing. In Middleware, pages 269--288, 2011.
[15]
R. C. Fernandez, M. Migliavacca, E. Kalyvianaki, and P. Pietzuch. Integrating scale out and fault tolerance in stream processing using operator state management. In SIGMOD, pages 725--736, 2013.
[16]
D. Florescu and D. Kossmann. Rethinking cost and performance of database systems. ACM Sigmod Record, pages 43--48, 2009.
[17]
B. Gedik, S. Schneider, M. Hirzel, and K. Wu. Elastic scaling for data stream processing. IEEE TPDS, 2013.
[18]
V. Gulisano, R. Jimenez-Peris, M. Patino-Martinez, C. Soriente, and P. Valduriez. Streamcloud: An elastic and scalable data streaming system. IEEE TPDS, pages 2351--2365, 2012.
[19]
M. Hirzel. Partition and compose: Parallel complex event processing. In DEBS, pages 191--200, 2012.
[20]
J.-H. Hwang, Y. Xing, U. Cetintemel, and S. Zdonik. A cooperative, self-configuring high-availability solution for stream processing. In ICDE, pages 176--185, 2007.
[21]
IBM. Dublin city council - traffic flow improved by big data analytics used to predict bus arrival and transit times. http://www-03.ibm.com/software/businesscasestudies/en/us/corp?docid=RNAE-9C9PN5, 2013.
[22]
N. Jain, L. Amini, H. Andrade, R. King, Y. Park, P. Selo, and C. Venkatramani. Design, implementation, and evaluation of the linear road bnchmark on the stream processing core. In SIGMOD, pages 431--442, 2006.
[23]
E. Kalyvianaki, W. Wiesemann, Q. H. Vu, D. Kuhn, and P. Pietzuch. SQPR: Stream query planning with reuse. In ICDE, pages 840--851, 2011.
[24]
G. T. Lakshmanan, Y. Li, and R. Strom. Placement strategies for internet-scale data stream systems. IEEE Internet Computing, pages 50--60, 2008.
[25]
H. Lim and S. Babu. Execution and optimization of continuous queries with cyclops. In SIGMOD, pages 1069--1072, 2013.
[26]
A. Martin, C. Fetzer, and A. Brito. Active replication at (almost) no cost. In SRDS, pages 21--30, 2011.
[27]
N. Marz. Storm: Distributed and fault-tolerant realtime computation, 2012.
[28]
N. Marz. Trident tutorial. https://github.com/nathanmarz/storm/wiki/Trident-tutorial, 2013.
[29]
J. F. Naughton, D. J. DeWitt, D. Maier, A. Aboulnaga, J. Chen, L. Galanis, J. Kang, R. Krishnamurthy, Q. Luo, N. Prakash, et al. The Niagara internet query system. IEEE Data Eng. Bull., pages 27--33, 2001.
[30]
L. Neumeyer, B. Robbins, A. Nair, and A. Kesari. S4: Distributed stream computing platform. In ICDMW, pages 170--177, 2010.
[31]
P. Pietzuch, J. Ledlie, J. Shneidman, M. Roussopoulos, M. Welsh, and M. Seltzer. Network-aware operator placement for stream-processing systems. In ICDE, pages 49--49, 2006.
[32]
D. Powell. Delta4: A generic architecture for dependable distributed computing. ESPRIT Research Reports, 1, 1991.
[33]
Z. Qian, Y. He, C. Su, Z. Wu, H. Zhu, T. Zhang, L. Zhou, Y. Yu, and Z. Zhang. Timestream: Reliable stream computation in the cloud. In Eurosys, 2013.
[34]
E. A. Rundensteiner, L. Ding, T. Sutherland, Y. Zhu, B. Pielech, and N. Mehta. CAPE: Continuous query engine with heterogeneous-grained adaptivity. In VLDB, pages 1353--1356, 2004.
[35]
S. Schneider, H. Andrade, B. Gedik, A. Biem, and K.-L. Wu. Elastic scaling of data parallel operators in stream processing. In IPDPS, pages 1--12, 2009.
[36]
S. Schneider, M. Hirzel, B. Gedik, and K.-L. Wu. Auto-parallelizing stateful distributed streaming applications. In PACT, pages 53--64, 2012.
[37]
M. A. Shah, J. M. Hellerstein, S. Chandrasekaran, and M. J. Franklin. Flux: An adaptive partitioning operator for continuous query systems. In ICDE, pages 25--36, 2003.
[38]
Twitter. Improving twitter search with real-time human computation. https://blog.twitter.com/2013/improving-twitter-search-with-real-time-human-computation, 2013.
[39]
J. Wolf, N. Bansal, K. Hildrum, S. Parekh, D. Rajan, R. Wagle, K.-L. Wu, and L. Fleischer. SODA: An optimizing scheduler for large-scale stream-based distributed computer systems. In Middleware, pages 306--325, 2008.
[40]
Y. Xing, J.-H. Hwang, U. Çetintemel, and S. Zdonik. Providing resiliency to load variations in distributed stream processing. In VLDB, pages 775--786, 2006.
[41]
Y. Yang, J. Kramer, D. Papadias, and B. Seeger. Hybmig: A hybrid approach to dynamic plan migration for continuous queries. IEEE TKDE, pages 398--411, 2007.
[42]
M. Zaharia, T. Das, H. Li, T. Hunter, S. Shenker, and I. Stoica. Discretized streams: Fault-tolerant streaming computation at scale. In SOSP, pages 423--438, 2013.
[43]
Y. Zhu, E. A. Rundensteiner, and G. T. Heineman. Dynamic plan migration for continuous queries over data streams. In SIGMOD, pages 431--442, 2004.

Cited By

View all
  • (2024)Optimization enabled elastic scaling in cloud based on predicted load for resource managementMultiagent and Grid Systems10.3233/MGS-23000319:4(289-311)Online publication date: 4-Mar-2024
  • (2024)To Migrate or Not to Migrate: An Analysis of Operator Migration in Distributed Stream ProcessingIEEE Communications Surveys & Tutorials10.1109/COMST.2023.333095326:1(670-705)Online publication date: Sep-2025
  • (2023)Hierarchical Auto-scaling Policies for Data Stream Processing on Heterogeneous ResourcesACM Transactions on Autonomous and Adaptive Systems10.1145/359743518:4(1-44)Online publication date: 14-Oct-2023
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
DEBS '14: Proceedings of the 8th ACM International Conference on Distributed Event-Based Systems
May 2014
371 pages
ISBN:9781450327374
DOI:10.1145/2611286
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 May 2014

Check for updates

Author Tags

  1. cloud-based data stream processing
  2. fault tolerance
  3. load balancing

Qualifiers

  • Research-article

Funding Sources

Conference

DEBS '14

Acceptance Rates

DEBS '14 Paper Acceptance Rate 16 of 174 submissions, 9%;
Overall Acceptance Rate 145 of 583 submissions, 25%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)71
  • Downloads (Last 6 weeks)3
Reflects downloads up to 17 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Optimization enabled elastic scaling in cloud based on predicted load for resource managementMultiagent and Grid Systems10.3233/MGS-23000319:4(289-311)Online publication date: 4-Mar-2024
  • (2024)To Migrate or Not to Migrate: An Analysis of Operator Migration in Distributed Stream ProcessingIEEE Communications Surveys & Tutorials10.1109/COMST.2023.333095326:1(670-705)Online publication date: Sep-2025
  • (2023)Hierarchical Auto-scaling Policies for Data Stream Processing on Heterogeneous ResourcesACM Transactions on Autonomous and Adaptive Systems10.1145/359743518:4(1-44)Online publication date: 14-Oct-2023
  • (2023)Using Reinforcement Learning to Control Auto-Scaling of Distributed ApplicationsCompanion of the 2023 ACM/SPEC International Conference on Performance Engineering10.1145/3578245.3585427(137-138)Online publication date: 15-Apr-2023
  • (2023)Internet of Behaviors: A SurveyIEEE Internet of Things Journal10.1109/JIOT.2023.3247594(1-1)Online publication date: 2023
  • (2023)A survey on the evolution of stream processing systemsThe VLDB Journal10.1007/s00778-023-00819-833:2(507-541)Online publication date: 22-Nov-2023
  • (2022)Edge-Based Runtime Verification for the Internet of ThingsIEEE Transactions on Services Computing10.1109/TSC.2021.307495615:5(2713-2727)Online publication date: 1-Sep-2022
  • (2022)Shepherd: Seamless Stream Processing on the Edge2022 IEEE/ACM 7th Symposium on Edge Computing (SEC)10.1109/SEC54971.2022.00011(40-53)Online publication date: Dec-2022
  • (2022)Automatic Performance Tuning for Distributed Data Stream Processing Systems2022 IEEE 38th International Conference on Data Engineering (ICDE)10.1109/ICDE53745.2022.00296(3194-3197)Online publication date: May-2022
  • (2022)A comprehensive study on fault tolerance in stream processing systemsFrontiers of Computer Science: Selected Publications from Chinese Universities10.1007/s11704-020-0248-x16:2Online publication date: 1-Apr-2022
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media