[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Efficient Placement of Decomposable Aggregation Functions for Stream Processing over Large Geo-Distributed Topologies

Published: 03 May 2024 Publication History

Abstract

A recent trend in stream processing is offloading the computation of decomposable aggregation functions (DAF) from cloud nodes to geo-distributed fog/edge devices to decrease latency and improve energy efficiency. However, deploying DAFs on low-end devices is challenging due to their volatility and limited resources. Additionally, in geo-distributed fog/edge environments, creating new operator instances on demand and replicating operators ubiquitously is restricted, posing challenges for achieving load balancing without overloading devices. Existing work predominantly focuses on cloud environments, overlooking DAF operator placement in resource-constrained and unreliable geo-distributed settings.
This paper presents NEMO, a resource-aware optimization approach that determines the replication factor and placement of DAF operators in resource-constrained geo-distributed topologies. Leveraging Euclidean embeddings of network topologies and a set of heuristics, NEMO scales to millions of nodes and handles topo-logical changes through adaptive re-placement and re-replication decisions. Compared to existing solutions, NEMO achieves up to 6× lower latency and up to 15× reduction in communication cost, while preventing overloaded nodes. Moreover, NEMO re-optimizes placements in constant time, regardless of the topology size. As a result, it lays the foundation to efficiently process continuous data streams on large, heterogeneous, and geo-distributed topologies.

References

[1]
Cédric Adjih, Emmanuel Baccelli, Eric Fleury, Gaetan Harter, Nathalie Mitton, Thomas Noël, Roger Pissard-Gibollet, Frederic Saint-Marcel, Guillaume Schreiner, Julien Vandaele, and Thomas Watteyne. 2015. FIT IoT-LAB: A large scale open experimental IoT testbed. In 2nd IEEE World Forum on Internet of Things. 459--464.
[2]
Shadi Al-Sarawi, Mohammed Anbar, Kamal Alieyan, and Mahmood Alzubaidi. 2017. Internet of Things (IoT) communication protocols. In 8th International Conference on Information Technology, ICIT. 685--690.
[3]
Leonardo Aniello, Roberto Baldoni, and Leonardo Querzoni. 2013. Adaptive online scheduling in storm. In The 7th ACM International Conference on Distributed Event-Based Systems, DEBS. 207--218.
[4]
Tanapat Anusas-Amornkul and Sirasit Sangrat. 2017. Linux Server Monitoring and Self-healing System Using Nagios. In Mobile Web and Intelligent Information Systems - 14th International Conference, MobiWIS, Vol. 10486. 290--302.
[5]
Lawrence Benson, Philipp M. Grulich, Steffen Zeuch, Volker Markl, and Tilmann Rabl. 2020. Disco: Efficient Distributed Window Aggregation. In Proceedings of the 23rd International Conference on Extending Database Technology, EDBT. 423--426.
[6]
Robert L. Cannon, Jitendra V. Dave, and James C. Bezdek. 1986. Efficient Implementation of the Fuzzy c-Means Clustering Algorithms. IEEE Trans. Pattern Anal. Mach. Intell. 8, 2 (1986), 248--255.
[7]
Valeria Cardellini, Vincenzo Grassi, Francesco Lo Presti, and Matteo Nardelli. 2016. Optimal operator placement for distributed stream processing applications. In Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems, DEBS. 69--80.
[8]
Valeria Cardellini, Francesco Lo Presti, Matteo Nardelli, and Gabriele Russo Russo. 2018. Optimal operator deployment and replication for elastic distributed data stream processing. Concurr. Comput. Pract. Exp. 30, 9 (2018).
[9]
Ramaswamy Chandrasekaran and Arie Tamir. 1990. Algebraic Optimization: The Fermat-Weber Location Problem. Math. Program. 46 (1990), 219--224.
[10]
Supriyo Chatterjea and Paul JM Havinga. 2003. A dynamic data aggregation scheme for wireless sensor networks. In 14th Workshop on Circuits, Systems and Signal Processing, ProRISC.
[11]
Xenofon Chatziliadis, Eleni Tzirita Zacharatou, Steffen Zeuch, and Volker Markl. 2021. Monitoring of Stream Processing Engines Beyond the Cloud: An Overview. Open J. Internet Things 7, 1 (2021), 71--82.
[12]
Ankit Chaudhary, Steffen Zeuch, and Volker Markl. 2020. Governor: Operator Placement for a Unified Fog-Cloud Environment. In Proceedings of the 23rd International Conference on Extending Database Technology, EDBT. 631--634.
[13]
Ankit Chaudhary, Steffen Zeuch, Volker Markl, and Jeyhun Karimov. 2023. Incremental Stream Query Merging. In Proceedings 26th International Conference on Extending Database Technology, EDBT. 604--617.
[14]
Yang Chen, Xiao Wang, Cong Shi, Eng Keong Lua, Xiaoming Fu, Beixing Deng, and Xing Li. 2011. Phoenix: A Weight-Based Network Coordinate System Using Matrix Factorization. IEEE Trans. Netw. Serv. Manag. 8, 4 (2011), 334--347.
[15]
Ruosi Cheng and Yu Wang. 2018. A Survey on Network Coordinate Systems. In MATEC Web of Conferences.
[16]
Brent N. Chun, David E. Culler, Timothy Roscoe, Andy C. Bavier, Larry L. Peterson, Mike Wawrzoniak, and Mic Bowman. 2003. PlanetLab: An overlay testbed for broad-coverage services. Comput. Commun. Rev. 33, 3 (2003), 3--12.
[17]
Manuel Costa, Miguel Castro, Antony I. T. Rowstron, and Peter B. Key. 2004. PIC: Practical Internet Coordinates for Distance Estimation. In 24th International Conference on Distributed Computing Systems, ICDCS. 178--187.
[18]
James A. Cowling, Dan R. K. Ports, Barbara Liskov, Raluca Ada Popa, and Abhijeet Gaikwad. 2009. Census: Location-Aware Membership Management for Large-Scale Distributed Systems. In USENIX Annual Technical Conference.
[19]
Frank Dabek, Russ Cox, M. Frans Kaashoek, and Robert Tappan Morris. 2004. Vivaldi: A decentralized network coordinate system. In Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication, SIGCOMM. 15--26.
[20]
Marcos Dias de Assunção, Alexandre Da Silva Veith, and Rajkumar Buyya. 2018. Distributed data stream processing and edge computing: A survey on resource elasticity and future directions. J. Netw. Comput. Appl. 103 (2018), 1--17.
[21]
Min Ding, Xiuzhen Cheng, and Guoliang Xue. 2003. Aggregation tree construction in sensor networks. In 58th Vehicular Technology Conference, VTC. 2168--2172.
[22]
Raphael Eidenbenz and Thomas Locher. 2016. Task allocation for distributed stream processing. In 35th Annual IEEE International Conference on Computer Communications, INFOCOM. 1--9.
[23]
Avrilia Floratou, Ashvin Agrawal, Bill Graham, Sriram Rao, and Karthik Ramasamy. 2017. Dhalion: Self-Regulating Stream Processing in Heron. Proc. VLDB Endow. 10, 12 (2017), 1825--1836.
[24]
Stefano Forti, Marco Gaglianese, and Antonio Brogi. 2021. Lightweight self-organising distributed monitoring of Fog infrastructures. Future Gener. Comput. Syst. 114 (2021), 605--618.
[25]
Thomas M. J. Fruchterman and Edward M. Reingold. 1991. Graph Drawing by Force-directed Placement. Softw. Pract. Exp. 21, 11 (1991), 1129--1164.
[26]
Bugra Gedik. 2014. Partitioning functions for stateful data parallelism in stream processing. VLDB J. 23, 4 (2014), 517--539.
[27]
P. Krishna Gummadi, Stefan Saroiu, and Steven D. Gribble. 2002. King: Estimating latency between arbitrary internet end hosts. Comput. Commun. Rev. 32, 3 (2002), 11.
[28]
Thomas Heinze, Yuanzhen Ji, Lars Roediger, Valerio Pappalardo, Andreas Meister, Zbigniew Jerzak, and Christof Fetzer. 2015. FUGU: Elastic Data Stream Processing with Latency Constraints. IEEE Data Eng. Bull. 38, 4 (2015), 73--81.
[29]
Wendi Rabiner Heinzelman, Anantha P. Chandrakasan, and Hari Balakrishnan. 2000. Energy-Efficient Communication Protocol for Wireless Microsensor Networks. In 33rd Annual Hawaii International Conference on System Sciences, HICSS. 10--20.
[30]
Jeong-Hyon Hwang, Ugur Çetintemel, and Stanley B. Zdonik. 2008. Fast and Highly-Available Stream Processing over Wide Area Networks. In Proceedings of the 24th International Conference on Data Engineering, ICDE. 804--813.
[31]
Richard M. Karp. 2010. Reducibility Among Combinatorial Problems. In 50 Years of Integer Programming 1958-2008 - From the Early Years to the State-of-the-Art. Springer, 219--241.
[32]
Geetika T. Lakshmanan, Ying Li, and Robert E. Strom. 2008. Placement Strategies for Internet-Scale Data Stream Systems. IEEE Internet Comput. 12, 6 (2008), 50--60.
[33]
Aljoscha P. Lepping, Hoang Mi Pham, Laura Mons, Balint Rueb, Philipp M. Grulich, Ankit Chaudhary, Steffen Zeuch, and Volker Markl. 2023. Showcasing Data Management Challenges for Future IoT Applications with NebulaStream. Proc. VLDB Endow. 16, 12 (2023), 3930--3933.
[34]
Yongjun Liao, Wei Du, Pierre Geurts, and Guy Leduc. 2013. DMFSGD: A Decentralized Matrix Factorization Algorithm for Network Distance Prediction. IEEE/ACM Trans. Netw. 21, 5 (2013), 1511--1524.
[35]
Stephanie Lindsey, Cauligi S. Raghavendra, and Krishna M. Sivalingam. 2002. Data Gathering Algorithms in Sensor Networks Using Energy Metrics. IEEE Trans. Parallel Distributed Syst. 13, 9 (2002), 924--935.
[36]
Stuart P. Lloyd. 1982. Least squares quantization in PCM. IEEE Trans. Inf. Theory 28, 2 (1982), 129--136.
[37]
Björn Lohrmann, Peter Janacik, and Odej Kao. 2015. Elastic Stream Processing with Latency Guarantees. In 35th IEEE International Conference on Distributed Computing Systems, ICDCS. 399--410.
[38]
Cristian Lumezanu, Randolph Baden, Dave Levin, Neil Spring, and Bobby Bhattacharjee. 2009. Symbiotic Relationships in Internet Routing Overlays. In Proceedings of the 6th USENIX Symposium on Networked Systems Design and Implementation, NSDI. 467--480.
[39]
Samuel Madden, Michael J. Franklin, Joseph M. Hellerstein, and Wei Hong. 2005. TinyDB: An acquisitional query processing system for sensor networks. ACM Trans. Database Syst. 30, 1 (2005), 122--173.
[40]
Kasper Grud Skat Madsen, Yongluan Zhou, and Jianneng Cao. 2017. Integrative Dynamic Reconfiguration in a Parallel Stream Processing Engine. In 33rd IEEE International Conference on Data Engineering, ICDE. 227--230.
[41]
Quazi Mamun. 2012. A Qualitative Comparison of Different Logical Topologies for Wireless Sensor Networks. Sensors 12, 11 (2012), 14887--14913.
[42]
Yun Mao, Lawrence K. Saul, and Jonathan M. Smith. 2006. IDES: An Internet Distance Estimation Service for Large Networks. IEEE J. Sel. Areas Commun. 24, 12 (2006), 2273--2284.
[43]
Matthew L. Massie, Brent N. Chun, and David E. Culler. 2004. The Ganglia distributed monitoring system: Design, Implementation, and Experience. Parallel Comput. 30, 5-6 (2004), 817--840.
[44]
Christopher Mutschler, Holger Ziekow, and Zbigniew Jerzak. 2013. The DEBS 2013 grand challenge. In The 7th ACM International Conference on Distributed Event-Based Systems, DEBS. 289--294.
[45]
T. S. Eugene Ng and Hui Zhang. 2002. Predicting Internet Network Distance with Coordinates-Based Approaches. In The 21st Annual Joint Conference of the IEEE Computer and Communications Societies, INFOCOM. 170--179.
[46]
T. S. Eugene Ng and Hui Zhang. 2004. A Network Positioning System for the Internet. In Proceedings of the General Track: USENIX Annual Technical Conference. 141--154.
[47]
Xia Pan, Xia Zhang, Hongyi Yu, and Chao Zhang. 2009. Study on routing protocol for WSNs based on the improved Prim algorithm. In 2009 International Conference on Wireless Communications & Signal Processing. 1--4.
[48]
Elena Beatriz Ouro Paz, Eleni Tzirita Zacharatou, and Volker Markl. 2021. Towards Resilient Data Management for the Internet of Moving Things. In Datenbanksysteme für Business, Technologie und Web, BTW. 279--301.
[49]
Peter R. Pietzuch, Jonathan Ledlie, Jeffrey Shneidman, Mema Roussopoulos, Matt Welsh, and Margo I. Seltzer. 2006. Network-Aware Operator Placement for Stream-Processing Systems. In Proceedings of the 22nd International Conference on Data Engineering, ICDE. 49.
[50]
Sean C. Rhea, Dennis Geels, Timothy Roscoe, and John Kubiatowicz. 2004. Handling Churn in a DHT (Awarded Best Paper!). In Proceedings of the General Track: USENIX Annual Technical Conference. 127--140.
[51]
Stamatia Rizou, Frank Dürr, and Kurt Rothermel. 2010. Solving the MultiOperator Placement Problem in Large-Scale Operator Networks. In Proceedings of the 19th International Conference on Computer Communications and Networks, ICCCN. 1--6.
[52]
Peter J. Rousseeuw. 1987. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20 (1987), 53--65.
[53]
Atul Sandur, ChanHo Park, Stavros Volos, Gul Agha, and Myeongjae Jeon. 2022. Jarvis: Large-scale Server Monitoring with Adaptive Near-data Processing. In 38th IEEE International Conference on Data Engineering, ICDE. 1408--1422.
[54]
Zhitao Shen, Vikram Kumaran, Michael J. Franklin, Sailesh Krishnamurthy, Amit Bhat, Madhu Kumar, Robert Lerche, and Kim Macpherson. 2015. CSA: Streaming Engine for Internet of Things. IEEE Data Eng. Bull. 38, 4 (2015), 39--50.
[55]
Mohammad Shokouhifar and Ali Jalali. 2017. Optimized sugeno fuzzy clustering algorithm for wireless sensor networks. Eng. Appl. Artif. Intell. 60 (2017), 16--25.
[56]
RIPE NCC Staff. 2015. Ripe Atlas: A global internet measurement network. Internet Protocol Journal 18, 3 (2015), 2--26.
[57]
Moritz Steiner and Ernst W. Biersack. 2009. Where Is My Peer? Evaluation of the Vivaldi Network Coordinate System in Azureus. In 8th International Networking Conference, IFIP-TC. 145--156.
[58]
Nitin Sukhija and Elizabeth Bautista. 2019. Towards a Framework for Monitoring and Analyzing High Performance Computing Environments Using Kubernetes and Prometheus. In IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation. 257--262.
[59]
Hüseyin Özgür Tan and Ibrahim Korpeoglu. 2003. Power efficient data gathering and aggregation in wireless sensor networks. SIGMOD Rec. 32, 4 (2003), 66--71.
[60]
Cory Thoma, Alexandros Labrinidis, and Adam J. Lee. 2014. Automated operator placement in distributed Data Stream Management Systems subject to user constraints. In Workshops Proceedings of the 30th International Conference on Data Engineering Workshops, ICDE. 310--316.
[61]
Jonas Traub, Philipp M. Grulich, Alejandro Rodriguez Cuellar, Sebastian Breß, Asterios Katsifodimos, Tilmann Rabl, and Volker Markl. 2019. Efficient Window Aggregation with General Stream Slicing. In 22nd International Conference on Extending Database Technology, EDBT. 97--108.
[62]
Shivaram Venkataraman, Aurojit Panda, Kay Ousterhout, Michael Armbrust, Ali Ghodsi, Michael J. Franklin, Benjamin Recht, and Ion Stoica. 2017. Drizzle: Fast and Adaptable Stream Processing at Scale. In Proceedings of the 26th Symposium on Operating Systems Principles. 374--389.
[63]
Massimo Villari, Maria Fazio, Schahram Dustdar, Omer F. Rana, and Rajiv Ranjan. 2016. Osmotic Computing: A New Paradigm for Edge/Cloud Integration. IEEE Cloud Comput. 3, 6 (2016), 76--83.
[64]
Guohui Wang, Bo Zhang, and T. S. Eugene Ng. 2007. Towards network triangle inequality violation aware distributed systems. In Proceedings of the 7th ACM SIGCOMM Internet Measurement Conference, IMC. 175--188.
[65]
Jennifer Yick, Biswanath Mukherjee, and Dipak Ghosal. 2008. Wireless sensor network survey. Computer networks 52, 12 (2008), 2292--2330.
[66]
Ossama Younis and Sonia Fahmy. 2004. HEED: A Hybrid, Energy-Efficient, Distributed Clustering Approach for Ad Hoc Sensor Networks. IEEE Trans. Mob. Comput. 3, 4 (2004), 366--379.
[67]
Wang Yue, Lawrence Benson, and Tilmann Rabl. 2023. Desis: Efficient Window Aggregation in Decentralized Networks. In Proceedings 26th International Conference on Extending Database Technology, EDBT. 618--631.
[68]
Steffen Zeuch, Ankit Chaudhary, Bonaventura Del Monte, Haralampos Gavriilidis, Dimitrios Giouroukis, Philipp M. Grulich, Sebastian Breß, Jonas Traub, and Volker Markl. 2020. The NebulaStream Platform for Data and Application Management in the Internet of Things. In 10th Conference on Innovative Data Systems Research, CIDR.
[69]
Steffen Zeuch, Eleni Tzirita Zacharatou, Shuhao Zhang, Xenofon Chatziliadis, Ankit Chaudhary, Bonaventura Del Monte, Dimitrios Giouroukis, Philipp M. Grulich, Ariane Ziehn, and Volker Markl. 2020. NebulaStream: Complex Analytics Beyond the Cloud. Open J. Internet Things 6, 1 (2020), 66--81.

Cited By

View all
  • (2024)A Novel Intuitionistic Fuzzy Rough Sets-Based Clustering Model Based on Aczel–Alsina Aggregation OperatorsSymmetry10.3390/sym1610129216:10(1292)Online publication date: 1-Oct-2024
  • (2024)Query Compilation Without RegretsProceedings of the ACM on Management of Data10.1145/36549682:3(1-28)Online publication date: 30-May-2024

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment
Proceedings of the VLDB Endowment  Volume 17, Issue 6
February 2024
369 pages
Issue’s Table of Contents

Publisher

VLDB Endowment

Publication History

Published: 03 May 2024
Published in PVLDB Volume 17, Issue 6

Check for updates

Badges

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)133
  • Downloads (Last 6 weeks)15
Reflects downloads up to 05 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2024)A Novel Intuitionistic Fuzzy Rough Sets-Based Clustering Model Based on Aczel–Alsina Aggregation OperatorsSymmetry10.3390/sym1610129216:10(1292)Online publication date: 1-Oct-2024
  • (2024)Query Compilation Without RegretsProceedings of the ACM on Management of Data10.1145/36549682:3(1-28)Online publication date: 30-May-2024

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media