[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ Skip to main content

Advertisement

Log in

An adaptive non-migrating load-balanced distributed stream window join system

  • Published:
The Journal of Supercomputing Aims and scope Submit manuscript

Abstract

Stream processing systems are widely used to process large amounts of data generated by applications in real time due to their advantages in latency and throughput. In most streaming applications, the system requires a comprehensive analysis of data from multiple data sources, so stream joins are the basis of stream processing systems. Similar to other big data problems, stream joins suffer from load imbalance, where a few nodes responsible for handling most of the load can become bottlenecks, thereby increasing latency and reducing throughput. Therefore, how to obtain a good load-balancing effect with low overhead is a critical issue in designing stream join systems. To solve this problem, we propose an adaptive non-migrating load-balancing method, which is mainly oriented to the stream window join problem. Considering that the completeness of the stream join results during the splitting of state to multiple downstream instances can be guaranteed by replicating the input tuples into multiple replicas and sending them to those downstream instances, our method can control the replication and forwarding of input tuples by setting up routing tables, and then when the system becomes unbalanced, our method can change the load distribution of the system by directly changing the partitioning of the tuples arriving later instead of state migration, and thus achieving load balancing with very low overhead. Based on our method, we develop a distributed stream window join system, NM-Join, which is built on Flink. We theoretically analyze the completeness and effectiveness of our method and provide extensive experimental evaluations of NM-Join in terms of load-balancing effect, latency, and throughput. Experimental results show that our method is able to perform load balancing with very low additional overhead, and thus outperforms existing load-balancing methods in terms of latency and throughput.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
£29.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price includes VAT (United Kingdom)

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14

Similar content being viewed by others

Data availability statement

All of the material is owned by the authors and/or no permissions are required.

Notes

  1. https://github.com/wufengliaoyu/NMJoin.

References

  1. Schranz C, Jeremias PM (2020) Deterministic time-series joins for asynchronous high-throughput data streams. In: 2020 25th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA). IEEE, vol 1, pp 1031–1034. https://doi.org/10.1109/ETFA46521.2020.9211958

  2. Cheng Y, Hao Z, Cai R, Wen W (2018) Hpc2-ars: an architecture for real-time analytic of big data streams. In: 2018 IEEE International Conference on Web Services (ICWS), pp 319–322. https://doi.org/10.1109/ICWS.2018.00051

  3. Ananthanarayanan R, Basker V, Das S, Gupta A, Jiang H, Qiu T, Reznichenko A, Ryabkov D, Singh M, Venkataraman S (2013) Photon: fault-tolerant and scalable joining of continuous data streams. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pp 577–588. https://doi.org/10.1145/2463676.2465272

  4. Gong Y, Zhang Q, Han X, Huang X (2017) Phrase-based hashtag recommendation for microblog posts. Sci China Inf Sci 60(1):1–13. https://doi.org/10.1007/s11432-015-0900-x

    Article  Google Scholar 

  5. Shukla A, Chaturvedi S, Simmhan Y (2017) Riotbench: an iot benchmark for distributed stream processing systems. Concurr Comput Pract Exp 29(21):4257. https://doi.org/10.1002/cpe.4257

    Article  Google Scholar 

  6. Mrozek D, Tokarz K, Pankowski D, Małysiak-Mrozek B (2019) A hopping umbrella for fuzzy joining data streams from IoT devices in the cloud and on the edge. IEEE Trans Fuzzy Syst 28(5):916–928. https://doi.org/10.1109/TFUZZ.2019.2955056

    Article  Google Scholar 

  7. Zhang S, Liu C, Han Y, Li X (2018) Seamless integration of cloud and edge with a service-based approach. In: 2018 IEEE International Conference on Web Services (ICWS), pp 155–162. https://doi.org/10.1109/ICWS.2018.00027

  8. Najafi M, Sadoghi M, Jacobsen H-A (2016) \(\{\)SplitJoin\(\}\): a scalable, low-latency stream join architecture with adjustable ordering precision. In: 2016 USENIX Annual Technical Conference (USENIX ATC 16), pp 493–505. https://doi.org/10.5555/3026959.3027005

  9. Gulisano V, Nikolakopoulos Y, Papatriantafilou M, Tsigas P (2016) Scalejoin: a deterministic, disjoint-parallel and skew-resilient stream join. IEEE Trans Big Data 7(2):299–312. https://doi.org/10.1109/BigData.2015.7363751

    Article  Google Scholar 

  10. Lin Q, Ooi BC, Wang Z, Yu C (2015) Scalable distributed stream join processing. In: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp 811–825. https://doi.org/10.1145/2723372.2746485

  11. Fang J-H, Zhao P-P, Liu A, Li Z-X, Zhao L (2019) Scalable and adaptive joins for trajectory data in distributed stream system. J Comput Sci Technol 34(4):747–761. https://doi.org/10.1007/s11390-019-1940-x

    Article  Google Scholar 

  12. Zhou S, Zhang F, Chen H, Jin H, Zhou BB (2019) Fastjoin: a skewness-aware distributed stream join system. In: 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), IEEE. pp 1042–1052. https://doi.org/10.1109/IPDPS.2019.00111

  13. Kang J, Naughton JF, Viglas SD (2003) Evaluating window joins over unbounded streams. In: Proceedings 19th International Conference on Data Engineering (Cat. No. 03CH37405). IEEE, pp 341–352. https://doi.org/10.1109/ICDE.2003.1260804

  14. Elseidy M, Elguindy A, Vitorovic A, Koch C (2014) Scalable and adaptive online joins. VLDB. https://doi.org/10.14778/2732279.2732281

  15. Shahvarani A, Jacobsen H-A (2020) Parallel index-based stream join on a multicore cpu. In: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, pp 2523–2537. https://doi.org/10.1145/3318464.3380576

  16. Wilschut AN, Flokstra J, Apers PM (1995) Parallel evaluation of multi-join queries. In: Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data, pp 115–126. https://doi.org/10.1145/223784.223803

  17. Viglas SD, Naughton JF, Burger J (2003) Maximizing the output rate of multi-way join queries over streaming information sources. In: Proceedings 2003 VLDB Conference. Elsevier, pp 285–296. https://doi.org/10.1016/B978-012722442-8/50033-1

  18. Zhang F, Chen H, Jin H (2019) Simois: a scalable distributed stream join system with skewed workloads. In: 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS). IEEE, pp 176–185. https://doi.org/10.1109/ICDCS.2019.00026

  19. Gedik B, Bordawekar RR, Yu PS (2009) Celljoin: a parallel stream join operator for the cell processor. VLDB J 18(2):501–519. https://doi.org/10.1007/s00778-008-0116-z

    Article  Google Scholar 

  20. Buono D, De Matteis T, Mencagli G (2014) A high-throughput and low-latency parallelization of window-based stream joins on multicores. In: 2014 IEEE International Symposium on Parallel and Distributed Processing with Applications. IEEE, pp 117–126. https://doi.org/10.1109/ISPA.2014.24

  21. Teubner J, Mueller R (2011) How soccer players would do stream joins. In: Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data, pp 625–636. https://doi.org/10.1145/1989323.1989389

  22. Roy P, Teubner J, Gemulla R (2014) Low-latency handshake join. Proc VLDB Endowm 7(9):709–720. https://doi.org/10.14778/2732939.2732944

    Article  Google Scholar 

  23. Okcan A, Riedewald M (2011) Processing theta-joins using mapreduce. In: Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data, pp 949–960. https://doi.org/10.1145/1989323.1989423

  24. Fang J, Zhang R, Zhao Y, Zheng K, Zhou X, Zhou A (2019) A-dsp: an adaptive join algorithm for dynamic data stream on cloud system. IEEE Trans Knowl Data Eng 33(5):1861–1876. https://doi.org/10.1109/TKDE.2019.2947055

    Article  Google Scholar 

  25. Fang J, Wang X, Zhang R, Zhou A (2016) Flexible and adaptive stream join algorithm. In: Asia-Pacific Web Conference. Springer, pp 3–16. https://doi.org/10.1007/978-3-319-45817-5_1

  26. Fang J, Zhang R, Wang X, Zhou A (2017) Distributed stream join under workload variance. World Wide Web 20(5):1089–1110. https://doi.org/10.1007/s11280-017-0431-7

    Article  Google Scholar 

  27. Zhang F, Chen H, Jin H (2019) Simois: a scalable distributed stream join system with skewed workloads. In: 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS). IEEE, pp 176–185. https://doi.org/10.1109/ICDCS.2019.00026

  28. Yuan J, Wang Y, Chen H, Jin H, Liu H (2021) Eunomia: efficiently eliminating abnormal results in distributed stream join systems. In: 2021 IEEE/ACM 29th International Symposium on Quality of Service (IWQOS). IEEE, pp 1–11. https://doi.org/10.1109/IWQOS52092.2021.9521286

  29. Nikjoo F, Mirzaei A, Mohajer A (2018) A novel approach to efficient resource allocation in NOMA heterogeneous networks: multi-criteria green resource management. Appl Artif Intell 32(7–8):583–612. https://doi.org/10.1080/08839514.2018.1486132

    Article  Google Scholar 

  30. Mohajer A, Sorouri F, Mirzaei A, Ziaeddini A, Rad KJ, Bavaghar M (2022) Energy-aware hierarchical resource management and backhaul traffic optimization in heterogeneous cellular networks. IEEE Syst J. https://doi.org/10.1109/JSYST.2022.3154162

    Article  Google Scholar 

  31. Mohajer A, Daliri MS, Mirzaei A, Ziaeddini A, Nabipour M, Bavaghar M (2022) Heterogeneous computational resource allocation for NOMA: toward green mobile edge-computing systems. IEEE Trans Serv Comput. https://doi.org/10.1109/TSC.2022.3186099

    Article  Google Scholar 

  32. Cardellini V, Lo Presti F, Nardelli M, Russo GR (2022) Runtime adaptation of data stream processing systems: the state of the art. ACM Comput Surv. https://doi.org/10.1145/3514496

    Article  Google Scholar 

  33. Lombardi F, Aniello L, Bonomi S, Querzoni L (2017) Elastic symbiotic scaling of operators and resources in stream processing systems. IEEE Trans Parallel Distrib Syst 29(3):572–585. https://doi.org/10.1109/TPDS.2017.2762683

    Article  Google Scholar 

  34. Cardellini V, Presti FL, Nardelli M, Russo GR (2018) Decentralized self-adaptation for elastic data stream processing. Fut Gen Comput Syst 87:171–185. https://doi.org/10.1016/j.future.2018.05.025

    Article  Google Scholar 

Download references

Acknowledgements

This work is supported by National Natural Science Foundation of China under Grant No. 62171155.

Funding

This study was funded by National Natural Science Foundation of China under Grant No. 62171155.

Author information

Authors and Affiliations

Authors

Contributions

All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by QW, SC, and TL. The first draft of the manuscript was written by QW. The review and editing are mainly performed by DZ and ZZ. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Zhan Zhang.

Ethics declarations

Conflict of interest

We declare that the authors have no competing interests as defined by Springer, or other interests that might be perceived to influence the results and/or discussion reported in this paper.

Consent for publication

All authors of this paper have read and approved the final version submitted and contents of this manuscript have not been copyrighted or published previously and are not under consideration for publication elsewhere.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Wang, Q., Zuo, D., Zhang, Z. et al. An adaptive non-migrating load-balanced distributed stream window join system. J Supercomput 79, 8236–8264 (2023). https://doi.org/10.1007/s11227-022-04991-6

Download citation

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11227-022-04991-6

Keywords

Navigation