[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Low-Latency Adaptive Distributed Stream Join System Based on a Flexible Join Model

Published: 30 May 2024 Publication History

Abstract

Stream join is a fundamental operation in stream processing and has attracted extensive research due to its large resource consumption and serious impact on system performance. As the theoretical basis of stream join systems, the stream join model greatly affects system performance. State-of-the-art stream join models either consume too much computing resources or too much storage resources, thus resulting in lower throughput or higher latency. In this paper, we propose a new stream join model for processing arbitrary join predicates, called CoModel, which offers a flexible trade-off between memory and computing resource consumption. More importantly, CoModel can achieve the minimum sum of the number of store operations and join operations among all existing join models, and thus can achieve the lowest latency and highest throughput when the overheads associated with the local stream join for each input tuple are approximately constant. We give a trade-off strategy for CoModel and theoretically prove its performance advantages based on queuing theory. Furthermore, we design and implement an adaptive distributed stream join system, CoStream, based on CoModel. CoStream can adaptively adjust its structure according to resource constraints and statistics of input data. We conduct extensive experiments for CoStream to evaluate its performance and adaptivity, and the results show that CoStream has the lowest latency and highest throughput in various scenarios.

Supplemental Material

MP4 File
Presentation video and presentation slides
PPTX File
Presentation video and presentation slides

References

[1]
Rajagopal Ananthanarayanan, Venkatesh Basker, Sumit Das, Ashish Gupta, Haifeng Jiang, Tianhao Qiu, Alexey Reznichenko, Deomid Ryabkov, Manpreet Singh, and Shivakumar Venkataraman. 2013. Photon: fault-tolerant and scalable joining of continuous data streams. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2013, New York, NY, USA, June 22--27, 2013, Kenneth A. Ross, Divesh Srivastava, and Dimitris Papadias (Eds.). ACM, 577--588. https://doi.org/10.1145/2463676.2465272
[2]
Paris Carbone, Asterios Katsifodimos, Stephan Ewen, Volker Markl, Seif Haridi, and Kostas Tzoumas. 2015. Apache Flink#8482;: Stream and Batch Processing in a Single Engine. IEEE Data Eng. Bull., Vol. 38, 4 (2015), 28--38. http://sites.computer.org/debull/A15dec/p28.pdf
[3]
Manuel Dossinger, Sebastian Michel, and Constantin Roudsarabi. 2019. CLASH: A High-Level Abstraction for Optimized, Multi-Way Stream Joins over Apache Storm. In Proceedings of the 2019 International Conference on Management of Data, SIGMOD Conference 2019, Amsterdam, The Netherlands, June 30 - July 5, 2019, Peter A. Boncz, Stefan Manegold, Anastasia Ailamaki, Amol Deshpande, and Tim Kraska (Eds.). ACM, 1897--1900. https://doi.org/10.1145/3299869.3320217
[4]
Mohammed Elseidy, Abdallah Elguindy, Aleksandar Vitorovic, and Christoph Koch. 2014. Scalable and Adaptive Online Joins. Proc. VLDB Endow., Vol. 7, 6 (2014), 441--452. https://doi.org/10.14778/2732279.2732281
[5]
Junhua Fang, Rong Zhang, Yan Zhao, Kai Zheng, Xiaofang Zhou, and Aoying Zhou. 2021. A-DSP: An Adaptive Join Algorithm for Dynamic Data Stream on Cloud System. IEEE Trans. Knowl. Data Eng., Vol. 33, 5 (2021), 1861--1876. https://doi.org/10.1109/TKDE.2019.2947055
[6]
Junhua Fang, Pengpeng Zhao, An Liu, Zhixu Li, and Lei Zhao. 2019. Scalable and Adaptive Joins for Trajectory Data in Distributed Stream System. J. Comput. Sci. Technol., Vol. 34, 4 (2019), 747--761. https://doi.org/10.1007/s11390-019--1940-x
[7]
Bugra Gedik, Rajesh Bordawekar, and Philip S. Yu. 2009. CellJoin: a parallel stream join operator for the cell processor. VLDB J., Vol. 18, 2 (2009), 501--519. https://doi.org/10.1007/s00778-008-0116-z
[8]
Mor Harchol-Balter. 2013. Performance modeling and design of computer systems: queueing theory in action. Cambridge University Press.
[9]
Edwin H. Jacox and Hanan Samet. 2008. Metric space similarity joins. ACM Trans. Database Syst., Vol. 33, 2 (2008), 7:1--7:38. https://doi.org/10.1145/1366102.1366104
[10]
Gabriela Jacques-Silva, Ran Lei, Luwei Cheng, Guoqiang Jerry Chen, Kuen Ching, Tanji Hu, Yuan Mei, Kevin Wilfong, Rithin Shetty, Serhat Yilmaz, Anirban Banerjee, Benjamin Heintz, Shridar Iyer, and Anshul Jaiswal. 2018. Providing Streaming Joins as a Service at Facebook. Proc. VLDB Endow., Vol. 11, 12 (2018), 1809--1821. https://doi.org/10.14778/3229863.3229869
[11]
Jaewoo Kang, Jeffrey F. Naughton, and Stratis Viglas. 2003. Evaluating Window Joins over Unbounded Streams. In Proceedings of the 19th International Conference on Data Engineering, March 5--8, 2003, Bangalore, India, Umeshwar Dayal, Krithi Ramamritham, and T. M. Vijayaraman (Eds.). IEEE Computer Society, 341--352. https://doi.org/10.1109/ICDE.2003.1260804
[12]
Qian Lin, Beng Chin Ooi, Zhengkui Wang, and Cui Yu. 2015. Scalable Distributed Stream Join Processing. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Victoria, Australia, May 31 - June 4, 2015, Timos K. Sellis, Susan B. Davidson, and Zachary G. Ives (Eds.). ACM, 811--825. https://doi.org/10.1145/2723372.2746485
[13]
Youzhong Ma, Shijie Jia, and Yongxin Zhang. 2017. A novel approach for high-dimensional vector similarity join query. Concurr. Comput. Pract. Exp., Vol. 29, 5 (2017). https://doi.org/10.1002/CPE.3952
[14]
Willi Mann, Nikolaus Augsten, and Panagiotis Bouros. 2016. An Empirical Evaluation of Set Similarity Join Techniques. Proc. VLDB Endow., Vol. 9, 9 (2016), 636--647. https://doi.org/10.14778/2947618.2947620
[15]
Gianmarco De Francisci Morales and Aristides Gionis. 2016. Streaming Similarity Self-Join. Proc. VLDB Endow., Vol. 9, 10 (2016), 792--803. https://doi.org/10.14778/2977797.2977805
[16]
Dariusz Mrozek, Krzysztof Tokarz, Daniel Pankowski, and Bozena Malysiak-Mrozek. 2020. A Hopping Umbrella for Fuzzy Joining Data Streams From IoT Devices in the Cloud and on the Edge. IEEE Trans. Fuzzy Syst., Vol. 28, 5 (2020), 916--928. https://doi.org/10.1109/TFUZZ.2019.2955056
[17]
M. Asif Naeem. 2019. Optimization and Extension of Stream-Relation Joins. Int. J. Inf. Technol. Decis. Mak., Vol. 18, 4 (2019), 1289--1315. https://doi.org/10.1142/S0219622019500214
[18]
Mohammadreza Najafi, Mohammad Sadoghi, and Hans-Arno Jacobsen. 2016. SplitJoin: A Scalable, Low-latency Stream Join Architecture with Adjustable Ordering Precision. In 2016 USENIX Annual Technical Conference, USENIX ATC 2016, Denver, CO, USA, June 22--24, 2016, Ajay Gulati and Hakim Weatherspoon (Eds.). USENIX Association, 493--505. https://www.usenix.org/conference/atc16/technical-sessions/presentation/najafi
[19]
Alper Okcan and Mirek Riedewald. 2011. Processing theta-joins using MapReduce. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2011, Athens, Greece, June 12--16, 2011, Timos K. Sellis, René e J. Miller, Anastasios Kementsietsidis, and Yannis Velegrakis (Eds.). ACM, 949--960. https://doi.org/10.1145/1989323.1989423
[20]
Wendy Osborn. 2022. Shedding strategies for optimizing join processing in spatial data streams. In 17th International Conference on Future Networks and Communications / 19th International Conference on Mobile Systems and Pervasive Computing / 12th International Conference on Sustainable Energy Information Technology (FNC/MobiSPC/SEIT 2022), August 9--11, 2022, Niagara Falls, Ontario, Canada (Procedia Computer Science, Vol. 203), Elhadi M. Shakshuki (Ed.). Elsevier, 149--156. https://doi.org/10.1016/J.PROCS.2022.07.021
[21]
Yuan Qiu, Serafeim Papadias, and Ke Yi. 2019. Streaming HyperCube: A Massively Parallel Stream Join Algorithm. In Advances in Database Technology - 22nd International Conference on Extending Database Technology, EDBT 2019, Lisbon, Portugal, March 26--29, 2019, Melanie Herschel, Helena Galhardas, Berthold Reinwald, Irini Fundulaki, Carsten Binnig, and Zoi Kaoudi (Eds.). OpenProceedings.org, 642--645. https://doi.org/10.5441/002/edbt.2019.76
[22]
Pratanu Roy, Jens Teubner, and Rainer Gemulla. 2014. Low-Latency Handshake Join. Proc. VLDB Endow., Vol. 7, 9 (2014), 709--720. https://doi.org/10.14778/2732939.2732944
[23]
Christoph Schranz and Peter Michael Jeremias. 2020. Deterministic Time-Series Joins for Asynchronous High-Throughput Data Streams. In 25th IEEE International Conference on Emerging Technologies and Factory Automation, ETFA 2020, Vienna, Austria, September 8--11, 2020. IEEE, 1031--1034. https://doi.org/10.1109/ETFA46521.2020.9211958
[24]
Amirhesam Shahvarani and Hans-Arno Jacobsen. 2020. Parallel Index-based Stream Join on a Multicore CPU. In Proceedings of the 2020 International Conference on Management of Data, SIGMOD Conference 2020, online conference [Portland, OR, USA], June 14--19, 2020, David Maier, Rachel Pottinger, AnHai Doan, Wang-Chiew Tan, Abdussalam Alawini, and Hung Q. Ngo (Eds.). ACM, 2523--2537. https://doi.org/10.1145/3318464.3380576
[25]
Anshu Shukla, Shilpa Chaturvedi, and Yogesh Simmhan. 2017. RIoTBench: An IoT benchmark for distributed stream processing systems. Concurr. Comput. Pract. Exp., Vol. 29, 21 (2017). https://doi.org/10.1002/cpe.4257
[26]
George Siachamis, Kyriakos Psarakis, Marios Fragkoulis, Odysseas Papapetrou, Arie van Deursen, and Asterios Katsifodimos. 2023. Adaptive Distributed Streaming Similarity Joins. In Proceedings of the 17th ACM International Conference on Distributed and Event-based Systems, DEBS 2023, Neuchatel, Switzerland, June 27--30, 2023, Valerio Schiavoni, Marcelo Pasin, Bettina Kemme, and Etienne Riviè re (Eds.). ACM, 25--36. https://doi.org/10.1145/3583678.3596891
[27]
Jens Teubner and René Mü ller. 2011. How soccer players would do stream joins. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2011, Athens, Greece, June 12--16, 2011, Timos K. Sellis, René e J. Miller, Anastasios Kementsietsidis, and Yannis Velegrakis (Eds.). ACM, 625--636. https://doi.org/10.1145/1989323.1989389
[28]
TPC. 1993. The TPC-H benchmark. http://www.tpc.org/tpch Retrieved October 1, 2023 from
[29]
Pete Tucker, Kristin Tufte, Vassilis Papadimos, and David Maier. 2023. NEXMark - A Benchmark for Queries over Data Streams DRAFT. (09 2023).
[30]
Taegeon Um, Gyewon Lee, and Byung-Gon Chun. 2021. Pluto: High-Performance IoT-Aware Stream Processing. In 41st IEEE International Conference on Distributed Computing Systems, ICDCS 2021, Washington DC, USA, July 7--10, 2021. IEEE, 79--91. https://doi.org/10.1109/ICDCS51616.2021.00017
[31]
Juliane Verwiebe, Philipp M. Grulich, Jonas Traub, and Volker Markl. 2022. Algorithms for Windowed Aggregations and Joins on Distributed Stream Processing Systems. Datenbank-Spektrum, Vol. 22, 2 (2022), 99--107. https://doi.org/10.1007/s13222-022-00417-y
[32]
Stratis Viglas, Jeffrey F. Naughton, and Josef Burger. 2003. Maximizing the Output Rate of Multi-Way Join Queries over Streaming Information Sources. In Proceedings of 29th International Conference on Very Large Data Bases, VLDB 2003, Berlin, Germany, September 9--12, 2003, Johann Christoph Freytag, Peter C. Lockemann, Serge Abiteboul, Michael J. Carey, Patricia G. Selinger, and Andreas Heuer (Eds.). Morgan Kaufmann, 285--296. https://doi.org/10.1016/B978-012722442--8/50033--1
[33]
Aleksandar Vitorovic, Mohammed Elseidy, Khayyam Guliyev, Khue Vu Minh, Daniel Espino, Mohammad Dashti, Yannis Klonatos, and Christoph Koch. 2016. Squall: Scalable Real-time Analytics. Proc. VLDB Endow., Vol. 9, 13 (2016), 1553--1556. https://doi.org/10.14778/3007263.3007307
[34]
Annita N. Wilschut, Jan Flokstra, and Peter M. G. Apers. 1995. Parallel Evaluation of Multi-Join Queries. In Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data, San Jose, California, USA, May 22--25, 1995, Michael J. Carey and Donovan A. Schneider (Eds.). ACM Press, 115--126. https://doi.org/10.1145/223784.223803
[35]
Jianye Yang, Wenjie Zhang, Xiang Wang, Ying Zhang, and Xuemin Lin. 2020. Distributed Streaming Set Similarity Join. In 36th IEEE International Conference on Data Engineering, ICDE 2020, Dallas, TX, USA, April 20--24, 2020. IEEE, 565--576. https://doi.org/10.1109/ICDE48307.2020.00055
[36]
Fan Zhang, Hanhua Chen, and Hai Jin. 2019a. Simois: A Scalable Distributed Stream Join System with Skewed Workloads. In 39th IEEE International Conference on Distributed Computing Systems, ICDCS 2019, Dallas, TX, USA, July 7--10, 2019. IEEE, 176--185. https://doi.org/10.1109/ICDCS.2019.00026
[37]
Shuhao Zhang, Yancan Mao, Jiong He, Philipp M. Grulich, Steffen Zeuch, Bingsheng He, Richard T. B. Ma, and Volker Markl. 2021. Parallelizing Intra-Window Join on Multicores: An Experimental Study. In SIGMOD '21: International Conference on Management of Data, Virtual Event, China, June 20--25, 2021, Guoliang Li, Zhanhuai Li, Stratos Idreos, and Divesh Srivastava (Eds.). ACM, 2089--2101. https://doi.org/10.1145/3448016.3452793
[38]
Shuhao Zhang, Feng Zhang, Yingjun Wu, Bingsheng He, and Paul Johns. 2019b. Hardware-Conscious Stream Processing: A Survey. SIGMOD Rec., Vol. 48, 4 (2019), 18--29. https://doi.org/10.1145/3385658.3385662
[39]
Shunjie Zhou, Fan Zhang, Hanhua Chen, Hai Jin, and Bing Bing Zhou. 2019. FastJoin: A Skewness-Aware Distributed Stream Join System. In 2019 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2019, Rio de Janeiro, Brazil, May 20--24, 2019. IEEE, 1042--1052. https://doi.org/10.1109/IPDPS.2019.00111

Cited By

View all
  • (2024)Efficient Unsupervised Community Search with Pre-Trained Graph TransformerProceedings of the VLDB Endowment10.14778/3665844.366585317:9(2227-2240)Online publication date: 1-May-2024

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the ACM on Management of Data
Proceedings of the ACM on Management of Data  Volume 2, Issue 3
SIGMOD
June 2024
1953 pages
EISSN:2836-6573
DOI:10.1145/3670010
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 May 2024
Published in PACMMOD Volume 2, Issue 3

Permissions

Request permissions for this article.

Author Tags

  1. adaptivity
  2. distributed stream join
  3. join model
  4. queuing theory

Qualifiers

  • Research-article

Funding Sources

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)121
  • Downloads (Last 6 weeks)31
Reflects downloads up to 11 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Efficient Unsupervised Community Search with Pre-Trained Graph TransformerProceedings of the VLDB Endowment10.14778/3665844.366585317:9(2227-2240)Online publication date: 1-May-2024

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media