[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Hazelcast jet: low-latency stream processing at the 99.99th percentile

Published: 01 July 2021 Publication History

Abstract

Jet is an open source, high performance, distributed stream processor built at Hazelcast during the last five years. Jet was engineered with millisecond latency on the 99.99th percentile as its primary design goal. Originally Jet's purpose was to be an execution engine that performs complex business logic on top of streams generated by Hazelcast's In-memory Data Grid (IMDG): a set of in-memory, partitioned and replicated data structures. With time, Jet evolved into a full-fledged, scale-out stream processor that can handle out-of-order streams and provide exactly-once processing guarantees. Jet's end-to-end latency lies in the order of milliseconds, and its throughput in the order of millions of events per CPU-core. This paper presents the main design decisions we made in order to maximize the performance per CPU-core, alongside lessons learned, and an empirical performance evaluation.

References

[1]
Martin Abadi and Gordon Plotkin. 2009. A model of cooperative threads. In Proceedings of the 36th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages. 29--40.
[2]
Adil Akhter, Marios Fragkoulis, and Asterios Katsifodimos. 2019. Stateful Functions as a Service in Action. Proc. VLDB Endow. 12, 12 (Aug. 2019), 1890--1893.
[3]
Tyler Akidau, Alex Balikov, Kaya Bekiroğlu, Slava Chernyak, Josh Haberman, Reuven Lax, Sam McVeety, Daniel Mills, Paul Nordstrom, and Sam Whittle. 2013. MillWheel: fault-tolerant stream processing at internet scale. Proceedings of the VLDB Endowment 6, 11 (2013), 1033--1044.
[4]
Tyler Akidau, Robert Bradshaw, Craig Chambers, Slava Chernyak, Rafael J Fernández-Moctezuma, Reuven Lax, Sam McVeety, Daniel Mills, Frances Perry, Eric Schmidt, et al. 2015. The dataflow model: a practical approach to balancing correctness, latency, and cost in massive-scale, unbounded, out-of-order data processing. Proceedings of the VLDB Endowment 8, 12 (2015), 1792--1803.
[5]
Michael Armbrust, Tathagata Das, Joseph Torres, Burak Yavuz, Shixiong Zhu, Reynold Xin, Ali Ghodsi, Ion Stoica, and Matei Zaharia. 2018. Structured Streaming: A Declarative API for Real-Time Applications in Apache Spark. In Proceedings of the 2018 International Conference on Management of Data (Houston, TX, USA) (SIGMOD '18). ACM, New York, NY, USA, 601--613.
[6]
E. Brewer. 2012. CAP twelve years later: How the "rules" have changed. Computer 45, 2 (2012), 23--29.
[7]
Paris Carbone, Stephan Ewen, Gyula Fóra, Seif Haridi, Stefan Richter, and Kostas Tzoumas. 2017. State Management in Apache Flink&Reg;: Consistent Stateful Distributed Stream Processing. Proc. VLDB Endow. 10, 12 (Aug. 2017), 1718--1729.
[8]
Paris Carbone, Asterios Katsifodimos, Stephan Ewen, Volker Markl, Seif Haridi, and Kostas Tzoumas. 2015. Apache Flink TM: Stream and Batch Processing in a Single Engine. IEEE Data Eng. Bull. 38 (2015), 28--38.
[9]
Badrish Chandramouli, Jonathan Goldstein, Mike Barnett, Robert DeLine, Danyel Fisher, John C Platt, James F Terwilliger, and John Wernsing. 2014. Trill: a high-performance incremental query processor for diverse analytics. Proceedings of the VLDB Endowment 8, 4 (2014), 401--412.
[10]
Badrish Chandramouli, Jonathan Goldstein, Mike Barnett, and James F. Terwilliger. 2015. Trill: Engineering a Library for Diverse Analytics. IEEE Data Eng. Bull. 38 (2015), 51--60.
[11]
K Mani Chandy and Leslie Lamport. 1985. Distributed snapshots: determining global states of distributed systems. ACM Transactions on Computer Systems (TOCS) 3, 1 (1985), 63--75.
[12]
Raul Castro Fernandez, Matteo Migliavacca, Evangelia Kalyvianaki, and Peter Pietzuch. 2013. Integrating Scale out and Fault Tolerance in Stream Processing Using Operator State Management. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (New York, New York, USA) (SIGMOD '13). ACM, New York, NY, USA, 725--736.
[13]
Marios Fragkoulis, Paris Carbone, Vasiliki Kalavri, and Asterios Katsifodimos. 2020. A Survey on the Evolution of Stream Processing Systems. arXiv:2008.00842 [cs.DC]
[14]
Goetz Graefe. 1990. Encapsulation of parallelism in the Volcano query processing system. ACM SIGMOD Record 19, 2 (1990), 102--111.
[15]
Martin Hirzel, Robert Soulé, Scott Schneider, Buğra Gedik, and Robert Grimm. 2014. A catalog of stream processing optimizations. ACM Computing Surveys (CSUR) 46, 4 (2014), 1--34.
[16]
Gabriela Jacques-Silva, Fang Zheng, Daniel Debrunner, Kun-Lung Wu, Victor Dogaru, Eric Johnson, Michael Spicer, and Ahmet Erdem Sariyüce. 2016. Consistent Regions: Guaranteed Tuple Processing in IBM Streams. Proc. VLDB Endow. 9, 13 (Sept. 2016), 1341--1352.
[17]
Gilles Kahn and David MacQueen. 1976. Coroutines and networks of parallel processes. (1976).
[18]
Jeyhun Karimov, Tilmann Rabl, Asterios Katsifodimos, Roman Samarev, Henri Heiskanen, and Volker Markl. 2018. Benchmarking distributed stream data processing systems. In 2018 IEEE 34th International Conference on Data Engineering (ICDE). IEEE, 1507--1518.
[19]
A. Katsifodimos and M. Fragkoulis. 2019. Operational Stream Processing: Towards Scalable and Consistent Event-Driven Applications. In EDBT.
[20]
Jay Kreps, Neha Narkhede, Jun Rao, et al. 2011. Kafka: A distributed messaging system for log processing. In Proceedings of the NetDB, Vol. 11. 1--7.
[21]
Sanjeev Kulkarni, Nikunj Bhagat, Maosong Fu, Vikas Kedigehalli, Christopher Kellogg, Sailesh Mittal, Jignesh M. Patel, Karthik Ramasamy, and Siddarth Taneja. 2015. Twitter Heron: Stream Processing at Scale. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (Melbourne, Victoria, Australia) (SIGMOD '15). ACM, New York, NY, USA, 239--250.
[22]
Jin Li, Kristin Tufte, Vladislav Shkapenyuk, Vassilis Papadimos, Theodore Johnson, and David Maier. 2008. Out-of-order processing: a new architecture for high-performance stream systems. Proceedings of the VLDB Endowment 1, 1 (2008), 274--288.
[23]
Wei Lin, Haochuan Fan, Zhengping Qian, Junwei Xu, Sen Yang, Jingren Zhou, and Lidong Zhou. 2016. STREAMSCOPE: Continuous Reliable Distributed Processing of Big Data Streams. In Proceedings of the 13th Usenix Conference on Networked Systems Design and Implementation (Santa Clara, CA) (NSDI'16). USENIX Association, Berkeley, CA, USA, 439--453. http://dl.acm.org/citation.cfm?id=2930611.2930640
[24]
Frank McSherry, Michael Isard, and Derek G Murray. 2015. Scalability! But at what {COST}?. In 15th Workshop on Hot Topics in Operating Systems (HotOS {XV}).
[25]
Derek G Murray, Frank McSherry, Rebecca Isaacs, Michael Isard, Paul Barham, and Martín Abadi. 2013. Naiad: a timely dataflow system. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. ACM, 439--455.
[26]
Sam Newman. 2015. Building Microservices (1sted.). O'Reilly Media, Inc.
[27]
Shadi A. Noghabi, Kartik Paramasivam, Yi Pan, Navina Ramesh, Jon Bringhurst, Indranil Gupta, and Roy H. Campbell. 2017. Samza: Stateful Scalable Stream Processing at LinkedIn. Proc. VLDB Endow. 10, 12 (Aug. 2017), 1634--1645.
[28]
Zhengping Qian, Yong He, Chunzhi Su, Zhuojie Wu, Hongyu Zhu, Taizhi Zhang, Lidong Zhou, Yuan Yu, and Zheng Zhang. 2013. TimeStream: Reliable Stream Computation in the Cloud. In Proceedings of the 8th ACM European Conference on Computer Systems (Prague, Czech Republic) (EuroSys '13). ACM, New York, NY, USA, 1--14.
[29]
Vikram Sreekanti, Chenggang Wu, Xiayue Charles Lin, Johann Schleier-Smith, Joseph E. Gonzalez, Joseph M. Hellerstein, and Alexey Tumanov. 2020. Cloudburst: Stateful Functions-as-a-Service. 13, 12 (2020).
[30]
Ion Stoica, Robert Morris, David Karger, M Frans Kaashoek, and Hari Balakrishnan. 2001. Chord: A scalable peer-to-peer lookup service for internet applications. ACM SIGCOMM Computer Communication Review 31, 4 (2001), 149--160.
[31]
Kanat Tangwongsan, Martin Hirzel, and Scott Schneider. 2017. Low-latency sliding-window aggregation in worst-case constant time. In Proceedings of the 11th ACM international conference on distributed and event-based systems. 66--77.
[32]
Ankit Toshniwal, Siddarth Taneja, Amit Shukla, Karthik Ramasamy, Jignesh M. Patel, Sanjeev Kulkarni, Jason Jackson, Krishna Gade, Maosong Fu, Jake Donham, Nikunj Bhagat, Sailesh Mittal, and Dmitriy Ryaboy. 2014. Storm@Twitter. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data (Snowbird, Utah, USA) (SIGMOD '14). ACM, New York, NY, USA, 147--156.
[33]
Jonas Traub, Philipp M Grulich, Alejandro Rodríguez Cuéllar, Sebastian BreŠ, Asterios Katsifodimos, Tilmann Rabl, and Volker Markl. 2019. Efficient Window Aggregation with General Stream Slicing. In EDBT. 97--108.
[34]
P Tucker, K. Tufte, V. Papadimos, and D. Maier. 2002. NEXMark---A Benchmark for Queries over Data Streams. Technical Report. Technical report, OGI School of Science & Engineering at OHSU.
[35]
Ke Wang, Avrilia Floratou, Ashvin Agrawal, and Daniel Musgrave. 2020. Spur: Mitigating Slow Instances in Large-Scale Streaming Pipelines. In Proceedings of the SIGMOD 2020 International Conference on Management of Data. ACM, 2271--2285.

Cited By

View all
  • (2024)ShuffleBench: A Benchmark for Large-Scale Data Shuffling Operations with Distributed Stream Processing FrameworksProceedings of the 15th ACM/SPEC International Conference on Performance Engineering10.1145/3629526.3645036(2-13)Online publication date: 7-May-2024
  • (2024)μWheel: Aggregate Management for Streams and QueriesProceedings of the 18th ACM International Conference on Distributed and Event-based Systems10.1145/3629104.3666031(54-65)Online publication date: 24-Jun-2024
  • (2024)Benchmarking scalability of stream processing frameworks deployed as microservices in the cloudJournal of Systems and Software10.1016/j.jss.2023.111879208:COnline publication date: 1-Feb-2024
  • Show More Cited By

Index Terms

  1. Hazelcast jet: low-latency stream processing at the 99.99th percentile
        Index terms have been assigned to the content through auto-classification.

        Recommendations

        Comments

        Please enable JavaScript to view thecomments powered by Disqus.

        Information & Contributors

        Information

        Published In

        cover image Proceedings of the VLDB Endowment
        Proceedings of the VLDB Endowment  Volume 14, Issue 12
        July 2021
        587 pages
        ISSN:2150-8097
        Issue’s Table of Contents

        Publisher

        VLDB Endowment

        Publication History

        Published: 01 July 2021
        Published in PVLDB Volume 14, Issue 12

        Qualifiers

        • Research-article

        Contributors

        Other Metrics

        Bibliometrics & Citations

        Bibliometrics

        Article Metrics

        • Downloads (Last 12 months)13
        • Downloads (Last 6 weeks)0
        Reflects downloads up to 25 Feb 2025

        Other Metrics

        Citations

        Cited By

        View all
        • (2024)ShuffleBench: A Benchmark for Large-Scale Data Shuffling Operations with Distributed Stream Processing FrameworksProceedings of the 15th ACM/SPEC International Conference on Performance Engineering10.1145/3629526.3645036(2-13)Online publication date: 7-May-2024
        • (2024)μWheel: Aggregate Management for Streams and QueriesProceedings of the 18th ACM International Conference on Distributed and Event-based Systems10.1145/3629104.3666031(54-65)Online publication date: 24-Jun-2024
        • (2024)Benchmarking scalability of stream processing frameworks deployed as microservices in the cloudJournal of Systems and Software10.1016/j.jss.2023.111879208:COnline publication date: 1-Feb-2024
        • (2024)A survey on the evolution of stream processing systemsThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-023-00819-833:2(507-541)Online publication date: 1-Mar-2024
        • (2023)Heap Size Adjustment with CPU ControlProceedings of the 20th ACM SIGPLAN International Conference on Managed Programming Languages and Runtimes10.1145/3617651.3622988(114-128)Online publication date: 19-Oct-2023
        • (2023)GeaFlow: A Graph Extended and Accelerated Dataflow SystemProceedings of the ACM on Management of Data10.1145/35897711:2(1-27)Online publication date: 20-Jun-2023
        • (2022)Portals: An Extension of Dataflow Streaming for Stateful ServerlessProceedings of the 2022 ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software10.1145/3563835.3567664(153-171)Online publication date: 29-Nov-2022
        • (2022)Analysing and Predicting Energy Consumption of Garbage Collectors in OpenJDKProceedings of the 19th International Conference on Managed Programming Languages and Runtimes10.1145/3546918.3546925(3-15)Online publication date: 14-Sep-2022
        • (2021)ClamorProceedings of the ACM Symposium on Cloud Computing10.1145/3472883.3486996(654-669)Online publication date: 1-Nov-2021

        View Options

        Login options

        Full Access

        View options

        PDF

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader

        Figures

        Tables

        Media

        Share

        Share

        Share this Publication link

        Share on social media