Abstract
Data Stream Processing is a recent and highly active research area dealing with the in-memory, tuple-by-tuple analysis of streaming data. Continuous queries typically consume huge volumes of data arriving at high velocity, so solutions that persistently store all the input tuples and then perform off-line computation are impractical. Rather, queries must be executed continuously as data flow through the streams. The goal of this paper is to present parallel patterns for window-based stateful operators, which are the most representative class of stateful data stream operators. The patterns are presented “à la” Algorithmic Skeleton, by explaining the rationale of each pattern, the preconditions to apply it safely, and the outcome in terms of throughput, latency and memory consumption. The patterns have been implemented in the \(\mathtt {FastFlow}\) framework targeting off-the-shelf multicores. To the best of our knowledge, this is the first effort to merge the Data Stream Processing domain with the field of Structured Parallelism.
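To make the notion of a window-based stateful operator concrete, the following is a minimal sequential sketch of a count-based sliding-window sum processed tuple by tuple. It is an illustration only, not one of the paper's patterns and not FastFlow code: the names `Tuple` and `SlidingSum`, the choice of a sum as the aggregate, and the triggering rule (first result once the window is full, then one result every `slide` tuples) are assumptions. It requires C++17 for `std::optional`.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>

// Hypothetical tuple type: only a numeric payload is modeled here.
struct Tuple {
    double value;
};

// Count-based sliding window over the last `size` tuples.
// A result is emitted when the window first fills, then every `slide` arrivals.
class SlidingSum {
public:
    SlidingSum(std::size_t size, std::size_t slide)
        : size_(size), slide_(slide) {
        assert(size > 0 && slide > 0);
    }

    // Consume one tuple; return the window sum if the window triggers.
    std::optional<double> push(const Tuple& t) {
        window_.push_back(t.value);
        sum_ += t.value;
        if (window_.size() > size_) {
            // Evict the oldest tuple so the state stays bounded.
            sum_ -= window_.front();
            window_.pop_front();
        }
        ++arrived_;
        if (arrived_ >= size_ && (arrived_ - size_) % slide_ == 0) {
            return sum_;
        }
        return std::nullopt;
    }

private:
    std::size_t size_;
    std::size_t slide_;
    std::size_t arrived_ = 0;      // total tuples seen so far
    double sum_ = 0.0;             // running aggregate over the window
    std::deque<double> window_;    // in-memory window content
};
```

The operator keeps only the current window in memory and updates its state on every arrival, which is what makes it stateful: the result for a tuple depends on the tuples that preceded it.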
Notes
Replicated in a hypothetical message-passing abstract model. On multicores, depending on the run-time support used, tuple replication can be avoided by sharing data, i.e. by passing memory pointers to the input tuples.
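The idea of sharing a tuple instead of replicating it can be sketched as follows. This is an assumption-laden illustration in plain C++, not FastFlow's run-time support: `InputTuple`, `Worker` and `broadcast` are hypothetical names, and `std::shared_ptr` is used here only to keep the shared tuple alive until the last worker drops it.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical input tuple; the fields are placeholders.
struct InputTuple {
    int key;
    double value;
};

// Workers receive read-only pointers, never copies of the tuple itself.
using TuplePtr = std::shared_ptr<const InputTuple>;

// Hypothetical worker: its inbox stores pointers to shared tuples.
struct Worker {
    std::vector<TuplePtr> inbox;
};

// "Replicate" a tuple to every worker by copying only the pointer.
// The tuple payload exists once in memory regardless of the worker count.
inline void broadcast(const TuplePtr& t, std::vector<Worker>& workers) {
    assert(t != nullptr);
    for (auto& w : workers) {
        w.inbox.push_back(t);
    }
}
```

Declaring the pointee `const` reflects the precondition for sharing: workers may read the same tuple concurrently only as long as none of them modifies it. The reference counting of `std::shared_ptr` adds some overhead per copy; a run-time with explicit tuple lifetime management could pass raw pointers instead.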
Acknowledgments
This work has been partially supported by the EU H2020 project RePhrase (EC-RIA, H2020, ICT-2014-1).
Cite this article
De Matteis, T., Mencagli, G. Parallel Patterns for Window-Based Stateful Operators on Data Streams: An Algorithmic Skeleton Approach. Int J Parallel Prog 45, 382–401 (2017). https://doi.org/10.1007/s10766-016-0413-x
DOI: https://doi.org/10.1007/s10766-016-0413-x